Physiology and Molecular Biology of Plants. 2020 Feb 5;26(3):433–444. doi: 10.1007/s12298-020-00771-9

A genome-wide comparative analysis of bZIP transcription factors in G. arboreum and G. raimondii (Diploid ancestors of present-day cotton)

Farrukh Azeem 1,#, Hira Tahir 1,#, Usman Ijaz 1, Tayyaba Shaheen 1,
PMCID: PMC7078431  PMID: 32205921

Abstract

Basic leucine zipper (bZIP) transcription factors (TFs) are involved in plant growth regulation, development, and environmental stress responses. These genes have been well characterized in model plants. In the current study, a genome-wide analysis of bZIP genes was performed in Gossypium raimondii and Gossypium arboreum, taking Arabidopsis thaliana as a reference genome. In total, 85 members in G. raimondii and 87 members in G. arboreum were identified and designated as GrbZIPs and GabZIPs, respectively. Phylogenetic analysis clustered the bZIP genes into 11 subgroups (A, B, C, D, E, F, G, H, I, S and X). Gene structure analysis of the intron-exon organization revealed 1–14 exons in both species. The maximum number of introns was present in subgroups G and D, while genes in subgroup S were intron-less except GrbZIP78, a characteristic that distinguishes this subgroup from the others. Motif analysis predicted that all three species share a common bZIP motif. A detailed comparison of the distribution of bZIP genes on chromosomes showed a diverse arrangement of genes in the two cotton species. Moreover, functional similarity with orthologs was also predicted. The findings of this study revealed close similarity in gene structure between the two cotton species and diversity in gene distribution on chromosomes. This study supports the divergence of both species from a common ancestor and later diversification in gene distribution on chromosomes due to evolutionary changes. Additionally, this work will facilitate the functional characterization of bZIP genes in cotton. The outcomes of this study represent foundational research on the bZIP TF family in cotton and can serve as a reference for other crops.

Electronic supplementary material

The online version of this article (10.1007/s12298-020-00771-9) contains supplementary material, which is available to authorized users.

Keywords: bZIPs, G. arboreum, G. raimondii, Genome-wide analysis

Introduction

G. raimondii (D5) and G. arboreum (A2) are considered ancestors of present-day cotton (G. hirsutum), the most cultivated cotton species in the world (Li et al. 2015; Hinze et al. 2017). G. raimondii is a wild species (Shaheen et al. 2013a), while G. arboreum is still cultivated in limited areas in Pakistan, India and China (Shaheen et al. 2013b). Drought and heat stress are among the major reasons for a global decline in cotton yield (USDA 2015; Ullah et al. 2017). The importance of G. arboreum is attributed to numerous promising traits for cotton production that are lacking in the upland cotton cultivars. These traits include tolerance to desiccation and disease resistance (Mehetre et al. 2003; Shaheen et al. 2013a). The G. arboreum genome can therefore be useful for mining drought- and heat-responsive genes for the development of improved varieties.

Stress can activate regulatory and functional genes, which contribute to boosting the tolerance mechanism (Seki et al. 2003; Zhang et al. 2004). The regulatory genes include transcription factors, which regulate numerous stress-inducible genes independently or collectively (Tran and Mochida 2010). Comprehensive genome-wide information is necessary to characterize a transcription factor family in detail (Tran et al. 2007). The development of crops with greater yield under stress conditions could be accelerated through knowledge of stress-responsive transcription factors, their complex regulatory gene networks and their functional characterization (Tran and Mochida 2010).

In plants, bZIP (basic leucine zipper) transcription factors belong to one of the largest gene families and play a vital role in regulating stress processes (Sornaraj et al. 2016). The involvement of bZIP proteins in controlling various abiotic and biotic stress responses, besides growth and developmental processes, has been studied recently. The bZIP proteins play a significant role in pathogen defense and in biological processes like embryogenesis, organ differentiation, seed maturation, stress signaling, and vascular and flower development in many plants (Wei et al. 2012). bZIP proteins are involved in plant responses to stresses like cold (Shimizu 2005; Liu and Stone 2011), drought (Yoshida et al. 2010; Wang et al. 2017), heat (Liu et al. 2013) and high salinity (Hsieh et al. 2010). These proteins are also involved in plant responses to biotic stresses such as pathogen infection, and in abscisic acid (ABA) (Yoshida et al. 2010; Lopez-Molina et al. 2002; Choi et al. 2000), light (Zander et al. 2012) and ethylene (Li et al. 2015) signaling. The evolutionary history and expression pattern of the bZIP family in plants are still to be explored (Wei et al. 2016).

With the availability of whole-genome sequences of many plant species, genome-wide analysis of important genes has become a powerful tool to analyze genomes for specific features (Hu et al. 2016). The current availability of whole-genome sequences has made genome-wide analysis of the sister species G. arboreum and G. raimondii possible. After conducting a whole-genome alignment, Li et al. (2014) showed that the G. raimondii genome exhibits a close collinear relationship with the genome of G. arboreum.

In the current study, we performed a genome-wide analysis of bZIP transcription factors in two diploid species of cotton, G. raimondii and G. arboreum. This study is focused on genome-wide identification of bZIP transcription factors in cotton, phylogenetic relationships of these proteins, chromosomal location, gene structure analysis, motif analysis and GO analysis to predict gene functions. This study provides prospective knowledge of the evolutionary history and biological significance of the bZIP TF family in cotton.

Materials and methods

Identification and evolutionary analysis of bZIPs

The bZIP sequences of A. thaliana, G. arboreum and G. raimondii were obtained from TAIR (https://www.arabidopsis.org/), NCBI (https://www.ncbi.nlm.nih.gov/), the Plant Transcription Factor Database (http://planttfdb.cbi.pku.edu.cn/), and Phytozome (https://phytozome.jgi.doe.gov). Protein sequences of bZIPs from Arabidopsis were used as queries against the non-redundant protein sequence databases of G. arboreum and G. raimondii. The BLASTP program (Jakoby et al. 2002) was used to check the predicted bZIPs in G. arboreum and G. raimondii. All the candidate protein sequences were examined for the conserved bZIP domain (SM00338) using the SMART tool (http://smart.embl-heidelberg.de/) and for the Pfam domain (PF00170) (http://pfam.xfam.org/) (Letunic et al. 2011), and sequences were filtered to obtain a final collection of non-redundant bZIP genes present in the two cotton species. Multiple sequence alignment was performed to confirm the conserved domains of the predicted G. arboreum and G. raimondii bZIPs. The full-length bZIP proteins from A. thaliana, G. arboreum and G. raimondii were aligned in MEGA 7.0.21, and the maximum likelihood evolutionary tree was created based on the sequence alignments. The theoretical isoelectric point (pI) and molecular weight (MW) of the bZIP proteins from both species were calculated using the online tool ExPASy (http://expasy.org/tools/).
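The domain-based filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function name, the input dictionaries and the `require_both` switch are hypothetical, and the text does not say whether a candidate had to carry both accessions or only one, so the sketch leaves that choice to the caller. Redundancy is approximated here as identical protein sequences.

```python
from typing import Dict, List, Set

SMART_BZIP = "SM00338"  # SMART accession named in the text
PFAM_BZIP = "PF00170"   # Pfam accession named in the text


def filter_bzip_candidates(
    domain_hits: Dict[str, Set[str]],
    sequences: Dict[str, str],
    require_both: bool = False,
) -> List[str]:
    """Return IDs of candidates carrying a bZIP domain, dropping duplicate sequences.

    domain_hits maps a protein ID to the set of domain accessions reported for it;
    sequences maps a protein ID to its amino acid sequence (both hypothetical inputs).
    """
    kept: List[str] = []
    seen_sequences: Set[str] = set()
    for protein_id in sorted(domain_hits):
        domains = domain_hits[protein_id]
        has_smart = SMART_BZIP in domains
        has_pfam = PFAM_BZIP in domains
        # Assumption: the source does not state whether both hits were required.
        passes = (has_smart and has_pfam) if require_both else (has_smart or has_pfam)
        if not passes:
            continue
        seq = sequences.get(protein_id, "").upper()
        if not seq or seq in seen_sequences:
            continue  # skip missing or identical (redundant) sequences
        seen_sequences.add(seq)
        kept.append(protein_id)
    return kept
```

With BLASTP hits collected for every Arabidopsis query, the same function would be run once per cotton species to produce the non-redundant candidate list.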

Gene structure analysis and detection of conserved motifs

The MEME program (http://meme-suite.org/tools/meme) was used to identify the conserved motifs within full-length G. arboreum and G. raimondii bZIP proteins. The parameters used were: maximum number of motifs = 20; distribution of motifs = any number of repetitions; optimum motif width = 6 to 50 residues. The Gene Structure Display Server (GSDS) program (http://gsds.cbi.pku.edu.cn/) was used to illustrate the exon/intron organization in the gene structure of G. arboreum and G. raimondii bZIPs.
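The study used the MEME web server, but the same parameters can be expressed for a local MEME Suite installation. The sketch below only builds the argument list; it assumes the `meme` executable is on the PATH, and the function name, FASTA file name and output directory are hypothetical.

```python
from typing import List


def build_meme_command(fasta_path: str, out_dir: str) -> List[str]:
    """Build a MEME command line mirroring the parameters listed in the text."""
    return [
        "meme", fasta_path,
        "-protein",         # input sequences are proteins
        "-oc", out_dir,     # output directory (overwritten if it already exists)
        "-mod", "anr",      # distribution of motifs: any number of repetitions
        "-nmotifs", "20",   # maximum number of motifs
        "-minw", "6",       # minimum motif width
        "-maxw", "50",      # maximum motif width
    ]
```

The resulting list could be passed to `subprocess.run(build_meme_command("cotton_bzips.fasta", "meme_out"), check=True)`, where both names are placeholders.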

Mapping bZIP genes on cotton chromosomes

Precise chromosomal positions of the genes encoding the GabZIP and GrbZIP proteins were determined from the Gene database of NCBI. The genes were plotted separately onto the 13 chromosomes in ascending order of physical position (bp), from the short-arm telomere to the long-arm telomere. Data available on NCBI (www.ncbi.nlm.nih.gov) were used to obtain the start positions of the 87 members in G. arboreum and the 85 members in G. raimondii. The MapChart 2.3 software (Voorrips et al. 2012) was used to visualize the distribution of bZIP genes on the 13 chromosomes of G. arboreum and G. raimondii.
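The ordering step, grouping genes by chromosome and sorting them by ascending start position before plotting, can be sketched as follows. The `GeneLocus` record and the function names are hypothetical; placing scaffold-only genes after the numbered chromosomes is an assumption made for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple, Union


class GeneLocus(NamedTuple):
    gene_id: str
    chromosome: str  # "1" to "13", or a scaffold label for unplaced genes
    start_bp: int    # start position in base pairs


def chromosome_sort_key(chromosome: str) -> Tuple[int, Union[int, str]]:
    """Numbered chromosomes sort numerically; scaffold labels follow alphabetically."""
    if chromosome.isdigit():
        return (0, int(chromosome))
    return (1, chromosome)


def order_genes_by_chromosome(loci: List[GeneLocus]) -> Dict[str, List[GeneLocus]]:
    """Group loci per chromosome and sort each group by ascending start position."""
    grouped: Dict[str, List[GeneLocus]] = defaultdict(list)
    for locus in loci:
        grouped[locus.chromosome].append(locus)
    ordered: Dict[str, List[GeneLocus]] = {}
    for chromosome in sorted(grouped, key=chromosome_sort_key):
        ordered[chromosome] = sorted(grouped[chromosome], key=lambda l: l.start_bp)
    return ordered
```

Each per-chromosome list could then be written out in whatever input format the plotting tool expects.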

GO analysis of ortholog gene pairs in G. arboreum, G. raimondii and A. thaliana

The gene ontology annotations of bZIP genes for G. raimondii, G. arboreum and A. thaliana were retrieved from Blast2GO.

Results

Identification of bZIP TFs in Gossypium arboreum and Gossypium raimondii

The bZIP protein sequences of A. thaliana (reference species) were retrieved from TAIR (https://www.arabidopsis.org/) and the Plant Transcription Factor Database (PlantTFDB; http://planttfdb.cbi.pku.edu.cn/). The retrieved AtbZIP (bZIPs of A. thaliana) sequences were used individually in NCBI-BLAST searches against G. arboreum and G. raimondii sequences. Each sequence was used as a query in a separate BLAST search to make the results more precise and to identify the best possible matches. After all queries were run, redundant sequences were removed. After screening, 87 members of the bZIP gene family from G. arboreum and 85 from G. raimondii were identified. The presence of the signature bZIP domain in these sequences was confirmed using the online tools SMART (http://smart.embl-heidelberg.de) and Pfam (http://pfam.xfam.org/), which showed the conserved bZIP domain in all members (Supplementary Table S1–4 and 6).

The gene location on the chromosomes from top to bottom (1–13 in G. raimondii and 1–13 in G. arboreum) was used to propose the nomenclature. Sequences with unconfirmed chromosomal regions were represented as scaffolds. The 87 protein sequences of G. arboreum varied from 136 (GabZIP1) to 689 (GabZIP80) amino acid residues, and the 85 full-length predicted G. raimondii protein sequences varied from 137 (GrbZIP3) to 688 (GrbZIP79) amino acid residues (Supplementary Table S1 and S2). The isoelectric points (pI) of the bZIP proteins ranged from 4.88 to 10.38 in G. arboreum and from 4.58 to 10.67 in G. raimondii. The analysis also revealed that the molecular weight of the polypeptides ranged from 15.57 to 59.95 kDa in G. arboreum and from 16.16 to 75.08 kDa in G. raimondii.
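For readers who want to reproduce molecular weights of this kind offline, a minimal sketch is given below. It sums commonly tabulated average residue masses and adds one water molecule for the free termini. It is an approximation written for illustration, not the ExPASy implementation, and the function name is hypothetical; pI estimation is omitted because it depends on the pKa set chosen.

```python
# Average masses (Da) of amino acid residues, i.e. the amino acid minus one water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def molecular_weight_kda(sequence: str) -> float:
    """Approximate average molecular weight of a protein sequence in kDa."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    total_da = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return total_da / 1000.0
```

Ambiguity codes such as X or B are rejected rather than guessed, so sequences containing them would need to be handled separately.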

Phylogenetic analysis

The evolutionary relationship between bZIPs from A. thaliana, G. arboreum, and G. raimondii was characterized by aligning the bZIPs from all three species in MEGA 7.0.21 and generating a rooted tree by the maximum likelihood (ML) method (Fig. 1). The bZIPs from G. raimondii (GrbZIPs) and G. arboreum (GabZIPs) were assigned to 11 subfamilies (A, B, C, D, E, F, G, H, I, S, and X) with their orthologues from A. thaliana (Fig. 1). The subfamilies are based on homology of the basic region and additional conserved motifs, as proposed by Jakoby et al. (2002) and Dröge-Laser et al. (2018). Group X (Xa, Xb, Xc) represents the uncharacterized members. The subfamily-specific motifs are summarized in Table S6. The largest group S contains 57 members i.e., 22 GabZIPs, 20 AtbZIPs and 19 GrbZIPs. The smallest group H comprises only two members: AtbZIP56 and GrbZIP49. Members from all three species were present in all groups. The subsequent analyses of gene structures and conserved motifs reinforced the classification of groups by the group-specific sequence characteristics. Moreover, a comparison of the number of bZIP genes and the number of members in each group among the three species was also conducted (Table 1). According to the phylogenetic results, four GrbZIP (GrbZIP3, 4, 5, 6) and four GabZIP (GabZIP9, 14, 20, 21) proteins containing bZIP and bZIP_C domains were clustered in group C. Meanwhile, thirteen GrbZIP and twelve GabZIP proteins sharing bZIP and DOG domains fell into group D, and five GrbZIP and six GabZIP proteins containing bZIP and MFMR domains were clustered together in group G (Fig. 1). Basic leucine zipper (bZIP) TFs are characterized by a basic DNA-binding region and an adjacent leucine zipper, which enables bZIP dimerization (Dröge-Laser et al. 2018). These two conserved regions of bZIP proteins (Fig. 2) were present in all sequences except GrbZIP22, GrbZIP54, GrbZIP82, GabZIP22, GabZIP34, GabZIP72, and GabZIP77.

The SMART analysis showed that most of the bZIP proteins in both species contained only one bZIP domain, but 17 GrbZIP and 18 GabZIP proteins possessed additional domains, such as a multifunctional mosaic region (MFMR) and DELAY OF GERMINATION (DOG) (Table S4).

Fig. 1

Phylogenetic tree of bZIPs in Gossypium arboreum, Gossypium raimondii and Arabidopsis thaliana. The labels A, B, C, D, E, F, G, H, I, Xa, Xb, and Xc indicate the phylogenetic clusters of genes. The AtbZIPs, GabZIPs and GrbZIPs are represented by green, red and blue taxa, respectively

Table 1.

Number of bZIP transcription factors in different species and the number of genes in groups

Species name Total bZIPs Group A Group B Group C Group D Group E Group F Group G Group H Group I Group S/J/X References
Gossypium raimondii 85 14 1 6 12 5 4 5 1 10 21 Reported in this study
Gossypium arboreum 87 17 1 6 12 5 2 6 0 10 23 Reported in this study
Arabidopsis thaliana 78 13 3 4 10 6 3 5 2 12 20 Hu et al. (2016)
Cucumis sativus (cucumber) 64 13 4 3 10 14 5 14 1 Baloglu et al. (2014)
Oryza sativa (rice) 89 17 3 6 16 6 3 8 3 11 16 Hu et al. (2016)
Vitis (grapevine) 55 13 3 10 5 4 6 2 1 2 7 Liu et al. (2014a, b)
Glycine max (soybean) 138 38 2 12 74 7 3 40 6 21 28 Wang et al. (2015)
Brachypodium distachyon 96 2 20 19 7 13 3 6 9 17 Liu and Chu (2015)
Manihot esculenta 77 15 2 4 13 11 2 6 1 8 15 Hu et al. (2016)
Populus trichocarpa (poplar) 99 18 5 19 7 8 9 3 2 4 14 Liu et al. (2014a, b)

Fig. 2

Conserved bZIP domains in a G. arboreum and b G. raimondii. The logo represents the basic region (motif 2–19) and the leucine zipper region (motif 20–34). For each logo, the multiple sequence alignment of the respective protein sequences was submitted to the online tool “WebLogo” (Crooks 2004)

Gene Structure analysis of bZIP genes

The positions of introns and exons are generally well conserved in orthologous genes over evolutionary time, while the intron-exon structure is relatively less conserved in paralogous genes. To investigate the structural diversity of bZIP genes, we analyzed the arrangement of introns and exons by comparing genomic and CDS sequences. Overall, there was considerable diversity in the number of exons (1–14) and the length of exons in both cotton species. However, at the subfamily level, members exhibited similar gene structures in terms of intron number, exon length and/or intron phases, except for group X. Interestingly, the members of group S are intron-less in all three plants under study (Fig. 3).

Fig. 3

Exon-intron structures of bZIP genes in a G. arboreum and b G. raimondii. The arrangement of exon-intron structures was examined using the gene structure display server (GSDS). The introns and exons are represented by black lines and yellow boxes, respectively. The letters A, B, C, D, E, F, G, H, I, S, and X represent subfamilies

Conserved domain analysis

Identification of the conserved motifs of proteins could help to elucidate protein functions, and plant bZIP proteins usually possess additional conserved motifs that might be involved in activating the functions of bZIP proteins (Yang et al. 2019). The protein sequences of the GrbZIPs and GabZIPs were analyzed using “MEME” to examine the structural diversity of the bZIPs and to support their functional prediction (Fig. 4). As a result, 10 conserved motifs were identified (Fig. 4). Amongst them, motifs 1, 4, and 7 were annotated as the bZIP domain. Motif 1 was present in all the sequences, motif 4 was present in groups I and E, while motif 7 was present in groups C, G, F, and S. The bZIPs in group D, which contained the bZIP and DOG domains, possessed five conserved motifs (motifs 1, 6, 2, 5, and 3). The members of group A have motifs 9, 8, 10 and 1.

Fig. 4

Conserved motifs of bZIPs in G. arboreum and G. raimondii. The AtbZIPs, GabZIPs and GrbZIPs are represented by green, red and blue taxa, respectively. The letters A, B, C, D, E, F, G, H, I, S, and X represent subfamilies. The numbers 1–10 represent different motifs

Subcellular localization signals were also investigated using the online tool “WoLF PSORT”. All the sequences possessed a nuclear localization signal except GrbZIP22, 23, 79 and GabZIP80. There were 50 GabZIPs and 54 GrbZIPs that possessed a nuclear localization signal exclusively. The other protein sequences also contained localization signals for the endoplasmic reticulum, plasma membrane, chloroplast, vacuole, plastids, cytoplasm and mitochondria (Supplementary file X).

Chromosomal locations and gene duplication

In total, 87 and 85 members of the bZIP family were identified in the genomes of G. arboreum and G. raimondii, respectively. These genes were distributed across all 13 chromosomes of both species. In G. arboreum, three genes could not be conclusively mapped to any chromosome and were therefore named GabZIP85 to GabZIP87. Chromosome 10 had the largest number of bZIP genes (11 genes) in G. arboreum, while chromosomes 1, 2, 7 and 9 had four genes each. Similarly, chromosome 5 had the largest number of bZIPs (11 genes) in G. raimondii, while chromosome 3 had the smallest number of bZIP genes (2 genes) (Fig. 5). To further examine the evolution of bZIP genes in both species of cotton, genome duplication events were investigated for segmental and tandem duplications (Table S5). Gene duplication plays a major role in the expansion of gene families in plant evolution (Cannon et al. 2004; Zhou et al. 2018). In G. arboreum, 12 pairs of tandem duplications (represented by red font in Fig. 5a) were identified on chromosomes 1, 2, 3, 4, 5, 10, 11, and 12 (Fig. 5a). In addition, 20 GabZIP genes located on duplicated segmental regions of chromosomes made up 10 segmental duplication events (Fig. 5a). Similarly, in G. raimondii, 10 pairs of tandem duplications (represented by red font in Fig. 5b) were identified on chromosomes 1, 2, 5, 6, 7, 8, 9, and 12 (Fig. 5b). In addition, 12 GrbZIP genes located on duplicated segmental regions of chromosomes made up five segmental duplication events (Fig. 5b).

Fig. 5

Chromosomal mapping of bZIP genes in a G. arboreum and b G. raimondii. The vertical columns represent chromosomes with the gene names shown on the right. Genes located on the duplicated segmental regions have been highlighted by colors. The segmentally duplicated genes are colored red. Tandemly duplicated genes are represented by multiple colors, where the same color represents a duplication pair

Gene Ontology analysis

GO analysis was performed to predict the functions of bZIP proteins. The bZIPs from both species were potentially involved in several important biological processes in common, such as positive regulation of metabolic processes, regulation of cellular processes, organic substance metabolic process, primary metabolic process, regulation of metabolic processes, biosynthetic process, cellular metabolic process, and nitrogen compound metabolic process. In addition, the same molecular functions were found in all three species, namely protein binding, heterocyclic compound binding, sequence-specific DNA-binding transcription factor activity, and organic cyclic compound binding. This suggests that the functions of bZIP genes are similar in all three species. The Blast2GO cellular component analysis indicated that the bZIP proteins of G. arboreum, G. raimondii and A. thaliana were associated with intrinsic components of membrane, intracellular organelle, membrane-bound organelle, intercellular and intracellular part (Fig. 6).

Fig. 6

Pie charts of biological processes, cellular components and molecular functions of a G. arboreum and b G. raimondii

Discussion

Cotton is a major source of natural fiber. G. arboreum (A2) and G. raimondii (D5) are considered the parental genomes of the present-day allotetraploid cotton genome G. hirsutum (AD) (Cronn et al. 2002; Wendel and Cronn 2003). G. arboreum (Desi cotton), native to Indo-Pak, is resistant to abiotic stresses and pathogen infestation (Iqbal et al. 2015; Shaheen et al. 2013a). Both the G. arboreum and G. raimondii genomes have been sequenced recently and are not yet well scrutinized. The major growth-limiting factor in cotton production is the inadequate availability of water (Ullah et al. 2017). The bZIP transcription factors play a vital role in the ABA signaling pathway, which is crucial in plant responses to abiotic stresses (Yamaguchi-Shinozaki and Shinozaki 2006). The involvement of bZIP transcription factors in abiotic stress responses in plants has been repeatedly investigated (Hu et al. 2016). However, there are very few reports in cotton.

The present study reports a genome-wide analysis of bZIPs in G. arboreum and G. raimondii, using the bZIP genes of A. thaliana as query sequences. In addition to being a model genome, A. thaliana (Brassicales) has also proven to be a close relative of cotton (Malvales) (Rong et al. 2005). In the present study, 85 bZIP members were identified in G. raimondii and 87 in G. arboreum. In prior studies, Jakoby et al. (2002) identified 75 bZIP members in Arabidopsis, Nijhawan et al. (2007) identified 89 bZIPs in rice, Baloglu et al. (2014) identified 64 in cucumber, Liu et al. (2014a, b) identified 55 in grapevine, Liu and Chu (2015) found 96 bZIP members in Brachypodium distachyon, Wang et al. (2015) identified 138 in Glycine max, 72 in Phaseolus vulgaris, 61 in Cajanus cajan, 65 in Medicago truncatula, 33 in Lotus japonicus and 59 in Cicer arietinum, and Hu et al. (2016) found 77 bZIP members in cassava. These data show that the bZIP family in G. arboreum and G. raimondii has expanded compared to Arabidopsis, Phaseolus vulgaris, Lotus japonicus, grapevine, cassava, Medicago truncatula, Cajanus cajan, cucumber and Cicer arietinum, while it has contracted compared to rice, sorghum, maize, Brachypodium distachyon and Glycine max.

The evolutionary analysis clustered the bZIPs into 11 subfamilies: A, B, C, D, E, F, G, H, I, S and X (Fig. 1). Similar groups were observed in previous reports on the bZIP family in Arabidopsis, sorghum, cassava, Lotus japonicus, and grapevine (Jakoby et al. 2002; Hu et al. 2016; Liu and Chu 2015; Liu et al. 2014a, b). Each group included bZIP genes from all three species, suggesting a similar evolutionary trajectory of these genes in Gossypium and Arabidopsis. In Arabidopsis, the basic region contains a characteristic motif (N-X7-R/K-X9) responsible for DNA binding and nuclear localization (Dröge-Laser et al. 2018), while the leucine zipper forms an amphipathic surface that mediates specific recognition and dimerization (Li et al. 2016; Hu et al. 2016). Although this motif is also conserved in both Gossypium species, there is another R/K motif in the basic region, resulting in a novel motif N-X9-R/K-X7 (Fig. 2). Such variations may alter the DNA-binding specificity and subcellular localization of these proteins. This prediction is consistent with the variable subcellular localization signals in bZIP proteins (Table data X). The bZIP proteins in plants generally possess supplementary conserved motifs, which are potentially important for functional activation (Jin et al. 2014). In addition to the bZIP domain, there are two extra domains (DOG and MFMR) in members of the D and G groups (Table S4), and some groups possessed specific sequence motifs corresponding to different protein domains (Fig. 3). These findings imply that different motifs outside the bZIP domain region might play different roles in determining the functions of bZIP proteins (Jin et al. 2014; Wang et al. 2018).
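The two basic-region patterns named above can be written as regular expressions, reading each X as "any residue". The sketch below is an illustrative assumption about how one might scan a protein sequence for them, not the method used in the study; the function and dictionary names are hypothetical, and such permissive patterns will also match by chance in long sequences, so hits would still need to be checked against the aligned bZIP domain.

```python
import re
from typing import Dict, List

# Lookahead groups allow overlapping matches to be reported.
BASIC_REGION_PATTERNS = {
    "N-X7-R/K-X9": re.compile(r"(?=(N.{7}[RK].{9}))"),
    "N-X9-R/K-X7": re.compile(r"(?=(N.{9}[RK].{7}))"),
}


def find_basic_region_motifs(sequence: str) -> Dict[str, List[int]]:
    """Return 1-based start positions of each pattern in a protein sequence."""
    seq = sequence.strip().upper()
    return {
        name: [match.start() + 1 for match in pattern.finditer(seq)]
        for name, pattern in BASIC_REGION_PATTERNS.items()
    }
```

For example, a synthetic sequence made of `N`, seven `A` residues, `R` and nine further `A` residues matches only the first pattern, starting at position 1.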

Gene structure analysis revealed that the number of exons varies from 1 to 14 in both GabZIPs and GrbZIPs. In Arabidopsis, the number of exons varied from 1 to 13 (Nuruzzaman et al. 2010). In both G. arboreum and G. raimondii, most genes were found to be closely related to each other because of the similarity in their intron-exon structures, which reflects their close phylogenetic relationship. The hypothesis of a shared common ancestor is also supported by this close similarity in intron-exon structure (Cronn et al. 2002; Wendel and Cronn 2003). It was also observed in both species that the intron number was high in the G and D subfamilies compared to A, B, C, F, H, S, I and X. In rice, Nuruzzaman et al. (2010) reported a higher frequency of intron loss than of intron gain after segmental duplication. Consequently, it was concluded that the G and D subfamilies might contain the original genes, from which the other clusters were derived. Similar findings in cassava were reported by Hu et al. (2016). Furthermore, all the members of group S were intron-less in both GrbZIPs and GabZIPs except GrbZIP78. Other species, i.e., cassava, grapevine and Brachypodium distachyon, also exhibited the same feature (Liu et al. 2014a, b; Liu and Chu 2015; Hu et al. 2016). The absence of introns could reduce posttranscriptional processing and allow a quick response to environmental stimuli (Zhou et al. 2018). In Arabidopsis, the number of exons is relatively high in groups C and D, and the lowest number of exons is present in group S. Only one exon is present in group S in Arabidopsis, which is in accordance with the findings of the present study (Jakoby et al. 2002).

The chromosomal location map showed a dispersed distribution of the bZIP members across the 13 chromosomes of G. arboreum and G. raimondii (Fig. 5), which was also evident in previous studies in grapevine (Liu et al. 2014a, b), cucumber (Baloglu et al. 2014) and Brachypodium distachyon (Liu and Chu 2015). The distribution of genes indicates similar gene structure diversity of bZIPs in different species. However, some members of the bZIP family from G. arboreum (GabZIP83 to GabZIP86) could not be placed at any position because they lie on unplaced scaffolds. In G. arboreum, the largest number of genes (11) was located on chromosome 10, whereas in G. raimondii the same number of genes (11) was located on chromosome 5. Chromosomes 1, 2, 7 and 9 each had 4 genes in G. arboreum; in G. raimondii, the smallest number of genes (only 2) was present on chromosome 3. This uneven distribution of genes is evidence of genetic variation arising during the evolution of these two species after their divergence from each other (Wendel and Cronn 2003).

In both species of cotton, diversity in gene distribution was observed despite similar gene structure. This diversity can be related to the findings of Li et al. (2015) on the evolution of the cotton genome. It was concluded that G. raimondii evolved from the common eudicot ancestor before G. arboreum, and enhanced activity of transposable elements was observed in the D genome compared to the A genome (Li et al. 2015).

GO analysis was used to compare gene functions across the three species. Comparing these data with the phylogenetic tree suggests that some bZIP genes have maintained their functions after divergence.

Conclusion

This study revealed a high similarity in the exon-intron structure of the bZIP gene family in G. arboreum and G. raimondii; however, a diverse distribution of genes across chromosomes was observed. This study supports the close phylogenetic relationship between G. arboreum and G. raimondii as well as their relationship with A. thaliana. The identified genes can be a rich informational resource for manipulating the cotton genome to develop improved cotton varieties.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Farrukh Azeem and Hira Tahir have contributed equally to this work.

References

  1. Baloglu MC, Eldem V, Hajyzadeh M, Unver T. Genome-wide analysis of the bZIP transcription factors in cucumber. PLoS ONE. 2014;9:e96014. doi: 10.1371/journal.pone.0096014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Cannon SB, Mitra A, Baumgarten A, et al. The roles of segmental and tandem gene duplication in the evolution of large gene families in Arabidopsis thaliana. BMC Plant Biol. 2004;4:10. doi: 10.1186/1471-2229-4-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Choi H-I, Hong J-H, Ha J-O, et al. ABFs, a family of ABA-responsive element binding factors. J Biol Chem. 2000;275:1723–1730. doi: 10.1074/jbc.275.3.1723. [DOI] [PubMed] [Google Scholar]
  4. Cronn RC, Small RL, Haselkorn T, Wendel JF. Rapid diversification of the cotton genus (Gossypium: Malvaceae) revealed by analysis of sixteen nuclear and chloroplast genes. Am J Bot. 2002;89:707–725. doi: 10.3732/ajb.89.4.707. [DOI] [PubMed] [Google Scholar]
  5. Crooks GE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Dröge-Laser W, Snoek BL, Snel B, Weiste C. The Arabidopsis bZIP transcription factor family—an update. Curr Opin Plant Biol. 2018;45:36–49. doi: 10.1016/j.pbi.2018.05.001. [DOI] [PubMed] [Google Scholar]
  7. Hinze LL, Hulse-Kemp AM, Wilson IW, et al. Diversity analysis of cotton (Gossypium hirsutum L.) germplasm using the CottonSNP63K Array. BMC Plant Biol. 2017;17:37. doi: 10.1186/s12870-017-0981-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Hsieh TH, Li CW, Su RC, et al. A tomato bZIP transcription factor, SlAREB, is involved in water deficit and salt stress response. Planta. 2010;231:1459–1473. doi: 10.1007/s00425-010-1147-4. [DOI] [PubMed] [Google Scholar]
  9. Hu W, Yang H, Yan Y, et al. Genome-wide characterization and analysis of bZIP transcription factor gene family related to abiotic stress in cassava. Sci Rep. 2016;6:22783. doi: 10.1038/srep22783. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Iqbal MA, Abbas A, Zafar Y, Mehboob-ur-Rahman Characterization of indigenous gossypium Arboreum L. Genotypes for various fiber quality traits. Pakistan J Bot. 2015;47:2347–2354. [Google Scholar]
  11. Jakoby M, Weisshaar B, Dröge-Laser W, et al. bZIP transcription factors in Arabidopsis. Trends Plant Sci. 2002;7:106–111. doi: 10.1016/S1360-1385(01)02223-3. [DOI] [PubMed] [Google Scholar]
  12. Jin Z, Xu W, Liu A. Genomic surveys and expression analysis of bZIP gene family in castor bean (Ricinus communis L.) Planta. 2014;239:299–312. doi: 10.1007/s00425-013-1979-9. [DOI] [PubMed] [Google Scholar]
  13. Letunic I, Doerks T, Bork P. SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res. 2011;40(D1):D302–D305. doi: 10.1093/nar/gkr931. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Li X, Chu Z, Wang S. Genome-wide evolutionary characterization and analysis of bZIP transcription factors and their expression profiles in response to multiple abiotic stresses in Wheat. BMC Genom. 2014;16(1):227–238. doi: 10.1186/s12864-015-1457-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Li F, Fan G, Lu C, et al. Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nat Biotechnol. 2015;33:524–530. doi: 10.1038/nbt.3208. [DOI] [PubMed] [Google Scholar]
  16. Li Y-Y, Meng D, Li M, Cheng L. Genome-wide identification and expression analysis of the bZIP gene family in apple (Malus domestica). Tree Genet Genomes. 2016;12:82. doi: 10.1007/s11295-016-1043-6. [DOI] [Google Scholar]
  17. Liu X, Chu Z. Genome-wide evolutionary characterization and analysis of bZIP transcription factors and their expression profiles in response to multiple abiotic stresses in Brachypodium distachyon. BMC Genom. 2015;16:227. doi: 10.1186/s12864-015-1457-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Liu H, Stone SL. E3 ubiquitin ligases and abscisic acid signaling. Plant Signal Behav. 2011;6:344–348. doi: 10.4161/psb.6.3.13914. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Liu Z, Kong L, Zhang M, et al. Genome-wide identification, phylogeny, evolution and expression patterns of AP2/ERF genes and cytokinin response factors in Brassica rapa ssp. pekinensis. PLoS One. 2013. doi: 10.1371/journal.pone.0083444. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Liu G-T, Ma L, Duan W, et al. Differential proteomic analysis of grapevine leaves by iTRAQ reveals responses to heat stress and subsequent recovery. BMC Plant Biol. 2014;14:110. doi: 10.1186/1471-2229-14-110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Liu J, Chen N, Chen F, et al. Genome-wide analysis and expression profile of the bZIP transcription factor gene family in grapevine (Vitis vinifera). BMC Genom. 2014;15:281. doi: 10.1186/1471-2164-15-281. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Lopez-Molina L, Mongrand S, McLachlin DT, et al. ABI5 acts downstream of ABI3 to execute an ABA-dependent growth arrest during germination. Plant J. 2002;32:317–328. doi: 10.1046/j.1365-313X.2002.01430.x. [DOI] [PubMed] [Google Scholar]
  23. Mehetre SS, Aher AR, Gawande VL, et al. Induced polyploidy in Gossypium: a tool to overcome interspecific incompatibility of cultivated tetraploid and diploid cottons. Curr Sci. 2003;84:1510–1512. [Google Scholar]
  24. Nijhawan A, Jain M, Tyagi AK, Khurana JP. Genomic survey and gene expression analysis of the basic leucine zipper transcription factor family in rice. Plant Physiol. 2007;146:333–350. doi: 10.1104/pp.107.112821. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Nuruzzaman M, Manimekalai R, Sharoni AM, et al. Genome-wide analysis of NAC transcription factor family in rice. Gene. 2010;465:30–44. doi: 10.1016/j.gene.2010.06.008. [DOI] [PubMed] [Google Scholar]
  26. Rong J, Bowers JE, Schulze SR, et al. Comparative genomics of Gossypium and Arabidopsis: unraveling the consequences of both ancient and recent polyploidy. Genome Res. 2005;15:1198–1210. doi: 10.1101/gr.3907305. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Seki M, Kamei A, Yamaguchi-Shinozaki K, Shinozaki K. Molecular responses to drought, salinity and frost: common and different paths for plant protection. Curr Opin Biotechnol. 2003;14:194–199. doi: 10.1016/S0958-1669(03)00030-2. [DOI] [PubMed] [Google Scholar]
  28. Shaheen T, Zafar Y, Rahman M. QTL mapping of some productivity and fibre traits in Gossypium arboreum. Turk J Botany. 2013;37:802–810. doi: 10.3906/bot-1209-47. [DOI] [Google Scholar]
  29. Shaheen T, Zafar Y, Stewart JM, Rahman M. Development of short gSSRs in G. arboreum and their utilization in phylogenetic studies. Turkish J Agric For. 2013;37:288–299. doi: 10.3906/tar-1202-35. [DOI] [Google Scholar]
  30. Shimizu H. LIP19, a basic region leucine zipper protein, is a Fos-like molecular switch in the cold signaling of rice plants. Plant Cell Physiol. 2005;46:1623–1634. doi: 10.1093/pcp/pci178. [DOI] [PubMed] [Google Scholar]
  31. Sornaraj P, Luang S, Lopato S, Hrmova M. Basic leucine zipper (bZIP) transcription factors involved in abiotic stresses: a molecular model of a wheat bZIP factor and implications of its structure in function. Biochim Biophys Acta Gen Subj. 2016;1860:46–56. doi: 10.1016/j.bbagen.2015.10.014. [DOI] [PubMed] [Google Scholar]
  32. Tran LP, Mochida K. Identification and prediction of abiotic stress responsive transcription factors involved in abiotic stress signaling in soybean. 2010; pp 255–257. doi: 10.1093/dnares/dsp023.biotic. [DOI] [PMC free article] [PubMed]
  33. Tran LP, Nakashima K, Shinozaki K, et al. Plant gene networks in osmotic stress response: from genes to regulatory networks. Methods Enzymol. 2007;428:109–128. doi: 10.1016/S0076-6879(07)28006-1. [DOI] [PubMed] [Google Scholar]
  34. Ullah A, Sun H, Yang X, Zhang X. Drought coping strategies in cotton: increased crop per drop. Plant Biotechnol J. 2017;15:271–284. doi: 10.1111/pbi.12688. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Voorrips RE, Bink MCAM, van de Weg WE. Pedimap: software for the visualization of genetic and phenotypic data in pedigrees. J Hered. 2012;103(6):903–907. doi: 10.1093/jhered/ess060. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Wang Z, Cheng K, Wan L, et al. Genome-wide analysis of the basic leucine zipper (bZIP) transcription factor gene family in six legume genomes. BMC Genom. 2015;16:1053. doi: 10.1186/s12864-015-2258-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Wang Y, Xu D, Jia L, Huang X, Ma G, Xu X. Genome-wide identification and structural analysis of bZIP transcription factor genes in Brassica napus. Genes. 2017;8(10):288–301. doi: 10.3390/genes8100288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Wang Y, Zhang Y, Zhou R, et al. Identification and characterization of the bZIP transcription factor family and its expression in response to abiotic stresses in sesame. PLoS ONE. 2018;13:e0200850. doi: 10.1371/journal.pone.0200850. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Wei A, Tan L, Zhang H. The role of bZIP transcription factors in green plant evolution: adaptive features emerging from four founder genes. PLoS One. 2016;3(8):126–139. doi: 10.1371/journal.pone.0002944. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Wei K, Chen J, Wang Y, et al. Genome-wide analysis of bZIP-encoding genes in maize. DNA Res. 2012;19:463–476. doi: 10.1093/dnares/dss026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Wendel JF, Cronn RC. Polyploidy and the evolutionary history of cotton. Adv Agron. 2003.
  42. Yamaguchi-Shinozaki K, Shinozaki K. Transcriptional regulatory networks in cellular responses and tolerance to dehydration and cold stresses. Annu Rev Plant Biol. 2006;57:781–803. doi: 10.1146/annurev.arplant.57.032905.105444. [DOI] [PubMed] [Google Scholar]
  43. Yang Y, Li J, Li H, et al. The bZIP gene family in watermelon: genome-wide identification and expression analysis under cold stress and root-knot nematode infection. PeerJ. 2019;7:e7878. doi: 10.7717/peerj.7878. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Yoshida T, Fujita Y, Sayama H, et al. AREB1, AREB2, and ABF3 are master transcription factors that cooperatively regulate ABRE-dependent ABA signaling involved in drought stress tolerance and require ABA for full activation. Plant J. 2010;61:672–685. doi: 10.1111/j.1365-313X.2009.04092.x. [DOI] [PubMed] [Google Scholar]
  45. Zander M, Chen S, Imkampe J, et al. Repression of the Arabidopsis thaliana jasmonic acid/ethylene-induced defense pathway by TGA-interacting glutaredoxins depends on their C-terminal ALWL Motif. Mol Plant. 2012;5:831–840. doi: 10.1093/mp/ssr113. [DOI] [PubMed] [Google Scholar]
  46. Zhang JZ, Creelman RA, Zhu J-L. From laboratory to field. Using information from Arabidopsis to engineer salt, cold, and drought tolerance in crops. Plant Physiol. 2004;135:615–621. doi: 10.1104/pp.104.040295. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Zhou Y, Zeng L, Chen R, et al. Genome-wide identification and characterization of stress-associated protein (SAP) gene family encoding A20/AN1 zinc-finger proteins in Medicago truncatula. Arch Biol Sci. 2018;70:87–98. doi: 10.2298/ABS170529028Z. [DOI] [Google Scholar]

Articles from Physiology and Molecular Biology of Plants are provided here courtesy of Springer