Abstract
WRKY transcription factors, a plant-specific family of transcriptional regulators, are classified into four groups (I–IV) and play pivotal roles in plant defense, development, and stress responses. These proteins are characterized by conserved WRKY domains that preferentially bind to the W-box cis-element C/TTGACC/T in target gene promoters. In Gossypium hirsutum (Gh; upland cotton), the group IId member GhWRKY17 regulates cotton fiber development by activating downstream target genes such as GhHOX3 through promoter W-box binding. However, the structural basis for its DNA recognition specificity remains elusive. Here, we present the 1.8 Å resolution crystal structure of the GhWRKY17 WRKY domain in complex with GhHOX3 promoter DNA, the first structural characterization of a group IId WRKY protein. The WRKY domain consists of four antiparallel β-strands, with the β2-strand (harboring the conserved ²⁴⁹WRKYGQK²⁵⁵ motif) and the β3-strand co-operatively engaging the DNA major groove. Key residues (R250, K251, Y252, Q254, K255, R264, Y266, and Y267) form an intricate hydrogen-bonding network essential for recognizing the extended G/TTTGACC motif. Comparative analyses with group I, IIa, and III WRKY–DNA complex structures reveal that recognition of the core motif on both DNA strands and predominantly hydrogen bond-mediated specific interactions are novel mechanistic features distinguishing group IId members from other WRKY subgroups, emphasizing the need for subgroup-specific investigations. These findings establish a structural paradigm for group IId WRKY function and provide molecular insights for engineering cotton fiber traits through transcriptional regulation.
Keywords: GhWRKY17 WRKY domain, HOX3 W-box dsDNA, complex structure
Introduction
Transcription factors are crucial regulators of plant growth and development [1–3]. The expression of many genes in plants, as well as their responses to specific stimuli, depends on the interaction between transcription factors and corresponding cis-acting elements [4–6]. This interaction can activate or inhibit gene transcription, thereby co-ordinating cross-talk between signaling pathways [2,5,7,8]. Among these regulators, WRKY proteins represent one of the largest plant-specific families, orchestrating diverse processes from stress responses to organ morphogenesis [7,8]. The WRKY transcription factors have been extensively studied in various plants, including Arabidopsis thaliana (At; mouse-ear cress) [9], Oryza sativa (Os; rice) [10], Gossypium hirsutum (Gh; upland cotton) [11], and Glycine max (Gm; soybean) [12], where they function distinctly in different plant tissues and developmental stages [7,8].
WRKY transcription factors are defined by the presence of one or more WRKY domains, DNA-binding domains of ∼60 amino acid residues that preferentially bind to the DNA sequence C/TTGACC/T, termed the W-box, which contains a universally conserved TGAC core [8,13,14]. These domains are characterized by a conserved WRKYGQK motif at the N-terminus and a zinc-finger-like motif at the C-terminus of one of two types: a C2-H2 motif with two cysteines and two histidines (C-X₄₋₅-C-X₂₂₋₂₃-H-X-H) or a C2-HC motif (C-X₇-C-X₂₃-H-X-C) [13,15]. Based on the number of WRKY domains and the type of zinc-finger-like motif, WRKY transcription factors are divided into four groups (I–IV) [14,15]. Group I: dual WRKY domains (N- and C-terminal) with C2-H2 zinc-finger-like motifs. While the C-terminal WRKY domain (C-WRKY) mediates primary DNA binding, the N-terminal WRKY domain (N-WRKY) enhances affinity through co-operative interactions [16,17]. However, recent structural studies revealed an unexpected DNA-binding capacity in the N-WRKY domain of AtWRKY1 [18]. Group II: single WRKY domain with a C2-H2 zinc-finger-like motif [8,14,15], subdivided into six subgroups (IIa–IIf) by sequence phylogeny [8,19]. Group III: single WRKY domain with a distinct C2-HC zinc-finger-like motif, categorized into IIIa/IIIb [8,15,19,20]. Group IV: truncated WRKYGQK motif lacking zinc co-ordination [20]. Despite structural elucidation of group I (AtWRKY4; Protein Data Bank (PDB) 2LEX) [21] and group III (OsWRKY45; PDB 6IR8) [22] complexes, group II WRKY–DNA structures remain scarce, and the agriculturally vital IId subgroup remains structurally uncharacterized.
In G. hirsutum, GhWRKY17 (also designated GhWRKY16) is a group IId transcriptional activator critical for fiber initiation and elongation [23,24]. It drives developmental programs by binding W-box elements in the promoters of master regulators such as GhHOX3, GhMYB109, and GhMYB25 [24–27]. Notably, GhHOX3 promotes the elongation of upland cotton fibers [26], whereas the MYB transcription factors GhMYB109 and GhMYB25 are crucial for fiber growth and fiber cell differentiation [24,25,27]. Thus, by binding to W-box cis-acting elements in the promoters of these target genes, GhWRKY17 activates their transcription and thereby facilitates both fiber initiation and elongation [23]. However, the structural determinants governing its DNA recognition specificity remain unresolved. By combining X-ray crystallography, biophysical profiling, and functional mutagenesis, we decipher the molecular logic underlying GhWRKY17's DNA recognition, revealing both conserved principles and subgroup-specific features of WRKY–DNA recognition.
Results and discussion
GhWRKY17 WRKY domain binds to W-Box sequences in target promoters
Previous studies established that GhWRKY17 binds to W-box sequences in the promoters of target genes such as GhHOX3 and GhMYB109 to regulate cotton fiber development [23] (Figure 1A, Supplementary Table S1). To confirm that this interaction is directly mediated by the GhWRKY17 WRKY domain, we performed isothermal titration calorimetry (ITC) with the purified recombinant WRKY domain (residues 239–304) and three synthetic 12 bp W-box duplexes (HOX3-1, MYB109-1, and MYB109-3; Figure 1A). ITC revealed micromolar binding affinities for all tested sequences, with dissociation constants (Kd) of 4.4–12 μM (Figure 1B). These comparable affinities show that the isolated WRKY domain is sufficient for W-box recognition, independent of the full-length protein context.
Figure 1. DNA-binding ability of GhWRKY17 WRKY domain.
(A) Domain architecture of GhWRKY17 and schematic of W-box elements in GhHOX3 and GhMYB109 promoters. (B) Isothermal titration calorimetry (ITC) curves for GhWRKY17 WRKY domain (residues 239–304) binding to 12 bp W-box duplexes (HOX3-1, MYB109-1, and MYB109-3). Dissociation constants (Kd, μM) were calculated from heat differentials after buffer control subtraction. Data represent two independent replicates (MicroCal iTC-200).
Comparison with other WRKY domains suggests evolutionary tuning of DNA-binding energetics: GhWRKY17's affinity parallels that of group IIa AtWRKY18 (5–6 μM) [28] and group III OsWRKY45 (4.6 μM) [22] but is markedly weaker (roughly 40- to 120-fold) than that of group I AtWRKY1 (0.1 μM) [18]. This subgroup-dependent affinity hierarchy suggests functional specialization: group I WRKYs may require stronger DNA binding for rapid responses, whereas group II members such as GhWRKY17 may employ moderate affinity suited to developmental regulation.
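To put these dissociation constants on a common energy scale, the standard relation ΔG° = RT ln Kd (1 M standard state) can be applied. The sketch below assumes 25 °C, which is the ITC temperature stated for GhWRKY17 in the Methods; the literature values for other WRKYs may have been measured under different buffers or temperatures, so the comparison is approximate. The names `dg_binding` and `kds` are illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)
T_K = 298.15           # 25 °C, the ITC temperature used for GhWRKY17 (Methods)


def dg_binding(kd_molar, temp_k=T_K):
    """Standard binding free energy in kJ/mol from Kd (1 M standard state)."""
    return R_KJ * temp_k * math.log(kd_molar)


# Kd values quoted in the text, in mol/L (assumed comparable conditions)
kds = {
    "GhWRKY17 (lowest Kd)": 4.4e-6,
    "GhWRKY17 (highest Kd)": 12e-6,
    "OsWRKY45": 4.6e-6,
    "AtWRKY1": 0.1e-6,
}

ref = kds["AtWRKY1"]
for name, kd in kds.items():
    ratio = kd / ref
    ddg = dg_binding(kd) - dg_binding(ref)  # equals RT * ln(ratio)
    print(f"{name:22s} dG = {dg_binding(kd):6.1f} kJ/mol  "
          f"Kd/Kd(AtWRKY1) = {ratio:5.0f}  ddG = {ddg:5.1f} kJ/mol")
```

On this scale, the 44- to 120-fold Kd difference from AtWRKY1 corresponds to roughly 9.4–11.9 kJ/mol less favorable binding free energy.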
Crystal structure of GhWRKY17 WRKY–dsDNA complex
To elucidate the molecular mechanism of DNA recognition by GhWRKY17, we attempted to crystallize its WRKY domain in complex with three W-box double-stranded DNA (dsDNA) sequences (HOX3-1, MYB109-1, and MYB109-2). We determined the crystal structure of the GhWRKY17 WRKY domain bound to a 12 bp HOX3-1 W-box dsDNA at 1.8 Å resolution, the first reported complex structure for a group IId WRKY transcription factor and the first WRKY transcription factor structure from G. hirsutum (Figures 2–3, Table 1).
Figure 2. Crystal structure of GhWRKY17 WRKY in complex with HOX3-1 W-box dsDNA.
(A) Crystal structure of the GhWRKY17 WRKY dimer (gray/yellow) bound to W-box dsDNA (sense strand: salmon; antisense strand: cyan). Zinc ions (brown spheres) are co-ordinated by conserved residues. Insets: zinc-binding site (left) and dimer interface (right). (B) Size-exclusion chromatography (SEC) profile (Superdex™ 75 pg) showing the monomeric state of the GhWRKY17 WRKY domain in solution (blue) relative to protein standards (red, Bio-Rad). (C) Structural superposition of DNA-bound (gray) and DNA-free (yellow) WRKY domains.
Table 1. Data collection and refinement statistics.
| | GhWRKY17 WRKY–GhHOX3 dsDNA |
|---|---|
| PDB code | 9M0K |
| Data collection | |
| Space group | C2221 |
| Cell dimensions | |
| a, b, c (Å) | 54.1, 93.3, 83.7 |
| α, β, γ (°) | 90, 90, 90 |
| Resolution (Å) | 46.81–1.80 (1.84–1.80) |
| Measured reflections | 37,707 (2085) |
| Unique reflections | 19,996 (1105) |
| Rmerge | 0.039 (0.219) |
| I/σI | 20.4 (4.4) |
| CC1/2 | 0.997 (0.895) |
| Completeness (%) | 99.7 (96.2) |
| Redundancy | 1.9 (1.9) |
| Refinement | |
| Resolution (Å) | 46.81–1.80 (1.86–1.80) |
| Rwork/Rfree (%) | 24.6/27.9 |
| No. of atoms / average B-factor (Å²) | 1650 / 34.1 |
| Protein | 982 / 32.5 |
| Ions | 2 / 34.6 |
| Ligand | 486 / 35.5 |
| Water | 180 / 39.3 |
| Root mean square deviation | |
| Bond lengths (Å) | 0.01 |
| Bond angles (°) | 1.35 |
| Ramachandran plot (% residues) | |
| Favored | 100.00 |
Values in parentheses are for the highest resolution shell.
The asymmetric unit contains two WRKY monomers and one DNA duplex, with only one monomer directly engaging the DNA (Figure 2A). Each monomer adopts a canonical WRKY fold consisting of four antiparallel β-strands: β2 (W249–K255), β3 (R264–C269), β4 (R278–A284), and β5 (M290–E296). For consistency with previously published structures, the strand bearing the conserved WRKY signature motif is designated β2. A C2-H2 zinc-finger-like motif stabilizes the fold, with the zinc ion co-ordinated by C269 and C275 (β3–β4 loop) and H299/H301 (C-terminal loop); the zinc–ligand distances are 2.3 Å (Zn–C269), 2.3 Å (Zn–C275), 1.8 Å (Zn–H299), and 2.1 Å (Zn–H301), close to expected values (Figure 2A-1) [29]. Intriguingly, the two WRKY monomers form a homodimer through antiparallel β5–β5′ interactions, stabilized by hydrogen bonds and hydrophobic contacts among residues ²⁹¹LIVTY²⁹⁵ (Figure 2A-2). However, size-exclusion chromatography (SEC) revealed a monomeric state in solution (Figure 2B), suggesting that dimerization is either crystallization-induced or concentration-dependent, consistent with observations for AtWRKY18 [28] and in other studies [30,31]. Notably, superposition of the DNA-bound and DNA-free WRKY domains revealed minimal conformational change (root mean square deviation ∼0.9 Å; Figure 2C), indicating a rigid-body binding mode without significant structural rearrangement upon DNA interaction.
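The ∼0.9 Å value comes from superposing the two models and computing the root mean square deviation (RMSD) over matched atoms. A minimal sketch of that calculation with the Kabsch algorithm is shown below. It assumes two equally ordered (N, 3) coordinate arrays, for example matched Cα atoms; the atom selection and software actually used by the authors are not specified here, and the function name `kabsch_rmsd` is illustrative. NumPy is assumed to be available.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD (input units) between matched (N, 3) coordinate sets after
    optimal rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be (N, 3) arrays of matched atoms")
    # Remove translation by centring both sets on their centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Cross-covariance matrix and its singular value decomposition
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate the centred P onto Q and average the squared deviations
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Because translation and rotation are removed first, the RMSD reflects only internal conformational differences, which is what the rigid-body interpretation relies on.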
Atomic-level interaction network between GhWRKY17 WRKY domain and W-box dsDNA
The crystal structure reveals a canonical WRKY–dsDNA binding geometry, with the β-sheet (β2–β5) inserted orthogonally into the DNA major groove (Figure 2A). Electrostatic surface analysis identified a positively charged cleft on the GhWRKY17 WRKY domain that facilitates DNA engagement (Figure 3A). For systematic interaction mapping, we numbered the sense strand 1–12 (5′→3′) and the antisense strand 12′–1′ (5′→3′), so that base i pairs with base i′ and the core TGAC motif spans positions 5–8 (sense) and 8′–5′ (antisense) (Figure 3B–C). Key interactions include core recognition driven by the conserved WRKYGQK motif and an extended interaction network that stabilizes the complex.
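The numbering convention can be made concrete with a short sketch. It uses only positions 3–9 (A3 to T9), which the text spells out; the remaining bases of the 12-mer are not given in this section and are omitted rather than guessed. The function name `label_duplex` is illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def label_duplex(sense_segment, first_pos):
    """Pair each sense base i with its antisense partner i′ (Watson–Crick)."""
    pairs = []
    for offset, base in enumerate(sense_segment):
        pos = first_pos + offset
        pairs.append((f"{base}{pos}", f"{COMPLEMENT[base]}{pos}′"))
    return pairs


# Positions 3–9 of the HOX3-1 sense strand, as given in the text
pairs = label_duplex("ATTGACT", 3)

# The antisense strand read 5′→3′ runs from the highest position number down
antisense_5to3 = " ".join(anti for _, anti in reversed(pairs))
print(antisense_5to3)  # A9′ G8′ T7′ C6′ A5′ A4′ T3′
```

Reading G8′ T7′ C6′ A5′ in this order recovers GTCA, the reverse complement of the sense-strand TGAC core at positions 5–8, and the labels match those used in the interaction analysis (for example A4′, T7′, and A9′).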
Figure 3. Interaction network of GhWRKY17 WRKY and HOX3-1 W-box dsDNA.
(A) Electrostatic surface potential of the GhWRKY17 WRKY–DNA complex (left) and a 90°-rotated view (right), highlighting the DNA-binding cleft. (B) Detailed interactions between the GhWRKY17 WRKY domain and HOX3-1 W-box dsDNA. Key residues of the WRKY domain and nucleotides of the sense and antisense strands are shown as gray, salmon, and cyan sticks, respectively. Key hydrogen bonds, electrostatic interactions, and hydrophobic contacts are shown as red, orange, and blue dashed lines, respectively. Key water molecules are depicted as red spheres. (C) Schematic view of the interactions between the GhWRKY17 WRKY domain and HOX3-1 W-box dsDNA. Bases specifically recognized by the WRKY domain are highlighted in yellow, and the core TGAC motif is boxed in green. Direct hydrogen bonds, water-mediated hydrogen bonds, electrostatic interactions, and hydrophobic contacts are shown as solid red, dashed red, dashed orange, and dashed blue lines, respectively.
The signature ²⁴⁹WRKYGQK²⁵⁵ motif on β2 mediates critical DNA contacts. (1) W249 anchors the complex through water-bridged hydrogen bonds between its backbone carbonyl and the A3 phosphate (Figure 3B-1, 3C, and Supplementary Table S2). (2) R250 adopts two conformational states: in state 1, its guanidinium group forms several direct or water-mediated hydrogen bonds with A3 (purine), T4 (pyrimidine), and A4′ (purine); in state 2, the guanidinium group interacts with the G2 phosphate through direct or water-mediated hydrogen bonds and electrostatic interactions (Figure 3B-1, 3C, and Supplementary Table S2). (3) The K251 side chain engages the T4 phosphate, while its backbone groups contact the A3 phosphate and the T4 pyrimidine (Figure 3B-1, 3C, and Supplementary Table S2). (4) The Y252 hydroxyl binds directly to the T7′ phosphate, and its main-chain carbonyl recognizes the amine groups of A5′ and C6′ (Figure 3B-1, 3C, and Supplementary Table S2). (5) The Q254 side chain co-ordinates the T5 phosphate, while backbone interactions stabilize G6 and T7′ (Figure 3B, 3C, and Supplementary Table S2). (6) K255 bridges G8′, A9′, and A7 through multipoint recognition (Figure 3B, 3C, and Supplementary Table S2). In addition to these major interactions, the side chains of K258, R264, Y266, Y267, K268, and R278 and the main chains of P256 and K258 form hydrogen bonds and electrostatic interactions with the A3 and C6′–T10′ phosphates, enhancing binding (Figure 3B-4, 3B-5, and 3C; Supplementary Table S2). Hydrophobic contacts between ²⁵¹KYGQK²⁵⁵/Y266 and the T4, T5, T7′, and G8′ bases further reinforce sequence selectivity (Figure 3B-6, 3C, and Supplementary Table S2), consistent with mechanisms described for other WRKY domains [18,21,22,28].
In summary, the interaction landscape comprises three co-operative elements: the β2 strand (²⁵⁰RKYGQK²⁵⁵) orchestrates core TGAC recognition, the β3 strand (R264/Y266/Y267/K268) stabilizes flanking regions, and adjacent residues (K258/R278) fine-tune phosphate contacts. This multivalent binding strategy explains GhWRKY17's specificity for the bipartite ATTGACT motif (positions 3–9) across both DNA strands. The combination of conformational plasticity (R250), water-mediated bridging, and modular contributions from several structural elements is consistent with evolutionary optimization for developmental gene regulation.
Mutagenesis reveals critical binding determinants
To identify the key residues involved in DNA binding, we introduced point mutations into the GhWRKY17 WRKY domain and quantified their binding affinities by ITC. Strikingly, alanine substitutions at Y252, Q254, K255, R264, Y266, Y267, and R278 completely disrupt DNA binding (Figure 4A; Supplementary Figure S1), highlighting their indispensable roles in the interaction. In the structure, these residues engage the W-box DNA directly through their side chains, so their mutation removes critical hydrogen bonds, electrostatic interactions, and hydrophobic contacts with the DNA (Figure 3B–C). Alanine substitutions at P256, K251, R250, and W249 nearly abolish DNA binding (Figure 4A; Supplementary Figure S1), underscoring their substantial contribution to binding stability (Figure 3). In contrast, substitutions at K258 and K268 only moderately reduce binding affinity (3- and 10-fold weaker than wildtype, respectively; Figure 4A; Supplementary Figure S1), suggesting auxiliary roles in the interaction.
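The moderate fold reductions can also be expressed as binding free-energy penalties. The worked conversion below assumes the standard relation between free energy and Kd at 25 °C (the ITC temperature in Methods); mutants with no detectable binding have no measurable Kd, so no penalty can be derived for them from these data.

$$
\Delta\Delta G = \Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{wt}} = RT\ln\frac{K_d^{\mathrm{mut}}}{K_d^{\mathrm{wt}}}, \qquad RT \approx 2.48\ \mathrm{kJ\,mol^{-1}}\ (298\ \mathrm{K})
$$

$$
\text{3-fold: } 2.48\ln 3 \approx 2.7\ \mathrm{kJ\,mol^{-1}}, \qquad \text{10-fold: } 2.48\ln 10 \approx 5.7\ \mathrm{kJ\,mol^{-1}}
$$

Both penalties are small relative to the roughly 28–31 kJ/mol binding free energy implied by the 4.4–12 μM Kd range measured for the wildtype domain.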
Figure 4. Mutagenesis reveals critical binding determinants.
(A) Dissociation constants (Kd, μM) of HOX3-1 W-box dsDNA binding to wildtype and mutant GhWRKY17 WRKY domains, determined by ITC. Data shown are representative of two independent experiments. (B) Binding of 16 bp wildtype (core TGAC in red) and mutant HOX3-1 W-box dsDNA (mutated bases in red lowercase) to the GhWRKY17 WRKY domain, determined by electrophoretic mobility shift assays (EMSA). Only the sense strand is shown; position numbers above the wildtype sequence follow the numbering of the 12-mer DNA used in the complex structure.
Notably, our findings align with prior mutagenesis studies of analogous residues in other WRKY domains, with a few exceptions [18,32,33]. For example, while the Q254A mutation in GhWRKY17 (corresponding to Q121A in AtWRKY1 N-WRKY) abrogates DNA binding, the equivalent Q317A mutation in AtWRKY1 C-WRKY retains binding activity [18,33]. These discrepancies highlight both conserved and divergent functional features among WRKY domains, emphasizing the necessity for subgroup-specific investigations.
Sequence-specific recognition of the W-box
To investigate the functional significance of specific nucleotides within the W-box sequence, we performed electrophoretic mobility shift assays (EMSA) using synthetic 16 bp wildtype or mutant DNAs and the wildtype GhWRKY17 WRKY domain (Supplementary Table S1). Consistent with our structural data, mutations in the core TGAC sequence (Mut1–Mut12) completely disrupt DNA binding (Figure 4B; Supplementary Figure S2A), confirming the indispensability of this motif, as previously reported for group IIb (AtWRKY6) and IId (AtWRKY11) WRKY proteins [32]. Intriguingly, this contrasts with group IIc (AtWRKY43), group I (AtWRKY26), and group III (AtWRKY38) proteins, which exhibit divergent nucleotide preferences [32].
To dissect the roles of flanking nucleotides, we analyzed mutations adjacent to the core TGAC sequence (Figure 4B; Supplementary Figure S2B). (1) Position T9 (3′ to TGAC): Mut13/14 (T9→G/A, corresponding to A9′→C/T) abolish binding, likely because they disrupt hydrogen bonds between K255 and the purine ring of A9′ (Figure 3B–C; Supplementary Figure S3A). Mut15 (T9→C, corresponding to A9′→G) enhances binding (Figure 4B; Supplementary Figure S2B), likely owing to the structural similarity between adenine and guanine and to additional hydrogen bonds between the guanine carbonyl group and the K255 side chain (Figure 3B–C; Supplementary Figure S3A). These data indicate that the GhWRKY17 WRKY domain prefers C over T at position 9, immediately 3′ of the core TGAC sequence. This contrasts with AtWRKY6/AtWRKY11 (strong preference for T) and AtWRKY43 (strict requirement for C) [32] but resembles AtWRKY18 (no C/T bias) [28].
(2) Position T4 (5′ to TGAC): Mut16–18 (T4→C/G/A) eliminate binding (Figure 4B; Supplementary Figure S2B), likely by disrupting hydrogen bonds (R250 side chain with the T4 pyrimidine ring) and hydrophobic interactions (K251/Y252 with the T4 methyl group) (Figure 3B–C; Supplementary Figure S3B), identifying T4 as another critical nucleotide for specific recognition. This mirrors AtWRKY6/AtWRKY11 but differs from AtWRKY43/AtWRKY26/AtWRKY38, which favor T→G/A substitutions [32].
(3) Position A3: Mut19/21 (A3→G/T) enhance binding (Figure 4B; Supplementary Figure S2B), suggesting that A3 plays an auxiliary role in stabilizing the interaction and that the GhWRKY17 WRKY domain prefers G or T at this site (Figure 3B–C; Supplementary Figure S3C). This diverges from AtWRKY6/AtWRKY11 (G-specific) but resembles AtWRKY43/AtWRKY26/AtWRKY38 (no significant nucleotide preference) [32].
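The mutant numbering is consistent with a complete single-substitution scan: three alternative bases at each of the four core positions give 12 variants (Mut1–Mut12), and three each at T9, T4, and A3 give nine more (Mut13–Mut21). The sketch below enumerates such a scan. That every mutant carries exactly one substitution, and how these labels map onto Mut numbers, are assumptions not spelled out in the text; `WT` and `single_substitutions` are illustrative names.

```python
# Sense-strand bases at positions 3–9 of HOX3-1, as given in the text (A3 ... T9)
WT = {3: "A", 4: "T", 5: "T", 6: "G", 7: "A", 8: "C", 9: "T"}


def single_substitutions(positions, wt=WT):
    """All single-base substitutions at the given positions, labelled like 'T9C'."""
    return [f"{wt[p]}{p}{b}" for p in positions for b in "ACGT" if b != wt[p]]


core = single_substitutions([5, 6, 7, 8])  # TGAC core: 4 positions x 3 bases = 12
flanks = single_substitutions([9, 4, 3])   # T9, T4 and A3: 3 positions x 3 bases = 9
print(len(core), len(flanks), len(core) + len(flanks))  # 12 9 21
```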
Collectively, our data reveal that GhWRKY17 WRKY preferentially binds to the W-box sequence G/TTTGACC, with distinct nucleotide requirements at flanking positions compared with other WRKY subgroups. These findings underscore the functional divergence among WRKY transcription factors and emphasize the necessity for subgroup-specific structural studies to elucidate their DNA recognition mechanisms.
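A consensus such as G/TTTGACC can be used to flag candidate sites in promoter sequences. The sketch below scans both strands for the pattern `[GT]TTGACC`. It is a plain string match that ignores affinity differences and chromatin context, and the example inputs are arbitrary test strings, not cotton promoter sequence; `scan_both_strands` is an illustrative name.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Preferred GhWRKY17 site suggested by the EMSA data: G/T, then TTGACC
PREFERRED_SITE = "[GT]TTGACC"


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_both_strands(seq, pattern=PREFERRED_SITE):
    """Return sorted (start, strand, site) hits on either strand.

    start is the 0-based index of the leftmost base of the site on the
    forward sequence, whichever strand the site lies on.
    """
    seq = seq.upper()
    n = len(seq)
    finder = re.compile(f"(?=({pattern}))")  # lookahead allows overlapping hits
    hits = [(m.start(), "+", m.group(1)) for m in finder.finditer(seq)]
    for m in finder.finditer(reverse_complement(seq)):
        site = m.group(1)
        hits.append((n - m.start() - len(site), "-", site))
    return sorted(hits)


print(scan_both_strands("AAGTTGACCAA"))  # [(2, '+', 'GTTGACC')]
print(scan_both_strands("AAGGTCAACAA"))  # [(2, '-', 'GTTGACC')]
```

Passing `pattern="[CT]TGAC[CT]"` would scan for the general W-box instead, which makes it easy to compare how many general W-boxes in a promoter also match the narrower preferred site.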
Structural and functional divergence among WRKY subgroups
The GhWRKY17 WRKY–GhHOX3 DNA complex structure represents two advances: it is the first WRKY transcription factor complex resolved from G. hirsutum (upland cotton) and the first structural characterization of a group IId WRKY–DNA interaction. Six WRKY–DNA complexes are currently available in the PDB: four group I structures (AtWRKY4 C-WRKY, AtWRKY1 N-WRKY, AtWRKY2 N-WRKY, and AtWRKY33 N-WRKY), one group IIa structure (AtWRKY18), and one group III structure (OsWRKY45) [18,21,22,28]. Sequence and structural analyses revealed both conserved and divergent features among WRKY domains. Sequence alignment showed that the residues mediating interactions between the GhWRKY17 WRKY domain and GhHOX3 DNA are conserved across other WRKY domains (Figure 5A). Structural superposition further confirmed this conservation despite variations in β-sheet composition (four or five antiparallel β-strands), with the signature WRKYGQK motif consistently positioned on the outermost β2 strand (Figure 5B).
Figure 5. Divergence of WRKY–DNA interactions.
(A) Structure-guided sequence alignment of selected WRKY domains (group IId GhWRKY17, group IIa AtWRKY18, group I AtWRKY1 N-WRKY, group I AtWRKY4 C-WRKY, and group III OsWRKY45). Key DNA-interacting residues of GhWRKY17 WRKY are marked with a purple ★ (essential), blue ✦ (necessary), or green ▲ (auxiliary). Residues involved in specific base recognition are underlined in blue. The β1 strand missing from GhWRKY17 WRKY is indicated by a dashed arrow. (B) Structural superposition of selected WRKY domains: GhWRKY17 (gray), AtWRKY18 (pink), AtWRKY1 N-WRKY (golden), AtWRKY4 C-WRKY (green), and OsWRKY45 (violet and deep purple). (C–E) Dimerization interfaces of GhWRKY17 (β5–β5′, C), AtWRKY18 (β1–β1′, D), and OsWRKY45 (β4–β5 swap + Zn²⁺, E). (F–J) Base-specific recognition of the TGAC motif by different WRKY domains. Residues and nucleotides involved in specific interactions are colored blue and red, respectively, and the core TGAC motif is boxed in green. Gh, Gossypium hirsutum (upland cotton); At, Arabidopsis thaliana (mouse-ear cress); Os, Oryza sativa (rice).
Notably, the crystal structure of the GhWRKY17 WRKY domain complexed with GhHOX3 DNA reveals a dimeric configuration (Figure 2A). Comparison with existing structures shows that group IIa (AtWRKY18) and group III (OsWRKY45) WRKY transcription factors also exhibit dimeric states, whereas group I members do not (Figure 5C–E) [18,22,28]. However, the dimerization mechanism varies among subgroups: GhWRKY17 dimerizes through hydrogen bonds between the β5 and β5′ strands (Figure 5C), whereas AtWRKY18 uses β1–β1′ interactions (Figure 5D) [28]. In contrast, OsWRKY45 exhibits a unique domain swap stabilized by zinc ions (Figure 5E) [22]. SEC analyses showed that the GhWRKY17 and AtWRKY18 WRKY domains are monomeric in solution, whereas OsWRKY45 remains oligomeric. This difference suggests weaker interdomain interactions in group II WRKYs than in the zinc-stabilized quaternary structure of group III members.
DNA recognition analyses across WRKY subgroups revealed both conserved and divergent features. All studied WRKY domains insert a β-sheet into the DNA major groove, with the β2 strand (bearing the WRKYGQK motif) universally engaging the TGAC core (Figure 5F–J) [18,21,22,28]. Structural comparisons highlight the crucial role of the conserved tyrosine (Y) and the second lysine (K) of the heptapeptide in W-box recognition (Figure 5A). However, GhWRKY17 WRKY exhibits two distinctive features. First, it recognizes the TGAC motif on both the sense and antisense strands, whereas other WRKYs interact primarily with the TGAC complementary sequence on the antisense strand (Figure 5F–J; nucleotides involved in specific interactions are highlighted in red). Second, its DNA binding relies predominantly on hydrogen bonding rather than on the extensive hydrophobic interactions observed in other subgroups (Figure 3, Supplementary Figure S4) [18,21,22,28]. Notably, the specific multipoint recognition of bases G6 and A7 is, to our knowledge, reported here for the first time (Figure 5F–J, Supplementary Figure S4). Mutagenesis studies corroborated these structural observations, showing that conserved residues and nucleotides contribute differently to WRKY–DNA interaction across subgroups (Figure 4) [18,21,22,28,32–34].
These findings highlight an evolutionary paradox. While WRKY domains maintain a conserved DNA-binding scaffold, subgroup-specific variations in oligomerization strategies, secondary structure utilization, and interaction chemistries (hydrogen bonds vs. hydrophobic interactions) suggest functional specialization. Such structural plasticity likely enables fine-tuning of DNA recognition mechanisms to fulfill distinct regulatory roles in plant development and stress responses.
Conclusions
In this study, we have deciphered the molecular basis of DNA recognition by the WRKY domain of GhWRKY17, establishing the first structural framework for group IId WRKY transcription factors. Through integrated structural and mutagenesis analyses, we demonstrate that GhWRKY17 exhibits distinct sequence specificity, preferentially binding to the G/TTTGACC motif through selective recognition of nucleotides flanking the core TGAC element. This finding extends previous functional observations of GhWRKY17's critical role in cotton fiber development via promoter W-box interactions [23]. Comparative analyses with other WRKY–DNA complexes reveal an intriguing evolutionary pattern: although all WRKY domains share a conserved β-sheet DNA-binding scaffold, subgroup-specific variations in sequence recognition strategies and interaction chemistries (e.g. hydrogen bonding vs. hydrophobic contacts) underscore functional diversification. Notably, recognition of the TGAC core on both DNA strands and predominantly hydrogen bond-mediated specificity are novel mechanistic features distinguishing group IId members from other WRKY subgroups. These findings advance our understanding of cotton fiber morphogenesis at the molecular level and provide a structural blueprint for precision breeding strategies. The identified DNA-binding specificity could guide targeted promoter engineering to optimize GhWRKY17-regulated pathways, potentially enhancing fiber yield and quality. Future high-resolution studies of diverse WRKY family members should address the structural basis of subgroup specialization and may uncover general principles of plant transcriptional regulation.
Materials and methods
Protein expression and purification
The coding sequence for the GhWRKY17 WRKY domain (residues 239–304) was cloned into a pET28-MHL vector (Addgene; cat. 26096) to generate an N-terminal 6 × His-TEV-tagged construct using seamless assembly cloning (ABclonal Technology; cat. RK21020). Sequence-verified plasmids (Azenta Life Sciences) were transformed into E. coli BL21(DE3) Codon Plus RIL cells (TransGen; cat. CD601). Protein expression was induced with 0.25 mM IPTG at OD₆₀₀ = 0.8, followed by incubation at 15°C for 24 h. Cells were lysed in buffer (20 mM Tris-HCl pH 7.5, 250 mM NaCl, 5% glycerol, 5 mM β-mercaptoethanol), and the lysate was purified via Ni-nitrilotriacetate (Ni-NTA) affinity chromatography (GE Healthcare; cat. 17526802) using stepwise imidazole gradients (wash: 40 mM; elution: 250 mM). The His-tag was cleaved by TEV protease during dialysis in 20 mM Tris-HCl pH 7.5 and 150 mM NaCl, and tag-free protein was further purified by SEC (Superdex™ 75 pg, GE Healthcare; cat. 28989335) in 20 mM Tris-HCl pH 7.5, 150 mM NaCl, and 1 mM DTT and concentrated using Amicon Ultra-15 Centrifugal Filter Units (Millipore Corporation; cat. UFC901024). Site-directed mutants were generated using the Fast Mutagenesis System (TransGen; cat. FM111-02), with sequences confirmed by Sanger sequencing (Azenta Life Sciences). Mutant proteins were overexpressed and purified as described above for the wildtype protein.
DNA preparation
Synthetic oligonucleotides containing W-box sequences from GhHOX3 (HOX3-1) and GhMYB109 (MYB109-1, MYB109-3) promoters (Synbio Technologies) were annealed in 20 mM Tris-HCl pH 7.5, 150 mM NaCl by heating to 95°C (3 min) followed by gradual cooling to 4°C.
Isothermal titration calorimetry (ITC)
ITC experiments were performed on an iTC-200 microcalorimeter (Malvern Panalytical) at 25°C. The concentrated GhWRKY17 WRKY proteins and W-box dsDNAs were diluted into 20 mM Tris-HCl pH 7.5, 150 mM NaCl (ITC buffer). GhWRKY17 WRKY (50 μM in the cell chamber) was titrated with 750 μM 12 bp dsDNA (in the syringe) in 20 successive injections with a spacing of 150 s. Control experiments were performed under identical conditions to determine the heat signals that arise from injection of the dsDNA into the buffer (buffer dilution heats). Data were corrected for buffer dilution heats and analyzed using a single-site binding model in Origin 7.0 (MicroCal).
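Origin's single-site model treats the titration as 1:1 binding and fits Kd, ΔH, and stoichiometry to the integrated injection heats. The sketch below is a forward simulation of that kind of model under stated assumptions, not the authors' fitting code: it fixes n = 1, applies the displaced-volume correction commonly used for fixed-volume ITC cells, and assumes a 200 μL cell, 2 μL injections, and ΔH = −10 kcal/mol, none of which are reported in this section. Only the 50 μM protein, 750 μM dsDNA, and 20 injections come from the text; `simulate_itc` is an illustrative name.

```python
import math


def complex_conc(m_tot, x_tot, kd):
    """1:1 complex concentration from the exact quadratic mass balance."""
    b = m_tot + x_tot + kd
    return (b - math.sqrt(b * b - 4.0 * m_tot * x_tot)) / 2.0


def simulate_itc(m0_um=50.0, x0_um=750.0, kd_um=4.4, dh_kcal=-10.0,
                 v0_ul=200.0, inj_ul=2.0, n_inj=20):
    """Heat (ucal) of each injection for a single-site (n = 1) model.

    m0_um: protein in the cell; x0_um: dsDNA in the syringe (uM).
    v0_ul, inj_ul and dh_kcal are illustrative assumptions.
    """
    heats = []
    q_prev = 0.0
    for i in range(1, n_inj + 1):
        dv = i * inj_ul                             # cumulative injected volume (uL)
        r = dv / (2.0 * v0_ul)
        m_tot = m0_um * (1.0 - r) / (1.0 + r)       # protein remaining in the cell
        x_tot = x0_um * (dv / v0_ul) * (1.0 - r)    # dsDNA delivered to the cell
        # kcal/mol x uL x uM = 1e-3 ucal
        q = dh_kcal * v0_ul * complex_conc(m_tot, x_tot, kd_um) * 1e-3
        # Heat of injection i, corrected for liquid displaced from the cell
        heats.append(q + (inj_ul / v0_ul) * (q + q_prev) / 2.0 - q_prev)
        q_prev = q
    return heats


heats = simulate_itc()
print([round(h, 2) for h in heats[:3]], "...", round(heats[-1], 3))
```

A fit would wrap `simulate_itc` in a nonlinear least-squares routine that adjusts Kd and ΔH (and, in general, n) until the simulated heats match the buffer-corrected data.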
Crystallization and structure determination
The GhWRKY17 WRKY:HOX3-1 W-box dsDNA complex (1:1.2 molar ratio; protein concentration 6 mg/ml) was crystallized at 18°C by sitting-drop vapor diffusion (0.5 μl protein/DNA mixture + 0.5 μl reservoir solution: 20% PEG 3350, 0.2 M ammonium formate). X-ray diffraction data were collected at Canadian Light Source (CLS) beamline 08ID and Shanghai Synchrotron Radiation Facility (SSRF) beamlines BL10U2 and BL02U1 (λ = 0.978 Å, 100 K). Diffraction images were processed using autoPROC [35] and XDS [36]. The structure was solved by molecular replacement with PHASER [37] using co-ordinates from PDB entry 2AYD (apo structure of the AtWRKY1 C-WRKY domain) [33]. The DNA model was built manually in Coot [38] by fitting ideal B-form DNA fragments into the clearly defined electron density, followed by iterative refinement of base pairing and backbone geometry. The complex structure was further refined with REFMAC [39] and PHENIX [40] and validated with MOLPROBITY [41].
Electrophoretic mobility shift assay (EMSA)
Binding reactions (15 μl: 32 μM WRKY domain, 8 μM 16 bp dsDNA in 10 mM Tris-HCl pH 7.5, 50 mM NaCl, 1 mM EDTA, 10% glycerol, 1 mM DTT) were incubated on ice (30 min), resolved on 6% native polyacrylamide gels in 0.5 × Tris-Borate-EDTA (TBE) buffer at 100 V for 90 min, and visualized with SYBR™ Gold (Thermo Fisher; cat. S11494).
Size-exclusion chromatography (SEC) analysis
SEC was performed on a Superdex™ 75 column (GE Healthcare; cat. 28989335) pre-calibrated with gel filtration standards (Bio-Rad; cat. 1511901) in 20 mM Tris-HCl pH 7.5, 150 mM NaCl, and 1 mM DTT.
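Pre-calibration lets an apparent molecular weight be read from the elution volume, which is how a monomeric versus dimeric state is usually inferred. A minimal sketch, assuming the common semi-log calibration log10(MW) = a + b·Kav with Kav = (Ve − V0)/(Vt − V0); the void volume V0, total volume Vt, and any elution volumes are placeholders to be replaced with measured values, and standards eluting at the void should be left out of the fit. Function names are illustrative.

```python
import math


def kav(ve, v0, vt):
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    return (ve - v0) / (vt - v0)


def fit_calibration(standards, v0, vt):
    """Least-squares fit of log10(MW) = a + b * Kav.

    standards: (MW, Ve) pairs for standards inside the fractionation range.
    Returns (a, b).
    """
    xs = [kav(ve, v0, vt) for _, ve in standards]
    ys = [math.log10(mw) for mw, _ in standards]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def apparent_mw(ve, a, b, v0, vt):
    """Apparent molecular weight (same unit as the standards) at elution volume ve."""
    return 10 ** (a + b * kav(ve, v0, vt))
```

Dividing the apparent molecular weight by the sequence-derived mass of the construct gives the approximate oligomeric state; elongated or flexible proteins can deviate from globular standards, so the estimate is indicative only.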
Supplementary material
Acknowledgments
We acknowledge Wolfram Tempel for data collection and structure determination, and the staff of beamlines BL10U2 and BL02U1 at the Shanghai Synchrotron Radiation Facility for assistance during data collection. We thank the staff members of the Large-scale Protein Preparation System at the National Facility for Protein Science in Shanghai (NFPS), Shanghai Advanced Research Institute, Chinese Academy of Sciences, China, for providing technical support and assistance in data collection and analysis.
Abbreviations
- At: Arabidopsis thaliana (mouse-ear cress)
- C2-H2: zinc-finger motif with two cysteines and two histidines
- C-WRKY: C-terminal WRKY domain
- EMSA: electrophoretic mobility shift assays
- Gh: Gossypium hirsutum (upland cotton)
- ITC: isothermal titration calorimetry
- N-WRKY: N-terminal WRKY domain
- Os: Oryza sativa (rice)
- PDB: Protein Data Bank
- SEC: size-exclusion chromatography
- TEV: tobacco etch virus
- dsDNA: double-stranded DNA
Data Availability
The atomic co-ordinates and structure factors have been deposited in the PDB (accession code: 9M0K [42]). Additional data are available in the Supplementary Information or from the authors upon request.
Competing Interests
The authors declare that they have no conflicts of interest with the contents of this article.
Funding
This work was supported by the National Natural Science Foundation of China (32271309), a project funded by the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions, and a Natural Sciences and Engineering Research Council of Canada (NSERC) grant (RGPIN-2021-02728).
Open Access
This article has been published open access under our Subscribe to Open programme, made possible through the support of our subscribing institutions. Learn more here: https://portlandpress.com/pages/open_access_options_and_prices#conditional
CRediT Author Contribution
Q.X. and Y.W. purified and crystallized the protein; X.S. determined the structure; Q.X. constructed the mutant plasmids with the help of M.Z. and G.X.; Y.C., Q.X., and Y.Z. conducted the ITC assays with the help of X.H.; J.M. and S.Q. reviewed the crystallographic model; Y.L. conceived and designed the study and wrote the paper with substantial contributions from all the other authors. All authors contributed to data analysis and approved the final version of the manuscript.
References
- 1. Liu L., White M.J., MacRae T.H. Transcription factors and their genes in higher plants functional domains, evolution and regulation. Eur. J. Biochem. 1999;262:247–257. doi: 10.1046/j.1432-1327.1999.00349.x.
- 2. Iida K., Seki M., Sakurai T., Satou M., Akiyama K., Toyoda T., et al. RARTF: database and tools for complete sets of Arabidopsis transcription factors. DNA Res. 2005;12:247–256. doi: 10.1093/dnares/dsi011.
- 3. Xiong Y., Liu T., Tian C., Sun S., Li J., Chen M. Transcription factors in rice: a genome-wide comparative analysis between monocots and eudicots. Plant Mol. Biol. 2005;59:191–203. doi: 10.1007/s11103-005-6503-6.
- 4. Aukerman M.J., Schmidt R.J., Burr B., Burr F.A. An arginine to lysine substitution in the bZIP domain of an opaque-2 mutant in maize abolishes specific DNA binding. Genes Dev. 1991;5:310–320. doi: 10.1101/gad.5.2.310.
- 5. Huang H., Tudor M., Su T., Zhang Y., Hu Y., Ma H. DNA binding properties of two Arabidopsis MADS domain proteins: binding consensus and dimer formation. Plant Cell. 1996;8:81–94. doi: 10.1105/tpc.8.1.81.
- 6. Javed T., Shabbir R., Ali A., Afzal I., Zaheer U., Gao S.J. Transcription factors in plant stress responses: challenges and potential for sugarcane improvement. Plants (Basel). 2020;9:491. doi: 10.3390/plants9040491.
- 7. Agarwal P., Reddy M.P., Chikara J. WRKY: its structure, evolutionary relationship, DNA-binding selectivity, role in stress tolerance and development of plants. Mol. Biol. Rep. 2011;38:3883–3896. doi: 10.1007/s11033-010-0504-5.
- 8. Rushton P.J., Somssich I.E., Ringler P., Shen Q.J. WRKY transcription factors. Trends Plant Sci. 2010;15:247–258. doi: 10.1016/j.tplants.2010.02.006.
- 9. Eulgem T., Somssich I.E. Networks of WRKY transcription factors in defense signaling. Curr. Opin. Plant Biol. 2007;10:366–371. doi: 10.1016/j.pbi.2007.04.020.
- 10. Ross C.A., Liu Y., Shen Q.J. The WRKY Gene Family in Rice (Oryza sativa). J. Integr. Plant Biol. 2007;49:827–842. doi: 10.1111/j.1744-7909.2007.00504.x.
- 11. Wang N.N., Xu S.W., Sun Y.L., Liu D., Zhou L., Li Y., et al. The cotton WRKY transcription factor (GhWRKY33) reduces transgenic Arabidopsis resistance to drought stress. Sci. Rep. 2019;9:724. doi: 10.1038/s41598-018-37035-2.
- 12. Yang Y., Chi Y.J., Wang Z., Zhou Y., Fan B.F., Chen Z.X. Functional analysis of structurally related soybean GmWRKY58 and GmWRKY76 in plant growth and development. J. Exp. Bot. 2016;67:4727–4742. doi: 10.1093/jxb/erw252.
- 13. Rushton P.J., Torres J.T., Parniske M., Wernert P., Hahlbrock K., Somssich I.E. Interaction of elicitor-induced DNA-binding proteins with elicitor response elements in the promoters of parsley PR1 genes. EMBO J. 1996;15:5690–5700. doi: 10.1002/j.1460-2075.1996.tb00953.x.
- 14. Javed T., Gao S.J. WRKY transcription factors in plant defense. Trends Genet. 2023;39:787–801. doi: 10.1016/j.tig.2023.07.001.
- 15. Eulgem T., Rushton P.J., Robatzek S., Somssich I.E. The WRKY superfamily of plant transcription factors. Trends Plant Sci. 2000;5:199–206. doi: 10.1016/s1360-1385(00)01600-9.
- 16. De Pater S., Greco V., Pham K., Memelink J., Kijne J. Characterization of a zinc-dependent transcriptional activator from Arabidopsis. Nucleic Acids Res. 1996;24:4624–4631. doi: 10.1093/nar/24.23.4624.
- 17. Eulgem T., Rushton P.J., Schmelzer E., Hahlbrock K., Somssich I.E. Early nuclear events in plant defence signalling: rapid gene activation by WRKY transcription factors. EMBO J. 1999;18:4689–4699. doi: 10.1093/emboj/18.17.4689.
- 18. Xu Y.P., Xu H., Wang B., Su X.D. Crystal structures of N-terminal WRKY transcription factors and DNA complexes. Protein Cell. 2020;11:208–213. doi: 10.1007/s13238-019-00670-0.
- 19. Goyal P., Devi R., Verma B., Hussain S., Arora P., Tabassum R., et al. WRKY transcription factors: evolution, regulation, and functional diversity in plants. Protoplasma. 2023;260:331–348. doi: 10.1007/s00709-022-01794-7.
- 20. Li Z., Hua X., Zhong W., Yuan Y., Wang Y., Wang Z., et al. Genome-wide identification and expression profile analysis of WRKY Family Genes in the Autopolyploid Saccharum spontaneum. Plant Cell Physiol. 2020;61:616–630. doi: 10.1093/pcp/pcz227.
- 21. Yamasaki K., Kigawa T., Watanabe S., Inoue M., Yamasaki T., Seki M., et al. Structural basis for sequence-specific DNA recognition by an Arabidopsis WRKY transcription factor. J. Biol. Chem. 2012;287:7683–7691. doi: 10.1074/jbc.M111.279844.
- 22. Cheng X., Zhao Y., Jiang Q., Yang J., Zhao W., Taylor I.A., et al. Structural basis of dimerization and dual W-box DNA recognition by rice WRKY domain. Nucleic Acids Res. 2019;47:4308–4318. doi: 10.1093/nar/gkz113.
- 23. Wang N.-N., Li Y., Chen Y.-H., Lu R., Zhou L., Wang Y., et al. Phosphorylation of WRKY16 by MPK3-1 is essential for its transcriptional activity during fiber initiation and elongation in cotton (Gossypium hirsutum). Plant Cell. 2021;33:2736–2752. doi: 10.1093/plcell/koab153.
- 24. Machado A., Wu Y., Yang Y., Llewellyn D.J., Dennis E.S. The MYB transcription factor GhMYB25 regulates early fibre and trichome development. Plant J. 2009;59:52–62. doi: 10.1111/j.1365-313X.2009.03847.x.
- 25. Pu L., Li Q., Fan X., Yang W., Xue Y. The R2R3 MYB transcription factor GhMYB109 is required for cotton fiber development. Genetics. 2008;180:811–820. doi: 10.1534/genetics.108.093070.
- 26. Shan C.-M., Shangguan X.-X., Zhao B., Zhang X.-F., Chao L.-M., Yang C.-Q., et al. Control of cotton fibre elongation by a homeodomain transcription factor GhHOX3. Nat. Commun. 2014;5:5519. doi: 10.1038/ncomms6519.
- 27. Walford S.A., Wu Y., Llewellyn D.J., Dennis E.S. GhMYB25-like: a key factor in early cotton fibre development. Plant J. 2011;65:785–797. doi: 10.1111/j.1365-313X.2010.04464.x.
- 28. Grzechowiak M., Ruszkowska A., Sliwiak J., Urbanowicz A., Jaskolski M., Ruszkowski M. New aspects of DNA recognition by group II WRKY transcription factor revealed by structural and functional study of AtWRKY18 DNA binding domain. Int. J. Biol. Macromol. 2022;213:589–601. doi: 10.1016/j.ijbiomac.2022.05.186.
- 29. Simonson T., Calimet N. Cys(x)His(y)-Zn²⁺ interactions: thiol vs. thiolate coordination. Proteins. 2002;49:37–48. doi: 10.1002/prot.10200.
- 30. Liu Y., Qin S., Chen T.-Y., Lei M., Dhar S.S., Ho J.C., et al. Structural insights into trans-histone regulation of H3K4 methylation by unique histone H4 binding of MLL3/4. Nat. Commun. 2019;10:36. doi: 10.1038/s41467-018-07906-3.
- 31. Liu Y., Tempel W., Zhang Q., Liang X., Loppnau P., Qin S., et al. Family-wide Characterization of Histone Binding Abilities of Human CW Domain-containing Proteins. J. Biol. Chem. 2016;291:9000–9013. doi: 10.1074/jbc.M116.718973.
- 32. Ciolkowski I., Wanke D., Birkenbihl R.P., Somssich I.E. Studies on DNA-binding selectivity of WRKY transcription factors lend structural clues into WRKY-domain function. Plant Mol. Biol. 2008;68:81–92. doi: 10.1007/s11103-008-9353-1.
- 33. Duan M.R., Nan J., Liang Y.H., Mao P., Lu L., Li L., et al. DNA binding mechanism revealed by high resolution crystal structure of Arabidopsis thaliana WRKY1 protein. Nucleic Acids Res. 2007;35:1145–1154. doi: 10.1093/nar/gkm001.
- 34. Yamasaki K., Kigawa T., Inoue M., Tateno M., Yamasaki T., Yabuki T., et al. Solution structure of an Arabidopsis WRKY DNA binding domain. Plant Cell. 2005;17:944–956. doi: 10.1105/tpc.104.026435.
- 35. Vonrhein C., Flensburg C., Keller P., Sharff A., Smart O., Paciorek W., et al. Data processing and analysis with the autoPROC toolbox. Acta Crystallogr. D Biol. Crystallogr. 2011;67:293–302. doi: 10.1107/S0907444911007773.
- 36. Kabsch W. XDS. Acta Crystallogr. D Biol. Crystallogr. 2010;66:125–132. doi: 10.1107/S0907444909047337.
- 37. McCoy A.J., Grosse-Kunstleve R.W., Adams P.D., Winn M.D., Storoni L.C., Read R.J. Phaser crystallographic software. J. Appl. Crystallogr. 2007;40:658–674. doi: 10.1107/S0021889807021206.
- 38. Emsley P., Lohkamp B., Scott W.G., Cowtan K. Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 2010;66:486–501. doi: 10.1107/S0907444910007493.
- 39. Murshudov G.N., Skubák P., Lebedev A.A., Pannu N.S., Steiner R.A., Nicholls R.A., et al. REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr. D Biol. Crystallogr. 2011;67:355–367. doi: 10.1107/S0907444911001314.
- 40. Adams P.D., Afonine P.V., Bunkóczi G., Chen V.B., Davis I.W., Echols N., et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D Biol. Crystallogr. 2010;66:213–221. doi: 10.1107/S0907444909052925.
- 41. Chen V.B., Arendall W.B. 3rd, Headd J.J., Keedy D.A., Immormino R.M., Kapral G.J., et al. MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallogr. D Biol. Crystallogr. 2010;66:12–21. doi: 10.1107/S0907444909042073.
- 42. Liu Y., Xiao Q., Shang X. Crystal structure of the WRKY DNA-binding domain in complex with the W-box DNA motif. RCSB Protein Data Bank; 2025. https://www.rcsb.org/structure/9M0K