Abstract
C2H2 zinc-finger (ZF) proteins form the largest family of DNA-binding transcription factors coded by mammalian genomes. In a typical DNA-binding ZF module, there are twelve residues (numbered from −1 to −12) between the last zinc-coordinating cysteine and the first zinc-coordinating histidine. The established C2H2-ZF “recognition code” suggests that residues at positions −1, −4, and −7 recognize the 5′, central, and 3′ bases of a DNA base-pair triplet, respectively. Structural studies have highlighted that additional residues at positions −5 and −8 also play roles in specific DNA recognition. The presence of bulky and either charged or polar residues at these five positions determines specificity for given DNA bases: guanine is recognized by arginine, lysine, or histidine; adenine by asparagine or glutamine; thymine or 5-methylcytosine by glutamate; and unmodified cytosine by aspartate. This review discusses recent structural characterizations of C2H2-ZFs that add to our understanding of the principles underlying the C2H2-ZF recognition code.
Keywords: C2H2 zinc fingers, transcription factors, protein-DNA interactions, DNA sequence-specific recognition
Introduction
DNA–protein interactions involving transcription factors (TFs) are essential for appropriate gene expression across the biological world [1-3], and they are gaining attention as druggable targets [4,5]. TFs act by binding at specific genomic locations. In eukaryotes, such binding is typically at gene promoters and/or enhancers, with some TFs binding only nucleosome-free regions and others capable of triggering alterations in chromatin structure [6,7]. To function properly, TFs recognize not only nucleosome content and modifications but also DNA supercoiling/bending [8,9] and epigenetic DNA modifications [10,11]. In mouse and human genomes, there are ~1500 annotated sequence-specific DNA-binding TFs [12,13]. Among these, three structural classes of DNA-binding proteins account for most TFs: Cys2-His2 (C2H2) zinc-finger (ZF) proteins (~700), homeodomain proteins (~250), and helix–loop–helix proteins (~100). Here, we focus on the C2H2 ZF-containing TFs and how they recognize specific DNA sequences.
Conventional C2H2 ZFs are named for the Zn atom being coordinated by two Cys and two His residues, forming a Cys2–Zn–His2 tetrahedron that stiffens the fingers (Figure 1a) [14-17]. The term “fingers” has also been used for really interesting new gene (typically E3 ubiquitin ligases), plant homeodomain (typically histone lysine modification readers), Lin-11, Isl-1, and Mec-3 (protein–protein interaction domains), and CCCH ZF RNA-binding proteins, among others [18-20]. These fingers differ in the number of zinc ions coordinated by different ligand combinations of Cys and His residues. Zn metalloproteins can also use Glu, Asp, Ser, and Thr to coordinate Zn [21,22], but to date, none of those four has been found to play this role in ZF proteins, probably reflecting the fact that, at physiological pH values, Cys and His have the highest affinities for Zn [23].
Figure 1. The canonical model of C2H2 ZF interaction with DNA triplet element.

(a) Example of a C2H2 ZF module. The three residues at canonical positions of −1, −4, and −7 face the DNA. Two cysteine and two histidine residues in each finger (C2H2; in stick models) are responsible for one Zn2+-ion binding. [Residue numbering shown on the blue secondary structure diagram is described in panel c] (b) Schematic representation of a single ZF unit typically bound to three or four adjacent DNA base pairs via major groove contacts. The DNA sequence of the recognition strand (bottom in green) is oriented right to left from 5′ to 3′. The complementary strand (top) is colored in gray. (c) The protein sequence is from N-to-C termini (left-to-right) and amino acids at positions −1, −4, and −7 (highlighted) relative to the first Zn-associated histidine interact specifically with the DNA bases shown above. The protein secondary structures are shown below the sequence, with arrows for β strands, lines for loops, and cylinder for α helix. The traditional structure-based numbering at −1, +2, +3, and +6 (relative to the start of the α-helix) is provided for comparison. (d) A rare example of cross-finger zinc coordination in Arabidopsis thaliana REF6. (e) A set of 156 C2H2 ZFs was obtained from https://genexplain.com/tfclass/Class%202.3_alignment.html. The ZFs were from families 2.3.1 and 2.3.2. (Left) Distribution of pI vs. MW for each ZF, with the mean ± SD shown in red. For comparison, the 11 ZFs of human CTCF are shown as blue diamonds. (Right) The logos were generated using WebLogo3.0, after grouping the ZFs by pI as >1 SD above or below the mean, and those within 1 SD of the mean. The Zn ligands (2 Cys and 2 His) are invariant. The base recognition residues are indicated by vertical arrows. (f) Distribution of recognition amino acids at positions −7 (yellow), −4 (orange), and −1 (red) in this set are shown. (g) Logo analysis [86] of the ZF elements having the three most-frequent specificities (KHA, RHK, and RER at −7/−4/−1; vertical arrows). 
Together, these account for 66 of the 156 ZFs (42%). BHX (basic-His-any amino acid) should recognize the DNA sequence NGG; and RER (Arg-Glu-Arg) should recognize DNA sequence GMG, where M is T or 5-methylC [40]. KHA triplet examples included ZF1 of Sp1 and ZF1 of KLF4. RHK triplet examples included ZF3 of Sp1 and ZF3 of KLF9. RER triplet examples included ZF2 of Sp1; ZF2 of KLF4; and both ZF1 and ZF3 of EGR1. (h) An (incomplete) list of recent C2H2 ZF-DNA complex structures discussed in this review. Abbreviations: CTCF = CCCTC-binding factor; MW = molecular weight (mass); ZF = zinc finger.
Here, we focus on the most frequently utilized DNA-binding C2H2 ZFs found among higher eukaryotic TFs [24,25]. There are ~700 mammalian C2H2 ZF proteins, and many of them also contain at least one protein–protein interaction domain of the classes KRAB, SCAN, BTB/POZ, or SET, typically near their N-termini [26-30]. However, our focus here is on the DNA-binding C2H2 ZF arrays themselves, several of which have been examined experimentally to verify their DNA-binding motifs [31-34]. A longtime goal has been accurately predicting the sequence preferences of C2H2-ZFs directly from their amino acid (aa) sequences, to suggest testable biological functions. However, the canonical “recognition-code” model for C2H2-ZFs (Figure 1b) is based largely on specificity residues within well-defined three-finger arrays and remains incomplete and error-prone [35-39]. Possible reasons for this inconsistency include the following: a) the roles of aa outside of the 3–4 specificity residues within a single C2H2-ZF unit, b) the impact of neighboring C2H2-ZF fingers that function as major- or minor-groove spanners, and c) DNA conformation-modulated binding. In addition, human C2H2 ZF arrays contain 3 to ~35 fingers [40], leading to predictable binding sites as large as ~30–100 base pairs. These fingers do not necessarily all engage in DNA binding simultaneously, further complicating prediction of genomic binding sites. In this review, we discuss examples of C2H2-ZFs that violate the canonical C2H2-ZF “recognition code.” The resulting principles should help to improve predictions.
ZF-position numbering used in this review
For C2H2-ZF proteins, the first structure reported for a three-finger protein complexed with DNA was Zif268/Egr1, more than three decades ago [41]. The DNA recognition process is sufficiently well understood to define a DNA recognition code [42,43], which, in turn, led to designed, sequence-specific ZF nucleases for genomic engineering [44-46]. This result implies a degree of modularity (independence between fingers and—within a finger—of the recognition aa), though such modularity is actually incomplete [39].
When bound to DNA, the helix of a typical ZF lies in the DNA major groove, whereas the antiparallel hairpin β strands and the C2–Zn–H2 unit lie on the outside, facing away from the DNA (Figure 1a). The N-terminal portion of each helix, together with the residue immediately preceding the helix, makes major groove contacts with three adjacent DNA base pairs [42,43], which we term the “triplet element” (Figure 1b). Amino acids at positions −1, +3, and +6 (bottom of Figure 1c, in blue cylinder representing the α-helix) interact with the three bases of the recognition strand. In addition, the aa at position +2 interacts with the first base pair of the next triplet on the opposite strand (Figure 1b). This commonly used structure-based numbering scheme refers to the position immediately before the helix as −1, with positions +2, +3, and +6 within the helix. However, this numbering can lead to ambiguity (such as with the shorter helix in ZBTB7A [47]), so we use here the first zinc-coordinating His in each finger as reference position 0, with residues −1, −4, −5, and −7 corresponding to +6, +3, +2, and −1 of the structure-based numbering, respectively (compare top and bottom in Figure 1c).
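The correspondence between the two numbering schemes can be written as a short conversion. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the structure-based scheme has no position 0 (that is, it runs …, −2, −1, +1, +2, …), which is what the four correspondences stated above imply.

```python
def his_to_helix(pos: int) -> int:
    """Convert a His-relative position (first Zn-coordinating His = 0)
    to the structure-based (helix-relative) numbering.

    Assumption: the helix-relative scheme skips 0 (..., -2, -1, +1, +2, ...).
    Only the 12 spacer positions (-12 to -1) are accepted.
    """
    if not -12 <= pos <= -1:
        raise ValueError("expected a spacer position between -12 and -1")
    shifted = pos + 7
    # Positions at or before the residue preceding the helix step over 0.
    return shifted if shifted >= 1 else shifted - 1


def helix_to_his(pos: int) -> int:
    """Inverse conversion; helix-relative position 0 does not exist."""
    if pos == 0:
        raise ValueError("the helix-relative scheme has no position 0")
    result = pos - 7 if pos >= 1 else pos - 6
    if not -12 <= result <= -1:
        raise ValueError("position falls outside the 12-residue spacer")
    return result


if __name__ == "__main__":
    for p in (-1, -4, -5, -7):
        print(f"His-relative {p:+d} -> helix-relative {his_to_helix(p):+d}")
```

Under the same assumption, position −8 (discussed later for the Arg–Asp switch) corresponds to −2 in the structure-based numbering.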
There are almost always 12 residues between the last zinc-coordinating Cys and the first zinc-coordinating His. This is much longer than the space between the two Cys ligands (2–4 residues) and between the two His ligands (3–4 residues). In addition, zinc coordination by three Cys and one His (C3H1) occurs in ZBTB7A ZF4 (CX2C-12 residues-HX5C) [47], and CCCTC-binding factor (CTCF) ZF11 (CX2C-12 residues-HX3C) [48-50]. We note that the CCHC-type ZF in the format of CX2C-4 residues-HX4C is common in RNA biology [51]. An unusual exception to the typical zinc coordination is Arabidopsis thaliana REF6, a DNA-sequence-specific histone lysine demethylase featuring four ZFs [52]. ZF1 lacks the final His ligand, which is provided by a His at position −6 of ZF2 (Figure 1d). This arrangement results in a compact structure of ZF1 and ZF2, but prevents ZF1–ZF2 of REF6 from binding DNA directly.
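The typical spacing described above (CX2-4C, a 12-residue spacer, then HX3-4H) lends itself to a simple pattern search. The following Python sketch is not a validated ZF annotator: the regular expression, the function name, and the test sequence are assumptions made for illustration, and real annotation would normally rely on profile-based tools. It locates canonical fingers and reads out the residues at positions −8, −7, −5, −4, and −1.

```python
import re

# Canonical spacing only: C-X(2-4)-C, a 12-residue spacer, H-X(3-4)-H.
# Atypical fingers (e.g. C3H1 or cross-finger coordination) are not matched.
C2H2_PATTERN = re.compile(r"C.{2,4}C(.{12})H.{3,4}H")

# Positions relative to the first zinc-coordinating His (position 0).
POSITIONS = (-8, -7, -5, -4, -1)


def contact_residues(sequence):
    """Return one {position: residue} dict per canonical finger found."""
    fingers = []
    for match in C2H2_PATTERN.finditer(sequence):
        spacer = match.group(1)
        # The last spacer residue is position -1, so Python's negative
        # indexing coincides with the His-relative numbering.
        fingers.append({pos: spacer[pos] for pos in POSITIONS})
    return fingers


if __name__ == "__main__":
    # Illustrative finger-like sequence (hypothetical, not from the source).
    example = "RPYACPVESCDRRFSRSDELTRHIRIHTGQKP"
    for finger in contact_residues(example):
        print(finger)
```

Because the spacer is captured as a 12-character string, indexing it with −1 … −12 directly mirrors the numbering used in this review.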
C2H2-ZF DNA recognition code
The three aa of each finger at positions −1, −4 and −7 (relative to the first Zn-coordinating His) or +6, +3 and −1 (relative to the α-helix) are the principal determinants of DNA base recognition. These aa recognize (respectively) the 5′, central, and 3′ positions of each triplet, primarily on one DNA strand (the “recognition strand”). As represented in Figure 1c, the protein sequence runs left to right from N to C termini, whereas the DNA sequence of the recognition strand runs “antiparallel”, right to left from 5′ to 3′. By analyzing a set of 156 ZFs from a family of C2H2 proteins, we note a range of masses and pI values (Figure 1e), but most strikingly, the −7/−4/−1 triplets of recognition aa are very strongly biased (Figure 1f). Specifically, a third of the set of 156 has the pattern BHX (where B = basic = Lys or Arg, H = His, and X = any residue), and another fifth of them has the pattern RER (Arg-Glu-Arg) (Figure 1g). While some of this may reflect repetitive ZF proteins such as KRAB-ZF, which apparently underwent duplication to recognize repetitive DNA sequences [29,31,53,54], there is substantial concentration (in this known family) of the triplets in the available sequence space. For example, Arg and Lys are preferred at −1 and −7, His and Glu at −4, and Ala at −1 (Figure 1f-g).
Bulky and charged/polar residues at base-interacting positions confer specificity for guanine (commonly by Arg, Lys, or His), adenine (by Asn or Gln), or cytosine (by Asp or Glu). These base-specific interactions are established for many protein–DNA interactions, including C2H2 ZFs (reviewed in Refs. [10,40,55-57]). Thymine and 5-methylcytosine both contain a methyl group at pyrimidine ring carbon 5 and are recognized via either C–H…O-type interactions with Glu [40,56,58] or methyl-specific van der Waals contacts as illustrated by Zfp568 [59] and SALL4 [60,61] (discussed in the following).
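The canonical code summarized above can be expressed as a lookup table. The Python sketch below is a deliberately simplified illustration and not a predictive tool: the table and function names are hypothetical, Glu is mapped to "M" (used here, as in the Figure 1 legend, to mean T or 5-methylC rather than the IUPAC meaning), Asp is mapped to unmodified cytosine, and any other residue is reported as "N". It ignores every non-canonical effect discussed in this review.

```python
# Simplified canonical recognition code (assumption: one preferred base
# per residue; "M" denotes T or 5-methylC as in the Figure 1 legend).
BASE_FOR_RESIDUE = {
    "R": "G", "K": "G", "H": "G",  # guanine
    "N": "A", "Q": "A",            # adenine
    "D": "C",                      # unmodified cytosine
    "E": "M",                      # thymine or 5-methylcytosine
}


def predict_triplet(aa_minus7, aa_minus4, aa_minus1):
    """Predict the recognition-strand triplet, written 5' to 3'.

    Position -1 reads the 5' base, -4 the central base, and -7 the 3' base,
    so the residues are consulted in the reverse of their sequence order.
    Residues outside the table are reported as 'N' (no predicted preference).
    """
    residues_5_to_3 = (aa_minus1, aa_minus4, aa_minus7)
    return "".join(BASE_FOR_RESIDUE.get(aa, "N") for aa in residues_5_to_3)


if __name__ == "__main__":
    # Triplets are given in -7/-4/-1 order, as in Figure 1g.
    for triplet in ("RER", "KHA", "RHK"):
        print(triplet, "->", predict_triplet(*triplet))
```

With these assumptions, RER maps to GMG and KHA to NGG, matching the expectations stated in the Figure 1 legend for the RER and BHX patterns.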
When the −1, −4, and/or −7 base-interacting positions are occupied by small (Gly, Ala, Ser, Thr) or nonaromatic hydrophobic residues, the corresponding DNA sequence usually is a variation of the consensus sequence. The variable bases also form (water-mediated) hydrogen bonds (H-bonds) and van der Waals contacts with these aa. These contacts are “versatile,” in the sense they can not only recognize more than one base at a given position but also exclude one or more. This implies that the participating aa can suit the varied DNA substrates and intimately fit the ZF array to a variety of sequences. This adaptability to sequence differences applies to ZF arrays such as human PRDM9 at recombination hot spots [62], and CTCF at chromatin loops [48,49]. Here, we focus on recent examples of DNA-base-specific interactions engaging ZF residues (Figure 1h), at ‘nonclassic’ positions (particularly at positions −5 and −8) that are not part of the original canonical model, even in a large survey of the three-finger DNA-binding landscape [38] and in a recent deep-learning-based prediction [39].
Arg – Asp switch at positions −8 and −7
Both Zfp57 (important for genomic imprinting) and PRDM9 (meiotic recombination) contain an N-terminal KRAB domain and, in the case of PRDM9, a SET domain [63,64]. An Arg–Asp (RD) dipeptide immediately precedes the ZF helix, at −8 and −7, in two neighboring Zfp57 fingers, as well as in two fingers of PRDM9 allele-A and three of allele-C (Figure 2a) [62,65,66]. Using PRDM9 allele-C as an example, with guanine in the recognition strand, the RD Arg forms bidentate H-bonds, whereas the Asp hydrogen-bonds with the Arg (Figure 2b). However, if the G:C base pair is inverted to C:G, these same RD residues adopt different conformations and partners. Specifically, the Asp now H-bonds with cytosine in the recognition strand, and, as a result, the adjacent Arg instead interacts with a backbone phosphate (Figure 2c-d). Thus, the RD dipeptide accommodates both C:G (G–Arg interaction) and G:C (C–Asp interaction). The same adaptability could apply to fingers that contain Arg–Glu (RE), Arg–Asn (RN), or Arg–Gln (RQ) at −8 and −7, allowing recognition of C:G (G–Arg) and G:5mC (5mC–Glu), or T:A (A–Asn) and T:A (A–Gln).
Figure 2. An Arg–Asp (RD) switch at positions −8 and −7.

The DNA recognition strand bases are in green, with the complementary strand in gray. (a) Examples of ZFs containing RD at positions −8 and −7. (b) ZF9 of PRDM9 allele-C has R at −8 interacting with Gua. (c–d) ZF10 and ZF12 of PRDM9 allele-C have D at −7 interacting with Cyt. (e) The ZF3 of TFIIIA spans four base pairs with the Arg at position +3 between the two His ligands (HxxRxH) interacting with guanine. (f) The (modeled) corresponding Arg at position 3 of ZF10 in CTCF could make DNA contacts with two neighboring phosphate groups. (g) The (modeled) corresponding Arg (magenta) at position 3 of ZF4 in ZNF524 could make DNA base contacts. Abbreviations: CTCF = CCCTC-binding factor; ZF = zinc finger.
Arg between two zinc-liganded His residues
TFIIIA of Xenopus laevis was one of the first C2H2 ZF proteins identified [14]. The crystal structure of the first six fingers of TFIIIA, bound with 31 bp of the 5S rRNA gene promoter, revealed that ZF1–ZF3 wrap around the major groove of DNA in the classic manner, whereas ZF4–ZF6 run along one side of the DNA, forming an open, extended structure [67] (we discuss ZF4–6 below). In ZF1–3, the space between the two Cys ligands is four residues (CX4C) and the space between the two His ligands is HX3-4H: either three residues (ZF1 and ZF2) or four (ZF3). ZF3 has a Lys, Asn, and Thr at base-interacting positions −1, −4, and −7 (Figure 2e). As expected, Lys at −1 interacts with a guanine and Asn at −4 with an adenine, but Thr (a small residue) at −7 is too far away to reach the corresponding base of the triplet element (Figure 2f). Interestingly, an Arg between the two His Zn ligands (HxxRxH) interacts with the guanine 5′ to the triplet, which effectively shifts the recognized triplet by one bp, or expands the coverage of the ZF module to four bp (Figure 2f). This is a rare instance in which an Arg between two zinc-liganded His residues interacts base specifically. In most cases of HX3H we examined, positively charged residues between the two His ligands interact with DNA phosphates. However, as with the HxxRxH motif in ZF3 of TFIIIA, ZF10 of CTCF [50] and ZF4 of ZNF524 [68] have Arg at the analogous position, yet in the two latter instances, the Arg is disordered. We constructed a model of the Arg side-chain, which led to the suggestion that in ZF10 of CTCF, this Arg likely makes contact with the DNA backbone (Figure 2g), whereas in ZF4 of ZNF524, it appears to engage in a base-specific contact (Figure 2h).
We searched a database of all human C2H2 ZF proteins, at smart.embl.de/smart/do_annotation.pl?DOMAIN=SM00355, for the subset containing HxxRxH motifs downstream of an appropriate CX2-4C pair, with a 12-residue spacer in between, and found 105 HxxRxH motifs in 90 proteins. Interestingly, one of these proteins (ZNF142) has four of these HxxRxH ZFs. Mutations of ZNF142 have been linked to neurodevelopmental disorders [69-74]. In one study, among the 27 different ZNF142 variants identified from 35 individuals, only four were missense [70] (S763C, C1233F, F1295L and R1500W of UniProt P52746). Among these, three residues are within a ZF unit: C1233F, involved in zinc coordination; F1295L, part of the hydrophobic core; and R1500W, involved in DNA backbone phosphate interactions.
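A search of this kind can be approximated with a regular expression. The Python sketch below is only an assumed reconstruction of the idea, not the procedure actually used for the survey above: the pattern, the function name, and the toy sequences are hypothetical, and the counts it prints refer to the toy input only.

```python
import re
from collections import Counter

# CX2-4C pair, 12-residue spacer, then His-x-x-Arg-x-His (HxxRxH).
HXXRXH_FINGER = re.compile(r"C.{2,4}C.{12}H..R.H")


def count_hxxrxh(proteins):
    """Map protein name -> number of HxxRxH fingers (proteins with none omitted)."""
    counts = Counter()
    for name, sequence in proteins.items():
        n_fingers = sum(1 for _ in HXXRXH_FINGER.finditer(sequence))
        if n_fingers:
            counts[name] = n_fingers
    return counts


if __name__ == "__main__":
    # Toy finger-like sequences (hypothetical, for illustration only).
    with_arg = "YKCPECGKSFSQSSNLQKHQRRTHTGEKP"   # contains HxxRxH
    without_arg = "YKCPECGKSFSQSSNLQKHQKTHTGEKP"  # HX3H without the Arg
    toy_proteins = {
        "toy_two_fingers": with_arg + with_arg,
        "toy_one_finger": with_arg,
        "toy_no_motif": without_arg,
    }
    counts = count_hxxrxh(toy_proteins)
    print(f"{sum(counts.values())} motifs in {len(counts)} proteins")
```

Reporting both the total number of motifs and the number of proteins containing them mirrors how the survey result is stated in the text.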
Small side-chain at position −5
In the canonical model of C2H2 ZFs [42,43], the small aa at the −5 position makes cross-triplet and cross-strand interactions with the first base pair of the following triplet (Figure 1b). Again, we use PRDM9 allele-C as an example (Figure 3a). Despite complete conservation of Ser at position −5 in each ZF in this case, the serines interact, sometimes via water-mediated contacts, with all four possible bases (Figure 3b). This adaptability stems in part from the ability of Ser (or other small aa) to act as an H-bond donor or acceptor, or both at the same time, or to provide van der Waals contacts. The cross-strand contact mediated by the small aa at −5 [corresponding to position 2 in structure-based numbering (Figure 1b, bottom)] is thus generally not a determinant of DNA-binding specificity.
Figure 3. Varied residues at ZF position −5.

(a) Sequence alignment of six C2H2 fingers of PRDM9 allele-C with invariant Ser at position −5. (b) The completely conserved Ser at position −5 in each ZF of PRDM9 interacts with DNA that differs from base pair to base pair. The DNA recognition strand is in green, and the complementary strand in gray. (c) A CpG dinucleotide is recognized jointly by ZF3 and ZF4 of CTCF. (d) CTCF uses Glu at position −7 of ZF4 to recognize 5-methylcytosine (5mC), and Ser at −5 to contact 5mC on the opposite strand. (e) The same CTCF ZF4 can also bind unmodified CpG. (f) Methyl-specific interaction with an A:T-rich sequence by Zfp568. (g–h) Zfp568 ZF5 and ZF6 interaction with five thymine bases (with methyl groups as yellow balls). (i) Methyl-specific interactions with A:T-rich sequence by SALL4. Patient missense mutations associated with Okihiro syndrome are indicated below the sequence. (j–k) SALL4 ZF6 and ZF7 interaction with five thymine bases (with methyl groups in yellow balls). (l) Examples of ZFs containing an Arg–Asp (R-D) pair at positions −7 and −5. (m) In Egr1/Zif268, the Asp at −5 of R-D pair interacts with the cross-strand cytosine via water-mediated interactions. (n) In WT1, the R-D Asp at −5 H-bonds with the cross-strand and cross-triplet adenine. (o) In Klf4, the R-D Asp H-bonds with the cross-strand and cross-triplet cytosine. (p) Examples of ZFs containing Trp or Tyr at position −5. (q) Two orthogonal views of superimposition of five fingers reveal two alternative conformations of Trp/Tyr at −5. (r) In TFIIIA, Trp at −5 points in the same direction as Lys at −4. (s) In ZBTB7A, Tyr at −5 points in the same direction as Asp at −4 and His at −7. (t) In HIC2, Tyr at −5 points in the same direction as Arg at −7. (u) In CTCF ZF5, Tyr at −5 points in the opposite direction to Lys at −4 and Arg at −1. Abbreviations: CTCF = CCCTC-binding factor; ZF = zinc finger.
Considering that both 5-methylcytosine (5mC) and thymine contain a methyl group at pyrimidine ring carbon 5, the small side-chain at position −5 could provide enhanced contact with 5mC at the CpG methylation site. An example is ZF4 of CTCF (Figure 3c). ZF4 can bind five versions of the dinucleotide: CpG/CpG, 5mCpG/5mCpG, CpG/5mCpG, 5mCpG/CpG, and CpA/TpG. This rather catholic binding occurs via interactions with Glu at position −7 for the recognition strand (cytosine or 5mC), whereas Ser at −5, which mediates the cross-strand interaction, is promiscuous, accepting a thymine, 5mC, or cytosine (Figure 3d-e).
In Zfp568, two fingers together (ZF5 and ZF6) recognize thymines of the opposite strand for a stretch of five A:T base pairs (Figure 3f). Instead of the adenines of the recognition strand, the five thymines of the opposite strand are contacted by four small side-chains (Ser and Cys) and one hydrophobic residue (Leu) via van der Waals interactions with thymine methyl groups (Figure 3g-h). Such interactions, over a stretch of five bases of the nonrecognition strand, have not previously been described in classical ZF–DNA complexes.
A notable yet unexpected finding emerged from the structural analysis of SALL4 [60,61], mutations of which are linked to Okihiro syndrome. SALL4 is composed of seven ZFs, organized into three clusters. Particularly, the C-terminal cluster, comprising ZF6 and ZF7, exhibits a unique composition: only one conventional polar residue, Asn at −4 of ZF7, and the other positions typically involved in base interactions instead contain small (Gly, Ser, Thr) or hydrophobic residues (Ile, Val) (Figure 3i). Together, Ser at −8 and Ile at −1 in ZF6, and Thr at −7, Gly at −5, and Val at −1 in ZF7, make van der Waals contacts with five thymine methyl groups in an A:T-rich DNA sequence (Figure 3j-k). Three patient mutations linked to Okihiro syndrome are H888R of ZF6, involved in zinc coordination, R890W of ZF6, involved in DNA backbone phosphate interactions, and G911D of ZF7, involved in cross-strand binding of a thymine (Figure 3i).
Arg – Asp pair at positions −7 and −5
The three-finger-binding domain of Egr1/Zif268 is one of the first, and best structurally-studied, C2H2 ZF proteins [41,75]. Each finger has an Asp at −5, which is almost always associated with an Arg at −7 (additional examples include Wilms tumor protein WT1 [76] and Krüppel-like factor Klf4 [77]) (Figure 3l). The Arg at −7 H-bonds with a guanine. Each of these Arg–guanine contacts is stabilized by Asp at −5: the Asp side-chain carboxylate is bent over, and both oxygens form an H-bond salt bridge with the Arg guanidinium (Figure 3m). This Asp–Arg interaction may help position and stabilize the long Arg side-chain, strengthening the Arg–guanine contacts [78]. In addition, the Asp at −5 has one of its carboxylate oxygens within hydrogen-bonding (or water-mediated) distance of a cross-strand and/or cross-triplet interaction with the first base pair of the following triplet (Figure 3m-o). However, as with Ser at −5, the cross-strand and cross-triplet hydrogen bond by the Asp at −5 does not in this case mediate base recognition (apparently H-bonding with the amino group at either Cyt-N4 or Ade-N6).
Aromatic residue at position −5
There are several examples of an aromatic residue at −5, including Trp of TFIIIA ZF1 [67], and Tyr in several cases: ZBTB7A ZF3 [47,79], HIC2 ZF4 [80], and CTCF ZFs 5 and 8 [50] (Figure 3p). We superimposed the five Trp/Tyr-containing fingers, and the aromatic rings take two alternative conformations: either pointing in the same direction as base-interacting residues or pointing away from the DNA (Figure 3q). In TFIIIA, ZBTB7A, and HIC2, the aromatic residues point in the same general direction as the base-interacting positions (−1, −4, and −7). In TFIIIA, Trp at −5 spans two base pairs and the bulky side-chain displaces the associated ZF 4 Å away from the DNA, as well as from the base-interacting Lys residues at −4 and −7 (Figure 3r). The long Lys side-chain at −7, displaced from the recognition strand, can still reach the DNA base on the opposite strand. Similarly, ZBTB7A has Tyr at −5, with Asn, Asp, and His at base-interacting positions (−1, −4, and −7). Each of these residues could in theory make base-specific interactions but does not: Tyr forms a T-shaped stacking geometry with cytosine in the major groove, whereas the residues at base-interacting positions have side-chains too short to reach the corresponding DNA base (Figure 3s). As with ZBTB7A, the −5 Tyr in ZF4 of HIC2 points toward the DNA, together with the largest side-chain of Arg at −7, and contacts the same G:C base pair, forming a four-way interaction (Figure 3t).
CTCF has two fingers, ZF5 and ZF8, with Tyr at −5 (Figure 3p). Each of them takes a different conformation (Figure 3q) (we discuss CTCF ZF8 in the following). With Tyr of ZF5 pointing away from the side-chains at base-interacting positions (Figure 3u), Arg at position −1 and Lys at −4 (the two longest side-chains) can reach the guanine bases, whereas the shorter Asp at −7 allows a variable base without a direct base contact.
Among the four examples compared here (excepting CTCF ZF8), there are three consequences of having a bulky and aromatic residue at position −5: a) pointing away from the base-interacting residues and allowing them to contact the DNA (CTCF ZF5); b) pointing along with the base-interacting residues and preventing shorter side-chains at base-interacting positions from reaching the DNA (ZBTB7A ZF3); and c) pointing along but still allowing the longest side-chains (Arg and Lys) to interact with a DNA base (HIC2 and TFIIIA).
Large and charged/polar residue at position −5: cross-strand base-specific interactions
Continuing the theme of having a charged or polar residue at position −5: CTCF ZF9 and ZF10 have Arg or Gln, respectively (Figure 4a), whereas ZNF410 ZF2 has a Gln [81], ZNF524 ZF4 has an Asn [68], and ZBTB10 ZF1 and ZF2 have Arg or Glu, respectively [82]. Arg at −5 in CTCF ZF9 makes cross-strand interactions with guanine (Figure 4b). The large Arg side-chain pushes the α-helix of ZF9 away from the DNA interface, weakening the other interactions (increasing the distances) between DNA and His at −4 and Met at −1. Similarly, Gln at −5 of ZF10 makes a cross-strand interaction with adenine, weakening the interactions of Gln at −7, Leu at −4, and Met at −1 by increasing distances to their corresponding DNA bases (Figure 4c).
Figure 4. Large and charged/polar residue at position −5.

(a) CTCF has an Arg at −5 of ZF9 and a Gln at −5 of ZF10. (b) A cross-strand guanine-specific interaction mediated by Arg at −5 of CTCF ZF9 increases the spacing distances between residues at −7, −4, and −1 and DNA. (c) A cross-strand adenine-specific interaction mediated by Gln at −5 of CTCF ZF10 increases the spacing distances between residues at −7, −4 and −1 and DNA. (d) Gln at −5 of ZF2 in ZNF410 makes a cross-strand adenine-specific interaction. (e) Asn at −5 of ZF4 in ZNF524 makes cross-strand interactions with two adjacent TA bases. (f) Arg at −5 of ZF1 and Glu at −5 of ZF2 in ZBTB10 both make cross-strand interactions. (g) Lys at −4 of ZF1 in ZBTB7A makes a cross-strand guanine-specific interaction. (h) ZF8 of CTCF spans the minor groove. (i) ZF4 and ZF6 of TFIIIA are positioned across the minor groove. (j) ZF2 of Zfp568 spans the minor groove, while ZF1 is involved in inter-finger interactions. (k) ZF4 of ZBTB24 crosses the major groove without making base-specific contacts. (l) ZF2 of HIC2 traverses the minor groove. In all structures, the DNA recognition strand is in green and the complementary in gray. (m) Sequence alignment of six spacer ZFs including the pre- and post-linker regions. TFIIIA ZF6 has shorter linkers but expanded distances between the two Cys ligands and two His ligands of zinc ion. Abbreviations: CTCF = CCCTC-binding factor; ZF = zinc finger.
Other notable examples of cross-strand interactions involving residues at position −5 include: (i) a Gln at −5 of ZF2 in ZNF410 that forms a cross-strand interaction with an adenine base (Figure 4d), reminiscent of Gln in ZF10 of CTCF; (ii) an Asn in ZF4 of ZNF524 that engages in cross-strand interactions with two adjacent bases in a TpA sequence (Figure 4e); and (iii) an Arg in ZF1 and a Glu in ZF2 of ZBTB10 that both participate in cross-strand interactions (Figure 4f).
In sum, at least for the limited number of examples discussed here, larger side-chains (Arg, Gln/Asn, and Glu) at −5 make cross-strand base-specific interactions with guanine, adenine, and cytosine and provide base specificity for the corresponding C:G, T:A, and G:C base pairs, respectively. This situation is different from the “versatile” contacts made by smaller side-chains (e.g. Ser) at position −5, or an R-D pair at −7 and −5, as discussed earlier. Our analysis revealed that highly specific Arg–Gua and Gln/Asn–Ade interactions, recognizing G:C or A:T base pairs from canonical positions −1, −4, and −7, also apply to position −5 but in a cross-strand fashion. Base-specific contacts by larger and charged/polar residues at position −5 might compensate for cases where small or hydrophobic residues at the cognate positions −1, −4, and −7 cannot provide specificity (e.g. Met at −1 and Leu at −4 of ZF10 in CTCF, and Val at both −1 and −7 of ZF2 in ZNF410).
Additionally, a cross-strand interaction between Lys and guanine has been observed in various instances, such as Lys413 of ZF1 in KLF4 [77], Lys328 of ZF4 in ZNF410 [81], and Lys396 of ZF1 in ZBTB7A [47,79] (Figure 4g). These Lys residues, unlike Arg or Gln at position −5, are situated at the conventional base-interacting locations of −7 (as in KLF4), −4 (as in ZBTB7A), or −1 (as in ZNF410). For instance, the ability of ZBTB7A Lys396 to switch from interacting with G1 in a G1ACCC sequence to G2 on the opposite strand in a GC2CCC sequence enables ZBTB7A to bind to sequences with variations at the second position in the G(a/c)CCC pattern [79].
ZF as a spacer
In CTCF, instead of continuing in the DNA major groove, ZF8 spans the minor groove and acts as a spacer to properly position the C-terminal fingers ZF9–ZF11 in the major groove (Figure 4h) [48-50]. Spacers similar to ZF8 of CTCF have been seen in TFIIIA [67], in which ZF4 and ZF6 are positioned across the minor groove to span the entire length of the duplex (Figure 4i); in Zfp568 [59], in which ZF2 spans the DNA minor groove at an A:T-rich stretch (where the minor groove is narrower [59]) (Figure 4j); in ZBTB24 [83], in which ZF4 spans the DNA major groove; and in HIC2, in which ZF2 spans the DNA minor groove (Figure 4k-l). There is no clear sequence similarity among the six examples of spacer ZFs (Figure 4m). CTCF ZF8 and its associated prelinker and postlinker regions harbor the largest number of positively charged residues (nine), which could interact with negatively charged DNA-backbone phosphates, followed by HIC2 (seven) and ZBTB24 (six). In addition, large and charged side-chains occupy the corresponding positions −6 and −5 in CTCF ZF8 (RY), ZBTB24 ZF4 (KR), and TFIIIA ZF4 (HN) (Figure 4m). HIC2 ZF2 and TFIIIA ZF6 contain a Pro (in the beginning of the helix) or Trp at position −6, and both are infrequent among the ZFs we examined (see Figure 1). We used the HMMER algorithm (http://zf.princeton.edu/index.php) to calculate the bit score for each finger, and all of them (except TFIIIA ZF6) have fairly high confidence scores, the lowest being 18 and the highest being 32. ZF6 of TFIIIA has shorter linkers (4–5 instead of 7–8 residues) but expanded distances between the two Cys ligands (5 instead of 2–4 residues) and two His ligands (4 instead of 3 residues) (Figure 4m). Nevertheless, it remains to be determined exactly what features would allow prediction of which ZFs could function as spacers.
Summary and perspective
Seven large and charged or polar residues (Arg, Lys, His, Gln, Asn, Glu, and Asp) are commonly involved in DNA base-specific interactions within the major groove [84]. These interactions are particularly relevant when these residues are positioned at five specific locations in C2H2-ZF proteins: −1, −4, −5, −7, or −8 (Figure 1c). Their presence at these sites confers sequence specificity for both strands of the DNA. Additionally, small and nonaromatic hydrophobic residues at these positions often provide “versatile” contacts, which can enhance binding affinity, and in some instances, these residues make van der Waals interactions with the methyl group of thymine within A:T-rich sequences. Further research is required to fully understand the determinants that guide C2H2-ZFs to function as major- or minor-groove spanners or spacers, to induce DNA conformation changes upon binding, and to fully characterize the role of residues outside of these five specificity-conferring residues within a single C2H2-ZF unit (e.g. the Arg within HxxRXH of ZF3 in TFIIIA).
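The residue-to-base preferences summarized in this review can be written as a simple lookup table. The Python sketch below is a deliberately simplified illustration: the function name and the "N" placeholder for residues outside the code are assumptions made here, and the sketch ignores positions −5 and −8, cross-strand contacts, and neighboring-finger effects, all of which influence real specificity.

```python
# Preferences as stated in the review (one-letter amino acid codes):
# G by Arg/Lys/His, A by Asn/Gln, T or 5-methylcytosine by Glu, C by Asp.
RECOGNITION_CODE = {
    "R": "G", "K": "G", "H": "G",
    "N": "A", "Q": "A",
    "E": "T/5mC",
    "D": "C",
}


def predict_triplet(pos_minus1: str, pos_minus4: str, pos_minus7: str):
    """Return preferred bases in 5', central, 3' order for one finger.

    Per the review, positions -1, -4, and -7 contact the 5', central,
    and 3' bases of the triplet, respectively. Residues outside the
    simple code are reported as "N" (no prediction).
    """
    return tuple(
        RECOGNITION_CODE.get(aa.upper(), "N")
        for aa in (pos_minus1, pos_minus4, pos_minus7)
    )


if __name__ == "__main__":
    # Hypothetical residue combinations for illustration only.
    print(predict_triplet("R", "E", "R"))
    print(predict_triplet("Q", "D", "A"))
```

A lookup of this kind is only a first approximation; the structural examples discussed in this review show many fingers that deviate from it.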
In a significant breakthrough at the end of 2023, the Food and Drug Administration (FDA) granted approval to the first clustered regularly interspaced short palindromic repeats (CRISPR) gene-editing treatment for sickle cell disease (FDA announcement: FDA Approves First Gene Therapies to Treat Patients with Sickle Cell Disease). This innovative treatment targets the BCL11A gene, which codes for a C2H2 ZF protein. BCL11A binds directly to the promoter of the fetal hemoglobin gene [85], playing a key role in suppressing fetal hemoglobin production soon after birth. The CRISPR-mediated disruption of BCL11A activity facilitates the reactivation of fetal hemoglobin production, offering a promising new strategy for the management of sickle cell disease. This development underscores the vast yet largely untapped potential of the approximately 700 mammalian C2H2 ZF proteins. Their diverse functions and roles in various biological processes offer numerous possibilities for future research and therapeutic interventions.
Acknowledgements
We thank current members of the Cheng laboratory for discussion. We thank Dr. Yiwei Liu for his work on Zfp57 and Klf4; Dr. Anamika Patel for her work on PRDM9 and ZFP568; Dr. Hideharu Hashimoto for his work on WT1 and Egr1; Dr. Ren (Emily) Ren for her work on ZBTB24, ZNF410, ZBTB7A, and HIC2; Dr. Hideharu Hashimoto, Dr. Jie Yang, and Dr. John Horton for their work on CTCF. The work in the Cheng laboratory was supported in part by the National Institutes of Health (grant R35GM134744) and Cancer Prevention and Research Institute of Texas (RR160029). X.C. is a CPRIT scholar in Cancer Research.
Footnotes
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Data availability
No new data were generated for the research described in the article.
References
Papers of particular interest, published within the period of review, have been highlighted as:
* of special interest
** of outstanding interest
- 1. Latchman DS: Transcription factors: an overview. Int J Exp Pathol 1993, 74:417–422.
- 2. Wolberger C: How structural biology transformed studies of transcription regulation. J Biol Chem 2021, 296, 100741.
- 3. Boumpas P, Merabet S, Carnesecchi J: Integrating transcription and splicing into cell fate: transcription factors on the block. Wiley Interdiscip Rev RNA 2023, 14, e1752.
- 4. Hecker M, Wagner AH: Transcription factor decoy technology: a therapeutic update. Biochem Pharmacol 2017, 144:29–34.
- 5. Radaeva M, Ton AT, Hsing M, Ban F, Cherkasov A: Drugging the ‘undruggable’. Therapeutic targeting of protein-DNA interactions with the use of computer-aided drug discovery methods. Drug Discov Today 2021, 26:2660–2679.
- 6. Kubik S, Bruzzone MJ, Jacquet P, Falcone JL, Rougemont J, Shore D: Nucleosome stability distinguishes two different promoter types at all protein-coding genes in yeast. Mol Cell 2015, 60:422–434.
- 7. Morgunova E, Taipale J: Structural insights into the interaction between transcription factors and the nucleosome. Curr Opin Struct Biol 2021, 71:171–179.
- 8. Horberg J, Reymer A: Specifically bound BZIP transcription factors modulate DNA supercoiling transitions. Sci Rep 2020, 10, 18795.
- 9. Horberg J, Moreau K, Tamas MJ, Reymer A: Sequence-specific dynamics of DNA response elements and their flanking sites regulate the recognition by AP-1 transcription factors. Nucleic Acids Res 2021, 49:9280–9293.
- 10. Yang J, Zhang X, Blumenthal RM, Cheng X: Detection of DNA modifications by sequence-specific transcription factors. J Mol Biol 2020, 432:1661–1673.
- 11. Rausch C, Hastert FD, Cardoso MC: DNA modification readers and writers and their interplay. J Mol Biol 2020, 432:1731–1746.
- 12. Gray PA, Fu H, Luo P, Zhao Q, Yu J, Ferrari A, Tenzen T, Yuk DI, Tsung EF, Cai Z, et al.: Mouse brain organization revealed through direct genome-scale TF expression analysis. Science 2004, 306:2255–2257.
- 13. Vaquerizas JM, Kummerfeld SK, Teichmann SA, Luscombe NM: A census of human transcription factors: function, expression and evolution. Nat Rev Genet 2009, 10:252–263.
- 14. Miller J, McLachlan AD, Klug A: Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J 1985, 4:1609–1614.
- 15. Berg JM: Proposed structure for the zinc-binding domains from transcription factor IIIA and related proteins. Proc Natl Acad Sci U S A 1988, 85:99–102.
- 16. Klug A: The discovery of zinc fingers and their applications in gene regulation and genome manipulation. Annu Rev Biochem 2010, 79:213–231.
- 17. Neuhaus D: Zinc finger structure determination by NMR: why zinc fingers can be a handful. Prog Nucl Magn Reson Spectrosc 2022, 130–131:62–105.
- 18. Cassandri M, Smirnov A, Novelli F, Pitolli C, Agostini M, Malewicz M, Melino G, Raschella G: Zinc-finger proteins in health and disease. Cell Death Discov 2017, 3, 17071.
- 19. Font J, Mackay JP: Beyond DNA: zinc finger domains as RNA-binding modules. Methods Mol Biol 2010, 649:479–491.
- 20. Fu M, Blackshear PJ: RNA-binding proteins in immune regulation: a focus on CCCH zinc finger proteins. Nat Rev Immunol 2017, 17:130–143.
- 21. Sousa SF, Lopes AB, Fernandes PA, Ramos MJ: The Zinc proteome: a tale of stability and functionality. Dalton Trans 2009:7946–7956.
- 22. Maret W: New perspectives of zinc coordination environments in proteins. J Inorg Biochem 2012, 111:110–116.
- 23. Trzaskowski B, Adamowicz L, Deymier PA: A theoretical study of zinc(II) interactions with amino acid models and peptide fragments. J Biol Inorg Chem 2008, 13:133–137.
- 24. Iuchi S: Three classes of C2H2 zinc finger proteins. Cell Mol Life Sci 2001, 58:625–635.
- 25. Fedotova AA, Bonchuk AN, Mogila VA, Georgiev PG: C2H2 zinc finger proteins: the largest but poorly explored family of higher eukaryotic transcription factors. Acta Naturae 2017, 9:47–58.
- 26. Collins T, Stone JR, Williams AJ: All in the family: the BTB/POZ, KRAB, and SCAN domains. Mol Cell Biol 2001, 21:3609–3615.
- 27. Schmitges FW, Radovani E, Najafabadi HS, Barazandeh M, Campitelli LF, Yin Y, Jolma A, Zhong G, Guo H, Kanagalingam T, et al.: Multiparameter functional diversity of human C2H2 zinc finger proteins. Genome Res 2016, 26:1742–1752.
- 28. Sun Y, Keown JR, Black MM, Raclot C, Demarais N, Trono D, Turelli P, Goldstone DC: A dissection of oligomerization by the TRIM28 tripartite motif and the interaction with members of the Krab-ZFP family. J Mol Biol 2019, 431:2511–2527.
- 29. Wolf G, de Iaco A, Sun MA, Bruno M, Tinkham M, Hoang D, Mitra A, Ralls S, Trono D, Macfarlan TS: KRAB-zinc finger protein gene expansion in response to active retrotransposons in the murine lineage. Elife 2020, 9.
- 30. Di Tullio F, Schwarz M, Zorgati H, Mzoughi S, Guccione E: The duality of PRDM proteins: epigenetic and structural perspectives. FEBS J 2022, 289:1256–1275.
- 31. Thomas JH, Emerson RO: Evolution of C2H2-zinc finger genes revisited. BMC Evol Biol 2009, 9:51.
- 32. Najafabadi HS, Mnaimneh S, Schmitges FW, Garton M, Lam KN, Yang A, Albu M, Weirauch MT, Radovani E, Kim PM, et al.: C2H2 zinc finger proteins greatly expand the human regulatory lexicon. Nat Biotechnol 2015, 33:555–562.
- 33. Imbeault M, Helleboid PY, Trono D: KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 2017, 543:550–554.
- 34. Lambert SA, Jolma A, Campitelli LF, Das PK, Yin Y, Albu M, Chen X, Taipale J, Hughes TR, Weirauch MT: The human transcription factors. Cell 2018, 172:650–665.
- 35. Wolfe SA, Grant RA, Elrod-Erickson M, Pabo CO: Beyond the “recognition code”: structures of two Cys2His2 zinc finger/TATA box complexes. Structure 2001, 9:717–723.
- 36. Persikov AV, Singh M: De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res 2014, 42:97–108.
- 37. Gupta A, Christensen RG, Bell HA, Goodwin M, Patel RY, Pandey M, Enuameh MS, Rayla AL, Zhu C, Thibodeau-Beganny S, et al.: An improved predictive recognition model for Cys2-His2 zinc finger proteins. Nucleic Acids Res 2014, 42:4800–4812.
- 38. Persikov AV, Wetzel JL, Rowland EF, Oakes BL, Xu DJ, Singh M, Noyes MB: A systematic survey of the Cys2His2 zinc finger DNA-binding landscape. Nucleic Acids Res 2015, 43:1965–1984.
- 39. Aizenshtein-Gazit S, Orenstein Y: DeepZF: improved DNA-binding prediction of C2H2-zinc-finger proteins by deep transfer learning. Bioinformatics 2022, 38:ii62–ii67.
- 40. Liu Y, Zhang X, Blumenthal RM, Cheng X: A common mode of recognition for methylated CpG. Trends Biochem Sci 2013, 38:177–183.
- 41. Pavletich NP, Pabo CO: Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 A. Science 1991, 252:809–817.
- 42. Choo Y, Klug A: Physical basis of a protein-DNA recognition code. Curr Opin Struct Biol 1997, 7:117–125.
- 43. Wolfe SA, Nekludova L, Pabo CO: DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct 2000, 29:183–212.
- 44. Chandrasegaran S, Carroll D: Origins of programmable nucleases for genome engineering. J Mol Biol 2016, 428:963–989.
- 45. Bogdanove AJ, Bohm A, Miller JC, Morgan RD, Stoddard BL: Engineering altered protein-DNA recognition specificity. Nucleic Acids Res 2018, 46:4845–4871.
- 46. Paschon DE, Lussier S, Wangzor T, Xia DF, Li PW, Hinkley SJ, Scarlott NA, Lam SC, Waite AJ, Truong LN, et al.: Diversifying the structure of zinc finger nucleases for high-precision genome editing. Nat Commun 2019, 10:1133.
- 47. * Yang Y, Ren R, Ly LC, Horton JR, Li F, Quinlan KGR, Crossley M, Shi Y, Cheng X: Structural basis for human ZBTB7A action at the fetal globin promoter. Cell Rep 2021, 36, 109759. The study reported that (i) a Pro at position −2 of ZF1 resulted in a shorter helix, (ii) an aromatic Tyr at position −5 of ZF3 prevented its specific contact with DNA, and (iii) ZF4 has an atypical CCHC (C3H1) zinc coordination.
- 48. Hashimoto H, Wang D, Horton JR, Zhang X, Corces VG, Cheng X: Structural basis for the versatile and methylation-dependent binding of CTCF to DNA. Mol Cell 2017, 66:711–720 e713.
- 49. Yin M, Wang J, Wang M, Li X, Zhang M, Wu Q, Wang Y: Molecular mechanism of directional CTCF recognition of a diverse range of genomic sites. Cell Res 2017, 27:1365–1377.
- 50. ** Yang J, Horton JR, Liu B, Corces VG, Blumenthal RM, Zhang X, Cheng X: Structures of CTCF-DNA complexes including all 11 zinc fingers. Nucleic Acids Res 2023, 51:8447–8462. The study revealed that (i) ZF8 is a spacer, (ii) ZF9 and ZF10 have cross-strand interactions using Arg or Gln at position −5, and (iii) the aromatic Tyr residues at position −5 of ZF5 and ZF8 have different functions.
- 51. Aceituno-Valenzuela U, Micol-Ponce R, Ponce MR: Genome-wide analysis of CCHC-type zinc finger (ZCCHC) proteins in yeast, Arabidopsis, and humans. Cell Mol Life Sci 2020, 77:3991–4014.
- 52. * Tian Z, Li X, Li M, Wu W, Zhang M, Tang C, Li Z, Liu Y, Chen Z, Yang M, et al.: Crystal structures of REF6 and its complex with DNA reveal diverse recognition mechanisms. Cell Discov 2020, 6:17. The study reported a rare cross-finger zinc coordination in Arabidopsis thaliana REF6.
- 53. Jinlong W, Jian W, Chunyan T: Evolution of KRAB-containing zinc finger proteins and their roles in species evolution. Yi Chuan 2016, 38:971–978.
- 54. Thomas JH, Schneider S: Coevolution of retroelements and tandem zinc finger genes. Genome Res 2011, 21:1800–1812.
- 55. Luscombe NM, Laskowski RA, Thornton JM: Amino acid-base interactions: a three-dimensional analysis of protein-DNA interactions at an atomic level. Nucleic Acids Res 2001, 29:2860–2874.
- 56. Buck-Koehntop BA, Defossez PA: On how mammalian transcription factors recognize methylated DNA. Epigenetics 2013, 8:131–137.
- 57. Ren R, Horton JR, Zhang X, Blumenthal RM, Cheng X: Detecting and interpreting DNA methylation marks. Curr Opin Struct Biol 2018, 53:88–99.
- 58. Nikolova EN, Stanfield RL, Dyson HJ, Wright PE: CH···O hydrogen bonds mediate highly specific recognition of methylated CpG sites by the zinc finger protein Kaiso. Biochemistry 2018, 57:2109–2120.
- 59. Patel A, Yang P, Tinkham M, Pradhan M, Sun MA, Wang Y, Hoang D, Wolf G, Horton JR, Zhang X, et al.: DNA conformation induces adaptable binding by tandem zinc finger proteins. Cell 2018, 173:221–233 e212.
- 60. * Ru W, Koga T, Wang X, Guo Q, Gearhart MD, Zhao S, Murphy M, Kawakami H, Corcoran D, Zhang J, et al.: Structural studies of SALL family protein zinc finger cluster domains in complex with DNA reveal preferential binding to an AATA tetranucleotide motif. J Biol Chem 2022, 298, 102607. These two studies reported van der Waals contacts via small or hydrophobic residues with the methyl groups of thymine in TA-rich sequences.
- 61. * Watson JA, Pantier R, Jayachandran U, Chhatbar K, Alexander Howden B, Kruusvee V, Prendecki M, Bird A, Cook AG: Structure of SALL4 zinc finger domain reveals link between AT-rich DNA binding and Okihiro syndrome. Life Sci Alliance 2023, 6. These two studies reported van der Waals contacts via small or hydrophobic residues with the methyl groups of thymine in TA-rich sequences.
- 62. Patel A, Horton JR, Wilson GG, Zhang X, Cheng X: Structural basis for human PRDM9 action at recombination hot spots. Genes Dev 2016, 30:257–265.
- 63. Li X, Ito M, Zhou F, Youngson N, Zuo X, Leder P, Ferguson-Smith AC: A maternal-zygotic effect gene, Zfp57, maintains both maternal and paternal imprints. Dev Cell 2008, 15:547–557.
- 64. Baudat F, Buard J, Grey C, Fledel-Alon A, Ober C, Przeworski M, Coop G, de Massy B: PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice. Science 2010, 327:836–840.
- 65. Liu Y, Toh H, Sasaki H, Zhang X, Cheng X: An atomic model of Zfp57 recognition of CpG methylation within a specific DNA sequence. Genes Dev 2012, 26:2374–2379.
- 66. Patel A, Zhang X, Blumenthal RM, Cheng X: Structural basis of human PR/SET domain 9 (PRDM9) allele C-specific recognition of its cognate DNA sequence. J Biol Chem 2017, 292:15994–16002.
- 67. Nolte RT, Conlin RM, Harrison SC, Brown RS: Differing roles for zinc fingers in DNA recognition: structure of a six-finger transcription factor IIIA complex. Proc Natl Acad Sci U S A 1998, 95:2938–2943.
- 68. * Xu Z, Chang F, Viceconte N, Rane G, Levin M, Lototska L, Roth F, Hillairet A, Fradera-Sola A, et al.: ZNF524 directly interacts with telomeric DNA and supports telomere integrity. Nat Commun 2023, 14:8252. The study revealed that an Asn at position −5 makes cross-strand interactions with two adjacent TA bases.
- 69. Khan K, Zech M, Morgan AT, Amor DJ, Skorvanek M, Khan TN, Hildebrand MS, Jackson VE, Scerri TS, Coleman M, et al.: Recessive variants in ZNF142 cause a complex neurodevelopmental disorder with intellectual disability, speech impairment, seizures, and dystonia. Genet Med 2019, 21:2532–2542.
- 70. Christensen MB, Levy AM, Mohammadi NA, Niceta M, Kaiyrzhanov R, Dentici ML, Al Alam C, Alesi V, Benoit V, Bhatia KP, et al.: Biallelic variants in ZNF142 lead to a syndromic neurodevelopmental disorder. Clin Genet 2022, 102:98–109.
- 71. Kamal N, Khamirani HJ, Mohammadi S, Dastgheib SA, Dianatpour M, Tabei SMB: ZNF142 mutation causes neurodevelopmental disorder with speech impairment and seizures: novel variants and literature review. Eur J Med Genet 2022, 65, 104522.
- 72. Kameyama S, Mizuguchi T, Fukuda H, Moey LH, Keng WT, Okamoto N, Tsuchida N, Uchiyama Y, Koshimizu E, Hamanaka K, et al.: Biallelic null variants in ZNF142 cause global developmental delay with familial epilepsy and dysmorphic features. J Hum Genet 2022, 67:169–173.
- 73. Mir A, Song Y, Lee H, Montazer-Zohouri M, Reisi M, Tabatabaiefar MA: A deleterious frameshift insertion mutation in the ZNF142 gene leads to intellectual developmental disorder with impaired speech in three affected siblings: clinical features and literature review. Mol Genet Genomic Med 2023, 11, e2261.
- 74. Proskorovski-Ohayon R, Eskin-Schwartz M, Shorer Z, Kadir R, Halperin D, Drabkin M, Yogev Y, Aharoni S, Hadar N, Cohen H, et al.: ZNF142 mutation causes sex-dependent neurologic disorder. J Med Genet 2024, 10.1136/jmg-2023-109447.
- 75. Zandarashvili L, White MA, Esadze A, Iwahara J: Structural impact of complete CpG methylation within target DNA on specific complex formation of the inducible transcription factor Egr-1. FEBS Lett 2015, 589:1748–1753.
- 76. Hashimoto H, Olanrewaju YO, Zheng Y, Wilson GG, Zhang X, Cheng X: Wilms tumor protein recognizes 5-carboxylcytosine within a specific DNA sequence. Genes Dev 2014, 28:2304–2313.
- 77. Liu Y, Olanrewaju YO, Zheng Y, Hashimoto H, Blumenthal RM, Zhang X, Cheng X: Structural basis for Klf4 recognition of methylated DNA. Nucleic Acids Res 2014, 42:4859–4867.
- 78. Elrod-Erickson M, Rould MA, Nekludova L, Pabo CO: Zif268 protein-DNA complex refined at 1.6 A: a model system for understanding zinc finger-DNA interactions. Structure 1996, 4:1171–1180.
- 79. * Ren R, Horton JR, Chen Q, Yang J, Liu B, Huang Y, Blumenthal RM, Zhang X, Cheng X: Structural basis for transcription factor ZBTB7A recognition of DNA and effects of ZBTB7A somatic mutations that occur in human acute myeloid leukemia. J Biol Chem 2023, 299, 102885. The study reported that a Lys at position −4 of ZF1 bridges between two guanines of two strands in a sequence-specific manner.
- 80. * Huang P, Peslak SA, Ren R, Khandros E, Qin K, Keller CA, Giardine B, Bell HW, Lan X, Sharma M, et al.: HIC2 controls developmental hemoglobin switching by repressing BCL11A transcription. Nat Genet 2022, 54:1417–1426. The study reported (i) that ZF2 is a spacer and (ii) an aromatic Tyr at position −5 of ZF4.
- 81. * Lan X, Ren R, Feng R, Ly LC, Lan Y, Zhang Z, Aboreden N, Qin K, Horton JR, Grevet JD, et al.: ZNF410 uniquely activates the NuRD component CHD4 to silence fetal hemoglobin expression. Mol Cell 2021, 81:239–254 e238. The study reported a cross-strand base-specific interaction with a Gln at position −5.
- 82. * Wang S, Xu Z, Li M, Lv M, Shen S, Shi Y, Li F: Structural insights into the recognition of telomeric variant repeat TTGGGG by broad-complex, tramtrack and bric-a-brac - zinc finger protein ZBTB10. J Biol Chem 2023, 299, 102918. The study revealed that an Arg or Glu at position −5 makes cross-strand interactions.
- 83. Ren R, Hardikar S, Horton JR, Lu Y, Zeng Y, Singh AK, Lin K, Coletta LD, Shen J, Lin Kong CS, et al.: Structural basis of specific DNA binding by the transcription factor ZBTB24. Nucleic Acids Res 2019, 47:8388–8398.
- 84. Chiu TP, Rao S, Rohs R: Physicochemical models of protein-DNA binding with standard and modified base pairs. Proc Natl Acad Sci U S A 2023, 120, e2205796120.
- 85. Yang Y, Xu Z, He C, Zhang B, Shi Y, Li F: Structural insights into the recognition of gamma-globin gene promoter by BCL11A. Cell Res 2019, 29:960–963.
- 86. Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14:1188–1190.
