Abstract
RNA and DNA helicases manipulate or translocate along single strands of nucleic acids by grasping them using a conserved structural motif. We have examined the available crystal structures of helicases of the two principal superfamilies, SF1 and SF2, and observed that the most conserved interactions with the nucleic acid occur between the phosphosugar backbone of a trinucleotide and the three strand-helix loops within a (β-strand/α-helix)3 structural module. At the first and third loops is a conserved hydrogen-bonded feature called a thr-motif, often seen at α-helical N-termini, with the threonine as the N-cap residue. These loops can be aligned with few insertions or deletions, and their main chain atoms are structurally congruent amongst the family members and between the two modules found as tandem pairs in all SF1 and SF2 proteins. The other highly conserved interactions with nucleic acid involve main-chain NH groups, often at the helical N-termini, interacting with phosphate groups. We comment on how the sequence motifs that are commonly used to identify helicases map to locations on the module and discuss the implications of the conserved orientation of nucleic acid on the surface of the module for directional stepping along DNA or RNA.
Keywords: RNA helicase, DNA helicase, threonine, serine, protein–nucleic acid interactions
Introduction
RNA and DNA helicases (E.C. 3.6.1) occur abundantly throughout all genera, and they participate in virtually every key step of nucleic acid metabolism.1–3 Helicases are nucleoside triphosphate dependent motors that can interact with nucleic acids to unwind duplex substrates and to remodel nucleic acid structure and protein–nucleic acid complexes. Their diverse functional roles include participation in the processes of intron splicing, messenger RNA degradation, RNA folding, DNA replication, and chromatin remodeling. On the basis of canonical sequence motifs, helicases have been classified into five groups.4–6 We are concerned with those in the two largest groups, superfamily 1 and 2 (SF1 and SF2).7
A characteristic feature of the SF1 and SF2 proteins is an α/β sandwich domain comprising six or seven α-helices and seven parallel β-strands embedded in the interior of the globular fold. This core is related to the domain of the RecA protein that recognizes single-stranded DNA in homologous recombination. Figure 1(A) shows a schematic representation of this core domain. SF1 and SF2 proteins have a tandem repeat of such domains (unrelated by any pseudo dyad axis of symmetry), and a pocket for ATP binding and catalytic hydrolysis is generated from the interaction of the domain pair [Fig. 1(C)]. We refer to the two domains here as D1 and D2 for the N- and C-terminal domains, respectively. Interactions with ATP are distributed in both domains. D1 has the well-known Walker A and B ATP-binding sequence motifs, whereas other ATP-binding residues are contributed by topologically nonequivalent residues of D2. Although most helicases are elaborated with additional domains at one or both termini, the conserved sequence motifs used to classify helicases are contained within the D1 and D2 core domains. Eleven sequence motifs are identifiable [Fig. 1(C)]; some are part of the ATP-binding site, whereas others are involved with nucleic acid binding, although the latter are comparatively less conserved.8
Figure 1. Single-stranded nucleic acid binding (strand-helix)3 modules.

(A) One portion, in blue, of the α/β sandwich helicase domain forms a ssRNA binding module; the nucleic acid binding loops are in green. The other portion, in black, is involved with binding ATP. SF1 and SF2 helicases have a tandem repeat of the domains D1 and D2. A single NTP molecule interacts with D1 and D2 in a “closed” configuration that brings the two single-stranded binding modules closer together. (B) Schematic topology of the module. Rectangular shapes represent strands of parallel β-sheet and cylinders represent α-helices. The ssRNA, seen behind the (strand-helix)3 module, is shown in simplified representation as P = phosphate and R = ribose. The three strand-helix loops, in green, that bind the RNA are well conserved, whereas the two helix-strand loops, drawn thicker in blue, are less conserved. (C) Schematic representation of modules and conserved sequence motifs in SF1 and SF2 helicase domains. Sequence motifs Q, GG, and QxxR are characteristic of DEAD-box helicases and are absent in SF1 helicases. Interactions with RNA are indicated by black arrows. Interactions with ATP are indicated by dashed, gray arrows. Contacts represented by arrows are based on the crystal structure of the Vasa helicase. Modules M1 and M2 are indicated by blue and red and their nucleic acid binding loop sequences are green and magenta. The amino- and carboxy-termini are labelled N and C. (D) Drosophila melanogaster Vasa helicase structure with ssRNA.6 Modules M1 and M2 are shown in blue and red with the corresponding nucleic acid binding loops in green and magenta. RNA is yellow. This and other Figures of molecules were made using PyMOL (http://www.pymol.org).
The functional activities of helicases are more variable than implied by that descriptive name.3,9 Three types of activity are discernible: 1. processive with duplex unwinding; 2. processive, but without duplex unwinding; 3. nonprocessive. Processive helicases translocate along RNA or DNA giving rise, in Category 1, to duplex unwinding. In Category 2, translocation is the end effect with no duplex unwinding; hence, helicases of this group may be called translocases. In the nonprocessive helicases of Category 3, ATPase activity drives displacement of other nucleic acids or proteins that bind RNA or DNA rather than translocation. Sometimes this results in duplex unwinding, and in some cases, it does not. Some helicases have a directionality in their movement along the nucleic acid, with preference for 5’-to-3’ direction in one group and the opposite polarity for the other group.10,11 In spite of this functional and mechanistic variability, the key structural features of D1 and D2 mediating interactions with nucleic acid are well conserved.
Here, the nucleic acid, rather than the ATP, binding sites of these enzymes are the main focus. Consideration of what is conserved leads to the identification of a compact module, consisting of three strands and three helices at the end of domains D1 and D2 whose function is to bind the phosphosugar backbone of a trinucleotide. We name the two modules M1 and M2, where M1 is a subdomain of D1 and M2 is part of D2 [Fig. 1(C,D)]. Although analyses have been provided of the sequence motifs characterizing the helicases,12,13 the aspect of nucleic acid recognition has, we suggest, been somewhat overlooked; it is also of interest because the nucleic acid sites in the two domains, unlike the ATP sites, are topologically equivalent. The aim of this work is to investigate the structural homology between the nucleic acid binding sites and explore whether it is reflected by atomic detail and by sequence conservation. Focusing on the module, we also look for the occurrence of similar folds in proteins that are not helicases, but which use the same structural elements.
Results
A structural module for nucleic acid binding
Eighteen nucleic acid-helicase module crystal structures are available for comparison from SF1 and SF2 proteins. In all these structures, binding to a single strand of RNA or DNA is to a large extent mediated by the three strand-helix loops of the (strand-helix)3 module at one end of the helicase core domain (see Fig. 1). The remaining four strands and four helices can be regarded as the ATP-binding module. Here, however, we focus on nucleic acid binding by residues within the (strand-helix)3 module. The arrangement of one of these modules bound to a trinucleotide is illustrated diagrammatically in Figure 1(B) and by an α-carbon plot in Figure 2(A); the three strand-helix loops are seen to contact the mainchain (phosphates and riboses or deoxyriboses) of the nucleic acid. In binding a nucleic acid strand, the two modules from the pair of SF2 domains lie adjacent along the RNA or DNA such that each module interacts with adjacent trinucleotide units in a similar way, as seen in Figure 2(A).
Figure 2. The (strand-helix)3 modules of the intron-exon junction component eIF4AIII (PDB code 2hyi).
(A) A tandem pair of modules bound to the ribose-phosphates of a pentanucleotide; in each module the first loop is blue, the second green and the third yellow. Phosphorus atoms are enlarged and orange. (B) A view of module M1 bound to the ribose-phosphates of a trinucleotide. (C) The mainchain (plus all threonine) atoms of the 3 M1 loops (using the same colors and in spacefill) bound to the ribose-phosphates of a trinucleotide. Interactions of module M1 loop 1 (D), loop 2 (E), and loop 3 (F) with the ribose-phosphates of RNA (main chain atoms only except for threonine and arginine). Figures are from different viewpoints.
The three SF2 helicase crystal structures available as ssRNA-bound complexes are the human translation initiation factor eIF4AIII,14,15 the Drosophila melanogaster RNA helicase Vasa [Fig. 1(D)], and the human RNA helicase DDX19b.6,16 The examples of ssDNA-bound helicase structures in the SF2 family are hepatitis C virus NS3 and archaeal HEL308.17,18 In the SF1 family, there are five bacterial proteins: Escherichia coli RecB,19 Geobacillus stearothermophilus PcrA,20 E. coli Rep,21 Deinococcus radiodurans RecD2,11 and E. coli UvrD.22 In the SF2 family proteins, both domains bind nucleic acid, whereas in three proteins of the SF1 family only one of the domains binds DNA in a homologous manner. Sulfolobus solfataricus SNF2,23 in the SF2 family, forms a complex with duplex DNA and is unusual in that it is the only example of a helicase bound to a double-stranded nucleic acid. One domain of the SNF2 protein binds to an individual DNA strand of the duplex in the same conformation and relative orientation as found in the other seventeen cocomplexes. Thus, for this purpose, it can be regarded as another example of a module bound to ssDNA.
Comparison with RecA
Although it is widely recognized that the fold of the SF1 and SF2 helicase core is related to the RecA recombination protein, crystal structures of RecA bound to ssRNA and ssDNA24 reveal important differences from these helicases. RecA lacks one of the strand-helices of the (strand-helix)3 modules described in this work. Also a stretch of polypeptide from RecA is situated in the place that the ssRNA or ssDNA binds in the SF1 and SF2 helicases. Instead RecA binds ssRNA or ssDNA via a different set of loops. In spite of these differences, there are also some points of similarity. The loops from each RecA domain bind a trinucleotide, and also successive domains line up along the extended ssRNA or ssDNA in such a way that a pair of domains binds a hexanucleotide.
Nucleic acid-module contacts and the origins of directionality
Considering the 18 available module structures, each bound to the main chain of a single-stranded nucleic acid, a number of atomic interactions and features are found to be conserved in all; they are those involved with phosphate recognition, which is common to RNA and DNA. The N-terminus of the helix of the first strand-helix loop [shown in blue in Fig. 2(A–C)] binds the phosphate of the first nucleotide. The two successive nucleotide phosphates bind in a shallow depression between the other two strand-helix loops [Fig. 2(C)]. The N-terminus of the short helix of the third strand-helix loop [shown in yellow in Fig. 2(A–C)] binds the second phosphate of the RNA; the N-terminus of the second strand helix loop [shown in green in Fig. 2(A–C)], though near, does not bind a phosphate directly and instead a loop at the C-terminus of its β-strand binds the second phosphate. The atomic interactions at these three loops, many of which involve main chain polypeptide atoms, are seen in Figure 2(D–F). A schematic representation of the interactions of a typical SF2 module with an RNA trinucleotide is shown in Figure 3(A).
Figure 3. Diagram of the three strand-helix loops of the (strand-helix)3 modules of SF2 and SF1 helicases interacting with RNA and DNA trinucleotides.
(A) SF2 RNA helicases (B) SF2 DNA helicases (C) SF1 helicases. Hydrogen bonds are shown as dashed lines. The protein main chain is shown by ribbons and certain side chains are shown in grey. Pentagons and circles represent the alternating riboses or deoxyriboses and phosphates of the main chain of RNA or DNA. Residues of A are as for domain 1 of eIF4AIII. In B and C other interactions with the phosphates are present, but the diagram shows those hydrogen bonds in common with the RNA helicases.
In all 18 cases, the relative orientation of module and nucleic acid is the same, and the precise three-dimensional array of hydrogen-bonded contacts, as in Figures 2 and 3, could not be maintained if the direction of the nucleic acid, whether RNA or DNA, were reversed. Thus, the module and nucleic acid interact with a defined polarity. This bears on the directionality of the motion of helicases, a topic to which we return in the “Discussion.”
Conserved structure and sequence within modules
The main chain atoms of the three strand-helix loops of the 18 modules have remarkable structural congruence. The structure-based sequence alignments of the 18 modules, seen in Table I, reveal no insertions or deletions at these loops, whereas the loops at the helix-strand junctions that do not contact the nucleic acid in contrast require several gaps for optimal alignment. At the bottom of Table I, the main chain conformations of residues that are the same in all 18 modules are listed by letter codes corresponding to their main chain conformation and secondary structure, and the identity of the pattern provides a concise indication of the structural conservation of the three strand-helix loops that engage the nucleic acid.
Table I. Alignment of RNA- and DNA-Binding (strand-helix)3 Modules Based on Known Protein-Nucleic Acid Complex Structures.
| 1st Strand-helix loop | 2nd Strand-helix loop | 3rd Strand-helix loop |
|---|---|---|
| 1. QALILAPTRELAVQIQKGLLALGDYMNV- | QCHACIGGTNVGEDIRKLDYG---- | QHVVAGTPGRVF |
| 2. QAVIFCNTKRKVDWLTEKMREAN----- | FTVSSMHGDMPQKERESIMKEFRSGA | SRVLISTDVWAR |
| 3. QVVIVSPTRELAIQIFNEARKFAFESY- | LKIGIVYGGTSFRHQNECITRG---- | CHVVIATPGRLL |
| 4. GTIVFVETKRGADFLASFLSEKE----- | FPTTSIHGDRLQSQREQALRDFKNGS | MKVLIATSVASR |
| 5. QCLCLSPTYELALQTGKVIEQMGKFYP- | ELKAYAVRGNKLERCGKISE------- | QIVIGTPGTVL |
| 6. QAMIFCHTRKTASWLAAELSKEGH----- | QVALLSGEMMVEQRAAVIERFREGK | EKVLVTTNVCAR |
| 7. KVLVLNPSVAATLGFGAYMSKAHVD---- | PPNIRTGVRTITTGS----------- | PITYSTYGKFL |
| 8. RHLIFCHSKKKCDELAAKLVALGI----- | NAVAYYRGLDVSVIP------- | TSGDVVWATDAL |
| 9. KSLYVVPLRALAGEKYESFKKWEKI-- | GLRIGISTGDYESRDEHLGDC------- | DIIVTTSEKAD |
| 10. GVLVFESTRRGAEKTAVKLSAITAKYV\|\| | KGAAFHHAGLLNGQRRVVQDAFRRGN | IKVVVATPTLAA |
| 11. PSLVICPLSVLKNWEEELSKFAP---- | HLRFAVFHEDRSKIKLEDY--------- | DIILTTYAVLL |
| 12. SDISVLRSRQEAAQVRDALTLLEI----- | PSVYLSNR\|\| | \|\|LVQIVTIHKSK |
| 13. DFAVLYRTNAQSRVMEEMLLKANI----- | PYQIVGGL\|\| | \|\|AVMLMTLHAAK |
| 14. DYAILYRGNHQSRVFEKFLMQMRI----- | PYKISGGT\|\| | \|\|QVQLMTLHASK |
| 15. SIMAVTFTNKAAAEMRHRIGQLMGTSQ-- | ------------------------- | GGMWVGTFHGLA |
| 16. ECAILYRSNAQSRVLEEEALLQSAM---- | PYRIYGG\|\| | \|\|AVQLMTLHSAK |
| 17. EVGLCAPTGKAARRLGEVT-- | ---------------------------------- | GRTASTVHRLL |
| 18. AVQVLTPMRKGPLGMDHLNYHLQALFN\|\| | ------------------------- | \|\|GYALTVHRAQ |
| : : : : : | : : | ::: : |
| Con. LIL TR I L | I G | VVV T |
| Str.EEEEEABHHHHHHHHHHHHHH | EEEE | EEEEEHHHHH |
Row 1: SF2, ssRNA, Human eIF4AIII, 2hyi, Anderson et al.,14 M1, residues 108–168. Row 2: SF2, ssRNA, 2hyi M2, res. 279–339. Row 3: SF2, ssRNA, Drosophila Vasa, 2db3, Sengoku et al.,6 M1, residues 320–380. Row 4: SF2, ssRNA, 2db3 M2, res. 491–551. Row 5: SF2, ssRNA, Human DDX19b, 3g0h, Collins et al.,16 M1, residues 164–222. Row 6: SF2, ssRNA, 3g0h, M2, res. 335–395. Row 7: SF2, ssDNA, Hepatitis C virus NS3, 1a1v, Kim et al.,17 M1, residues 224–277. Row 8: SF2, ssDNA, 1a1v, M2, res. 363–414. Row 9: SF2, ssDNA, A. fulgidus HEL308, 2p6r, Buttner et al.,18 M1, residues 70–126. Row 10: SF2, ssDNA, 2p6r, M2, res. 244–270, 297–334. Row 11: SF2, dsDNA, S. solfataricus SNF2, 1z63, Durr et al.,23 M1, residues 494–546. Row 12: SF1, ssDNA, E. coli RecB, 1w36, Singleton et al.,19 M2, res. 552–584, 735–745. Row 13: SF1, ssDNA, G. stearothermophilus PcrA, 3pjr, M2, residues 353–384, 558–568. Row 14: SF1, ssDNA, E. coli Rep helicase, 1uaa, M2, residues 344–375, 551–563. Row 15: SF1, ssDNA, E. coli UvrD helicase, 2is4, Chen et al.,24 M1, residues 56–94. Row 16: SF1, ssDNA, 2is4, M2, res. 348–379, 553–563. Row 17: SF1, ssDNA, D. radiodurans RecD2, 3gpl, Saikrishnan et al.,11 M1, res. 382–412. Row 18: SF1, ssDNA, 3gpl, M2, res. 546–572, 640–649 (residues 555, 556 and 559, at the beginning of the first α-helix, have non-matching conformations). Con.: consensus sequence for the 18 modules. Str.: amino acids whose main chain conformations are the same in the 18 modules. In this table, residues at equivalent and adjacent positions on the three strands of the β-sheet are shown in bold. The letters used for the main chain conformation of individual amino acids in the Str. row have the meanings: E, β-conformation and occurring at homologous positions within the 3-stranded parallel β-sheet. H, α-helix conformation and occurring at homologous positions within the helix. B, β-conformation φ = −120°, ψ = 130°. A, right-handed α-helical conformation φ = −80°, ψ = −10°. All values are ±50°.
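The B and A letter codes defined in the footnote to Table I can be expressed as a simple test on main chain dihedral angles. The following is a minimal sketch, assuming the quoted angles are (φ, ψ) centres with a tolerance of 50° on each; the function name `classify_backbone` is hypothetical, and the E and H codes are not covered because they also depend on position within the sheet or helix.

```python
# Centres of the two conformational regions quoted in the Table I footnote,
# given as (phi, psi) in degrees. The tolerance of 50 degrees is also from the footnote.
CENTRES = {"B": (-120.0, 130.0), "A": (-80.0, -10.0)}


def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify_backbone(phi, psi, tolerance=50.0):
    """Return 'B', 'A', or None for a residue with the given phi/psi angles."""
    for code, (phi0, psi0) in CENTRES.items():
        within_phi = angular_difference(phi, phi0) <= tolerance
        within_psi = angular_difference(psi, psi0) <= tolerance
        if within_phi and within_psi:
            return code
    return None


if __name__ == "__main__":
    for angles in [(-120.0, 130.0), (-60.0, -45.0), (60.0, 40.0)]:
        print(angles, classify_backbone(*angles))
```

With these centres and tolerance the two regions do not overlap in ψ, so a residue can receive at most one of the two codes.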
Several residues are conserved at the three strand-helix loops, most consistently the two threonines (sometimes serines), so these are key structural features [Fig. 2(D,F)]. The two residues are part of thr-motifs and ser-motifs, which are features commonly found at α-helical N-termini in proteins,25 and are characterized by two hydrogen bonds: one is between the ser/thr sidechain and the main chain amide NH group three residues ahead; the other is between the mainchain carbonyl of the serine or threonine and the mainchain amide NH four residues ahead. The threonine (or serine) is the N-cap residue (the first to be involved in mainchain-mainchain hydrogen-bonding) at the N-terminus of the helix. Two thr-motifs occur typically in each module, at the N-termini of the first and third helices. The first threonine does not contact the nucleic acid directly although it is near, whereas the side chain hydroxyl group of the second threonine, which is particularly well conserved, binds a DNA or RNA phosphate directly.
Another thr-motif where one of the main chain NH groups within the motif contacts the phosphate of a bound nucleic acid is seen in the methylated DNA-binding protein MeCP2.26 The threonine in question is the residue most commonly mutated in Rett syndrome, and its mutation to alanine, but not serine, disrupts binding to methylated DNA.
It is useful to compare the interactions with nucleic acid of the mainchain NH atoms of the helical N-termini at the first and third strand helix loops. If the N-cap threonine is residue i, the phosphate near the first loop forms hydrogen bonds to the NH of residue i+1, while the phosphate near the third loop typically hydrogen bonds to the NH of residue i+3. Thus, in spite of both loops having thr-motifs at helical N-termini, their interactions with nucleic acid differ subtly.
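The residue offsets described for the thr-motif and for the phosphate contacts at the first and third loops can be summarized in a few lines of code. This is an illustrative sketch only: the class name `ThrMotif`, its method names, and the example residue number are hypothetical, and the offsets are simply those stated in the text (i+3 and i+4 for the motif hydrogen bonds, i+1 and i+3 for the phosphate-binding NH groups).

```python
from dataclasses import dataclass

# Offset from the N-cap residue i to the main chain NH that contacts a phosphate,
# keyed by strand-helix loop number (loop 2 has no thr-motif in this description).
PHOSPHATE_NH_OFFSET = {1: 1, 3: 3}


@dataclass(frozen=True)
class ThrMotif:
    n_cap: int  # sequence number i of the N-cap threonine or serine

    def side_chain_hbond(self):
        """Residue pair for the Thr/Ser side chain to main chain NH bond (i, i+3)."""
        return (self.n_cap, self.n_cap + 3)

    def main_chain_hbond(self):
        """Residue pair for the main chain C=O to main chain NH bond (i, i+4)."""
        return (self.n_cap, self.n_cap + 4)

    def phosphate_nh(self, loop):
        """Residue whose main chain NH typically contacts the phosphate at this loop."""
        return self.n_cap + PHOSPHATE_NH_OFFSET[loop]


if __name__ == "__main__":
    motif = ThrMotif(n_cap=10)  # hypothetical residue number
    print(motif.side_chain_hbond(), motif.main_chain_hbond())
    print(motif.phosphate_nh(1), motif.phosphate_nh(3))
```

Written this way, the subtle difference between the two loops is just the value of one offset.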
The β-turns in the second strand-helix loop of the RNA-bound modules can be of either Type I or Type II. It has been demonstrated27 that these two types of turn interconvert readily, so can be regarded as to some extent equivalent.
Correspondence of module structure with helicase signature sequences
The 11 highly conserved sequence motifs in helicases, in sequence order, are: Q, I, Ia, GG, Ib, II, III, IV, QxxR, V, VI. The GG and QxxR motifs are defined for a subset of SF2 helicases, the DEAD-box group,6 where the arginine sidechain forms a hydrogen bond with an RNA phosphate. The other motifs are more widely distributed and occur in SF1 and SF2 helicases.4,5 Motif Ib is sometimes referred to as TxGx in the DEAD-box helicases.2 Earlier studies show that motifs I, II, III, V, VI, and Q are related to NTP binding and NTPase activity. The three strand helix loops in the first module M1 correspond to sequence motifs Ia, GG, and Ib, and those in M2 correspond to motifs IV, QxxR, and V, as in Figure 1(C). Our work, thus, draws attention to the correspondence in sequence and structure between the pairs of motifs: Ia and IV; GG and QxxR; Ib and V. The homology exists because the (strand-helix)3 modules in the two domains bind the main chain part of single-stranded nucleic acids in essentially the same way.
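The correspondence between strand-helix loops and signature motifs can be written down as a small lookup table. The sketch below only restates the assignments given in the text; the variable and function names are hypothetical.

```python
# The 11 signature motifs in sequence order, as listed in the text.
MOTIF_ORDER = ["Q", "I", "Ia", "GG", "Ib", "II", "III", "IV", "QxxR", "V", "VI"]

# Motifs that earlier studies relate to NTP binding and NTPase activity.
NTP_RELATED = {"I", "II", "III", "V", "VI", "Q"}

# Motifs at strand-helix loops 1, 2 and 3 of each module.
MODULE_LOOP_MOTIFS = {
    "M1": ("Ia", "GG", "Ib"),
    "M2": ("IV", "QxxR", "V"),
}


def equivalent_motif_pairs():
    """Pair the motifs found at the same loop position in M1 and M2."""
    return list(zip(MODULE_LOOP_MOTIFS["M1"], MODULE_LOOP_MOTIFS["M2"]))


def motifs_with_both_roles():
    """Motifs listed both as NTP-related and as a nucleic acid binding loop."""
    loop_motifs = set(MODULE_LOOP_MOTIFS["M1"]) | set(MODULE_LOOP_MOTIFS["M2"])
    return sorted(NTP_RELATED & loop_motifs)


if __name__ == "__main__":
    print(equivalent_motif_pairs())
    print(motifs_with_both_roles())
```

On these lists, motif V is the only one that appears in both groups.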
Module variants
Those features present in domains that bind RNA but absent in those that prefer DNA enable recognition of the 2’-OH of the ribose; they are seen in Figure 3 and are listed in Supporting Information Table S4. Table S4 also lists the other recurring features of the three main subgroupings in Table I (six SF2 ssRNA modules; five SF2 ssDNA modules; seven SF1 ssDNA modules).
The seven SF1 modules are variable with regard to their second strand-helix loop. The five C-terminal modules (M2) have a 160-residue insertion (or indel) at that position, whereas in M1 (and M2 of RecD2), this strand-helix loop is missing altogether. Only two of the five M1 modules occur in the crystal bound to an ssDNA trinucleotide; they are in rows 15 and 17 of Table I. It is not unexpected that the conformation of that loop, when it exists, differs substantially from the SF2 ones. Diagrams of the SF1 and SF2 interactions in the DNA-binding modules are shown in Figure 3(B,C).
Supporting Information Tables S1 and S2 show structure-based alignments for available crystal structures of RNA and DNA SF2 helicase modules and include those that are not bound to nucleic acid. Supporting Information Table S3 gives the corresponding alignments for the SF1 helicases. Once again the strand-helix loops, but not the helix-strand loops, align without gaps. Comparing SF2 RNA and DNA helicases, no striking conserved sequence differences are apparent. The SF1 helicases, compared to the SF2 group, are more divergent. The sequence TxH around the second conserved threonine is notable. The sidechain of the histidine is seen to lie next to a deoxyribose and is thought to give rise to the specificity for DNA because it would clash with the 2’-oxygen of the ribose of RNA.11 The one SF1 RNA helicase, Upf1, lacks this histidine in M1 and M2.
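The TxH signature can be checked directly against the third strand-helix loop sequences listed in Table I. The following sketch is illustrative rather than a description of the method used for the alignments: the sequences are copied from the third column of Table I (rows 1 to 11 for SF2, rows 12 to 18 for SF1, with the `||` break markers removed), and the regular expression `T.H` is an assumed, simplified rendering of "TxH".

```python
import re

# Third strand-helix loop sequences from Table I, grouped by superfamily.
THIRD_LOOP = {
    "SF2": [
        "QHVVAGTPGRVF", "SRVLISTDVWAR", "CHVVIATPGRLL", "MKVLIATSVASR",
        "QIVIGTPGTVL", "EKVLVTTNVCAR", "PITYSTYGKFL", "TSGDVVWATDAL",
        "DIIVTTSEKAD", "IKVVVATPTLAA", "DIILTTYAVLL",
    ],
    "SF1": [
        "LVQIVTIHKSK", "AVMLMTLHAAK", "QVQLMTLHASK", "GGMWVGTFHGLA",
        "AVQLMTLHSAK", "GRTASTVHRLL", "GYALTVHRAQ",
    ],
}

TXH = re.compile(r"T.H")  # threonine, any residue, histidine


def has_txh(sequence):
    """True if the loop sequence contains a T-x-H tripeptide."""
    return TXH.search(sequence) is not None


def txh_counts():
    """Number of loops with TxH and total loops, per superfamily."""
    return {
        family: (sum(has_txh(seq) for seq in seqs), len(seqs))
        for family, seqs in THIRD_LOOP.items()
    }


if __name__ == "__main__":
    print(txh_counts())
```

Upf1 is not part of Table I, so the statement about its missing histidine is not tested by this sketch.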
Figure 4 reveals a difference in the relative arrangement of the two modules from the two domains with respect to each other and the nucleic acid. SF1 (except the UvrD and RecD2 structures with PDB codes 2is2 and 3gp8) and RNA-bound SF2 pairs of modules exhibit one type of arrangement occupying five nucleotides while DNA-bound SF2 pairs of modules have an arrangement that occupies six nucleotides. UvrD and RecD2 helicases are of interest because in the ATP-bound form they engage five nucleotides, whereas without ATP they adopt a six-nucleotide arrangement. We return to the significance of this difference later.
Figure 4. Diagram of binding of modules M1 and M2 to nucleic acid.
A shows one arrangement commonly observed and B shows another. All helicases have one or the other arrangement but UvrD and RecD2 have A, the open conformation, in the absence of ATP, and B, the closed conformation, in its presence. These two conformations are thought to be important for the translocation of the helicase along ssDNA.
Several SF2 helicases are thought to be nonprocessive and catalyze remodeling of nucleoprotein complexes rather than helix unwinding.9 Although their nucleic acid binding capacity is intact, their primary function is to displace other proteins from RNA. Of particular interest are the SNF2 helicases associated with chromatin remodeling.23,28 They are processive but with little or no duplex unwinding capability. Instead their ATPase activity is utilized for the translocation of double-stranded DNA. One of the 18 examples of a helicase module–DNA interaction belongs to this protein group. The conformation of a bound strand of double helical DNA is not unlike that of a single strand as bound to other helicases, with the phosphate-ribose backbone exposed on the outside, so the module can bind the DNA strand in much the same way as seen for the module-single stranded complex and without steric hindrance. It is intriguing that this mode of binding is found in a helicase that may translocate along double-stranded DNA and not unwind the duplex, so it may be functionally significant. Some inter-mainchain hydrogen bonds at the interface are absent or weak compared to the other helicases. This could be related to the module binding a strand embedded within a double helix, making it inflexible compared to an isolated strand.
Occurrence of the module in other proteins
The protein database was searched for structural homologues containing homologous (strand-helix)3 modules in noncanonical SF1 or SF2 helicases using the DALI and SSM search engines.29,30 The results were filtered to remove helicase homologues and minimize redundancy, then manually curated. Representative overlays are presented in Figure 5. A criterion was imposed that similar folds have strand-helices in the same order: 2-3-1, strand 2 being on the edge of the sheet.
Figure 5. Overlay of helicase modules and nonhelicase proteins.
(A) Superposition of E. coli DNA polymerase III (PDB code: 1em8,32 colored green) and module 2 of Vasa helicase (PDB code: 2db3,6 colored blue) shown from two different viewpoints. (B) Superposition of polynucleotide kinase (1ly1,33 colored red) and module 2 of archaeal helicase Hel308 (2p6r,18 colored blue) shown from two different viewpoints. Superpositions were performed using SSM,31 with the modules described in Table I. The superpositions with lowest RMSDs are presented.
Apart from noting a protein, MeCP2,26 that uses a thr-motif to bind the phosphate of DNA, the best match is in the chi subunit (PDB code 1em8; residues 38–90) of E. coli DNA polymerase III [Fig. 5(A)].31 Whether it binds DNA in a homologous manner is uncertain, especially as it lacks the threonines of SF1 and SF2 helicases. Instead the chi subunit might play a purely structural role in the organization of the polymerase subunits. However, it exhibits homology over a whole domain, not just a module. Another match for the module is in polynucleotide kinase [PDB code 1ly1; residues 1–90, Fig. 5(B)],32 a member of a family of homologous kinases including shikimate kinase, 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase, chloramphenicol phosphotransferase, and adenylate kinase. These kinases have a P-loop for binding the β-phosphate of ATP apparently inserted at the first strand-helix loop so are functionally different. Other matching proteins occurred in the searches, but some have only two strand-helices and others have a different strand-helix order, 2-1-3 instead of 2-3-1.
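The topological criterion applied to the search results (three strand-helix units with strand order 2-3-1) amounts to a simple filter. The sketch below is a hypothetical illustration and not the filtering procedure actually used: the candidate names and their topologies are invented placeholders, and the strand order is assumed to be read across the sheet starting from the edge occupied by strand 2.

```python
REQUIRED_ORDER = (2, 3, 1)  # spatial order of strands across the sheet
REQUIRED_UNITS = 3          # number of strand-helix units in the module


def matches_module_topology(strand_order):
    """True if a candidate has three strand-helix units in the order 2-3-1."""
    order = tuple(strand_order)
    return len(order) == REQUIRED_UNITS and order == REQUIRED_ORDER


def filter_candidates(candidates):
    """Keep the names of candidates whose strand order matches the module."""
    return [name for name, order in candidates.items()
            if matches_module_topology(order)]


if __name__ == "__main__":
    # Placeholder hits, not real search results.
    hits = {
        "candidate_a": (2, 3, 1),  # same order as the helicase module
        "candidate_b": (2, 1, 3),  # three units, different order
        "candidate_c": (2, 1),     # only two strand-helix units
    }
    print(filter_candidates(hits))
```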
Discussion
SF1 and SF2 helicases are characterised by a pair of ATP-binding α/β domains. Having examined the 18 unique domains available as crystal structures bound to RNA or DNA to find conserved features at the nucleic acid interface, we describe a structural module located at one end of each domain, composed of three alternating strands and helices, as in Figure 1(B). Each module forms a binding site for a single strand of nucleic acid via the three strand-helix loops at one side of the module. The sequences of these loops can be aligned without gaps and their main chain structures are remarkably similar, whereas the helix-strand loops on the other side of the module differ.
Individual (strand-helix)3 modules of SF1 and SF2 helicases bind the main chain parts (phosphates and riboses or deoxyriboses) of di- or trinucleotides along ssRNA or ssDNA. In each protein chain, a pair of such modules bind either hexanucleotides or pentanucleotides with a defined polarity, as in Figures 2(A) and 4. Most helicases bind ssRNA or ssDNA, but one, the chromatin remodeling factor SNF2, binds to one strand of a DNA duplex. As the conformation of such a strand resembles that of single-stranded DNA as bound to other helicases, the interaction is comparable to that of the other helicase complexes. Given the simplicity of the module–nucleic acid interaction, the question comes to mind how different processive helicases have preferred directions for translocation, some moving from 3’ to 5’ along the single strand, and others with the reverse polarity. The directionality will depend on which of the modules is first to dissociate and rebind to a strand, at some restrained distance away from the initial binding site. The pathway followed is likely to be affected by domain movements at the D1-D2 interface in response to ATP binding and hydrolysis.
We explored whether the module could function as an independent RNA binding motif in other proteins. A search of the database revealed a few examples where the module occurs in other proteins, but no evidence for nucleic acid binding was found. For instance, it is found in the chi subunit of E. coli DNA polymerase III,31 where it might play a purely structural role in the organization of the polymerase subunits.
Our survey of modules complexed with single strands of DNA or RNA identifies a pattern of sequence and structure conservation that is not obvious from the well-known sequence motifs of Gorbalenya et al.4 A number of the mainchain atoms of the three strand-helix loops of the module, some of which are at the N-termini of α-helices, contact the nucleic acid. Regarding sequence signatures, two threonine residues, sometimes serines, are conserved, even though they are not always conspicuous in helicase alignments. One is at the first strand-helix loop; the second, which contacts a phosphate, is at the third strand-helix loop and is particularly well conserved. A helicase thus has four such threonines, located at sequence motifs Ia, Ib, IV, and V of the helicase sequence signature. These residues stabilize local structure by forming thr- and ser-motifs in which the threonine or serine is the N-cap residue at the N-terminus of an α-helix.25,33,34 In structure and sequence, motif Ia matches motif IV and motif Ib resembles motif V. This is not entirely surprising as the two modules are homologous in overall 3D structure, but the wide use of the sequence motifs of Gorbalenya et al.,4 plus the fact that the other functions, such as ATP binding, of the two domains differ, means that their nucleic acid-binding homology may not be apparent. The importance of thr-motifs for DNA binding is suggested by the nonhelicase methylated DNA-binding protein MeCP2;26 here, a main chain NH group within a thr-motif contacts the phosphate of a bound nucleic acid. This threonine in MeCP2 is most commonly mutated in Rett syndrome, and its mutation to alanine, but not serine, disrupts DNA binding.35
The helicases have diverse ATP-linked functions, and their mechanisms have been studied extensively. Most SF1 and SF2 helicases are thought to cycle between an ATP-bound “closed” conformation with the D1 and D2 domains close together, and an “open” conformation in the absence of ATP, when the domains move apart. Both conformations are observed in DNA-bound forms of the processive SF1 UvrD (2is4) and RecD2 helicases (3gpl).11,22 They are of interest because in the ATP-bound forms they engage 5 nucleotides, whereas without ATP, they adopt a 6-nucleotide arrangement. This is illustrated in Figure 4. A similar organization is likely in many SF2 helicases.36 Also shown in Figure 4 is a summary of the hydrogen bond interactions conserved among the 18 examples of modules bound to nucleic acid. Such findings lead to proposals that ATP binding results in translocation of the single-stranded DNA with respect to the protein by one nucleotide at a time. Kawaoka et al.37 and Pyle3 envision the mechanism of processive helicases as tracking by two helicase domains one nucleotide at a time along the backbone of the nucleic acid, and a threonine side chain (the one in the third strand helix loop) is identified as a key residue for phosphate binding. These latter ideas are reminiscent of, and compatible with, the diagrams in Figure 4. Whether they function processively or in other modes, all known SF1 and SF2 helicases use tandem (β-strand/α-helix)3 structural modules as the fundamental units for interacting with nucleic acid, and we suggest that divergent mechanistic features arise from features outside these modules.
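The alternation between six- and five-nucleotide arrangements, and the dependence of direction on which module moves first, can be illustrated with a toy stepping model. This is a deliberately simplified sketch and not a mechanism established by the structures: the labels "rear" and "front", the assumption that each module covers exactly three nucleotides, and the assignment of one module step to ATP binding and the other to ATP release are all assumptions made for illustration.

```python
MODULE_SPAN = 3  # each module is assumed to contact a trinucleotide


def footprint(rear_start, front_start):
    """Nucleotides spanned from the start of the rear module to the end of the front one."""
    return front_start + MODULE_SPAN - rear_start


def atp_cycle(rear_start, front_start, closing_mover):
    """Return the (rear, front) start positions for the open, closed and reopened states.

    closing_mover names the module assumed to move when ATP binds:
    'rear' steps toward the front module, 'front' steps back toward the rear one.
    The other module is assumed to move when ATP is released.
    """
    if footprint(rear_start, front_start) != 6:
        raise ValueError("a cycle must start from the open, six-nucleotide arrangement")
    states = [(rear_start, front_start)]
    if closing_mover == "rear":
        rear_start += 1    # ATP bound: closed, five nucleotides
        states.append((rear_start, front_start))
        front_start += 1   # ATP released: open again, one nucleotide further on
    elif closing_mover == "front":
        front_start -= 1   # ATP bound: closed, five nucleotides
        states.append((rear_start, front_start))
        rear_start -= 1    # ATP released: open again, one nucleotide back
    else:
        raise ValueError("closing_mover must be 'rear' or 'front'")
    states.append((rear_start, front_start))
    return states


if __name__ == "__main__":
    for mover in ("rear", "front"):
        states = atp_cycle(0, 3, mover)
        print(mover, states, [footprint(r, f) for r, f in states])
```

In this toy model each cycle moves the pair of modules by one nucleotide, and swapping which module moves on ATP binding reverses the direction of travel.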
Definitions
D1, D2, M1, and M2: A chain of an SF1 or SF2 helicase possesses a pair of α/β sandwich domains with 6–7 strands and 6–7 helices. The N-terminal domain is referred to as D1 and the C-terminal domain as D2. At one end of each domain is a nucleic acid binding module consisting of three strands and three helices, the subject of this work. The module in D1 is named here M1 and the one in D2 is named M2.
SF1 and SF2: Helicase superfamily 1 and 2, respectively.
N-cap: The N-cap residue of an α-helix is the first residue to be involved in the main chain–main chain hydrogen bonding of the helix.38
Supplementary Material
Acknowledgments
We thank Adrian Schreyer for advice on filtering the search engine results. We thank Luca Pellegrini and K. Saikrishnan for helpful comments and stimulating discussions.
Grant sponsors
Biotechnology and Biological Sciences Research Council studentship; Wellcome Trust.
References
1. Silverman E, Edwalds-Gilbert G, Lin R-L. DExD/H-box proteins and their partners: helping RNA helicases unwind. Gene. 2003;312:1–16. doi: 10.1016/s0378-1119(03)00626-7.
2. Singleton MR, Dillingham MS, Wigley DB. Structure and mechanism of helicases and nucleic acid translocases. Ann Rev Biochem. 2007;76:23–50. doi: 10.1146/annurev.biochem.76.052305.115300.
3. Pyle A-M. Translocation and unwinding mechanisms of RNA and DNA helicases. Ann Rev Biophys. 2008;37:317–336. doi: 10.1146/annurev.biophys.37.032807.125908.
4. Gorbalenya AE, Koonin EV, Donchenko AP, Blinov VM. A novel family of nucleoside triphosphate-binding motif containing proteins which are probably involved in duplex unwinding in DNA recombination and replication. FEBS Lett. 1988;235:16–24. doi: 10.1016/0014-5793(88)81226-2.
5. Cordin O, Banroques J, Tanner NK, Linder P. Review: the DEAD-box protein family of RNA helicases. Gene. 2006;367:17–37. doi: 10.1016/j.gene.2005.10.019.
6. Sengoku T, Nureki O, Nakamura A, Kobayashi S, Yokoyama S. Structural basis for RNA unwinding by the DEAD-box protein Drosophila Vasa. Cell. 2006;125:287–300. doi: 10.1016/j.cell.2006.01.054.
7. Lohman TM, Tomko EJ, Wu CG. Non-hexameric DNA helicases and translocases: mechanisms and regulation. Nature Rev Mol Cell Biol. 2008;9:391–401. doi: 10.1038/nrm2394.
8. Jankowsky E, Fairman ME. RNA helicases—one fold for many functions. Curr Opin Struct Biol. 2007;17:316–324. doi: 10.1016/j.sbi.2007.05.007.
9. Jankowsky E, Bowers H. Remodeling of ribonucleoprotein complexes with DExH/D helicases. Nucleic Acids Res. 2006;34:4181–4188. doi: 10.1093/nar/gkl410.
10. Yu J, Ha T, Schulten K. How directional translocation is regulated in a DNA helicase motor. Biophys J. 2007;93:3783–3797. doi: 10.1529/biophysj.107.109546.
11. Saikrishnan K, Powell B, Cook NJ, Webb MR, Wigley DB. Mechanistic basis of 5’-3’ translocation in SF1B helicases. Cell. 2009;137:849–859. doi: 10.1016/j.cell.2009.03.036.
12. Banroques J, Cordin O, Doere M, Linder P, Tanner NK. A conserved phenylalanine of motif IV in superfamily 2 helicases is required for cooperative ATP-dependent binding of RNA substrates in DEAD-box proteins. Mol Cell Biol. 2008;28:3359–3371. doi: 10.1128/MCB.01555-07.
13. Cordin O, Tanner NK, Doere M, Linder P, Banroques J. The newly discovered Q motif of DEAD-box RNA helicases regulates RNA-binding and helicase activity. EMBO J. 2004;23:2478–2487. doi: 10.1038/sj.emboj.7600272.
14. Andersen CB, Ballut L, Johansen JS, Chamieh H, Nielsen KH, Oliveira CL, Pedersen JS, Seraphin B, Le Hir H, Andersen GR. Structure of the exon junction core complex with a trapped DEAD-box ATPase bound to RNA. Science. 2006;313:1968–1972. doi: 10.1126/science.1131981.
15. Bono F, Ebert J, Lorentzen E, Conti E. The crystal structure of the exon junction complex reveals how it maintains a stable grip on mRNA. Cell. 2006;126:713–725. doi: 10.1016/j.cell.2006.08.006.
16. Collins R, Karlberg T, Lehtio L, Schutz P, van den Berg S, Dahlgren L-G, Hammarstrom M, Weigelt J, Schuler H. The DExD/H-box RNA helicase DDX19 is regulated by an α-helical switch. J Biol Chem. 2009;284:10296–10300. doi: 10.1074/jbc.C900018200.
17. Kim JL, Morgenstern KA, Griffith JP, Dwyer MD, Thomson JA, Murcko MA, Lin C, Caron PR. Hepatitis C virus NS3 helicase domain with a bound oligonucleotide: crystal structure provides insights into the mode of unwinding. Structure. 1998;6:89–100. doi: 10.1016/s0969-2126(98)00010-0.
18. Buttner K, Nehring S, Hopfner K-P. Structural basis for DNA duplex separation by a superfamily-2 helicase. Nature Struct Mol Biol. 2007;14:647–652. doi: 10.1038/nsmb1246.
19. Singleton MR, Dillingham MS, Gaudier M, Kowalczykowski SC, Wigley DB. Crystal structure of RecBCD enzyme reveals a machine for processing DNA breaks. Nature. 2004;432:187–193. doi: 10.1038/nature02988.
20. Velankar SS, Soultanas P, Dillingham MS, Subramanya HS, Wigley DB. Crystal structures of complexes of PcrA DNA helicase with a DNA substrate indicate an inchworm mechanism. Cell. 1999;97:75–84. doi: 10.1016/s0092-8674(00)80716-3.
21. Korolev S, Hsieh J, Gauss GH, Lohman TM, Waksman G. Major domain swivelling revealed by the crystal structures of complexes of E. coli Rep helicase bound to single-stranded DNA and ADP. Cell. 1997;90:635–645. doi: 10.1016/s0092-8674(00)80525-5.
22. Lee JY, Yang W. UvrD helicase unwinds DNA one base at a time by a two-part power stroke. Cell. 2006;127:1349–1360. doi: 10.1016/j.cell.2006.10.049.
23. Durr H, Flaus A, Owen-Hughes T, Hopfner K-P. Snf2 family ATPases and DExx box helicases: differences and unifying concepts. Nucleic Acids Res. 2006;34:4160–4167. doi: 10.1093/nar/gkl540.
24. Chen Z, Yang H, Pavletich NP. Mechanism of homologous recombination from the RecA-ssDNA/dsDNA structures. Nature. 2008;453:489–494. doi: 10.1038/nature06971.
25. Wan W-Y, Milner-White EJ. A recurring two-hydrogen bond motif incorporating a serine or threonine residue is found both at α-helical N-termini and in other situations. J Mol Biol. 1999;286:1651–1662. doi: 10.1006/jmbi.1999.2551.
26. Ho KL, McNae IW, Schmiederberg L, Klose RJ, Bird AP, Walkinshaw MD. MeCP2 binding to DNA depends on hydration at Methyl-CpG. Mol Cell. 2008;29:525–531. doi: 10.1016/j.molcel.2007.12.028.
27. Gunasekaran K, Gomathi L, Ramakrishnan C, Chandrasekhar J, Balaram P. Conformational interconversions in peptide β-turns: analysis of turns in proteins and computational estimates of barriers. J Mol Biol. 1998;284:1505–1516. doi: 10.1006/jmbi.1998.2154.
28. Durr H, Korner C, Muller M, Hickmann V, Hopfner K-P. X-ray structures of the Sulfolobus solfataricus SWI2/Snf2 ATPase core and its complex with DNA. Cell. 2005;121:363–373. doi: 10.1016/j.cell.2005.03.026.
29. Holm L, Kaariainen S, Rosenstrom P, Schenkel A. Searching protein structure databases with DaliLite v. 3. Bioinformatics. 2008;24:2780–2781. doi: 10.1093/bioinformatics/btn507.
30. Krissinel E, Henrick K. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Cryst D. 2004;60:2256–2268. doi: 10.1107/S0907444904026460.
31. Gulbis JM, Kazmirski SL, Finkelstein J, Kelman Z, O’Donnell M, Kuriyan J. Crystal structure of the chi:psi subassembly of the Escherichia coli DNA polymerase clamp-loader complex. Eur J Biochem. 2004;271:439–449. doi: 10.1046/j.1432-1033.2003.03944.x.
32. Wang LK, Lima CD, Shuman S. Structure and mechanism of polynucleotide kinase: an RNA repair enzyme. EMBO J. 2002;21:3874–3880. doi: 10.1093/emboj/cdf397.
33. Duddy WJ, Nissink JWM, Allen FH, Milner-White EJ. Mimicry by asx- and ST-turns of the four main types of β-turn in proteins. Protein Sci. 2004;13:3051–3055. doi: 10.1110/ps.04920904.
34. Leader DP, Milner-White EJ. Motivated Proteins: a web application for studying small three-dimensional protein motifs. BMC Bioinformatics. 2009;10:60. doi: 10.1186/1471-2105-10-60.
35. Neul JL, Zoghbi HY. Rett syndrome: a prototypical neurodevelopmental disorder. Neuroscientist. 2004;10:118–128. doi: 10.1177/1073858403260995.
36. Hopfner K-P, Michaelis J. Mechanisms of nucleic acid translocases: lessons from structural biology and single-molecule biophysics. Curr Opin Struct Biol. 2007;17:87–95. doi: 10.1016/j.sbi.2006.11.003.
37. Kawaoka J, Jankowsky E, Pyle A-M. Backbone tracking by the SF2 helicase NPH-II. Nat Struct Mol Biol. 2004;11:526–530. doi: 10.1038/nsmb771.
38. Richardson JS, Richardson DC. Amino acid preferences for specific locations at the ends of α-helices. Science. 1988;240:1648–1652. doi: 10.1126/science.3381086.