The Journal of Biological Chemistry. 2020 Aug 12;295(41):14236–14247. doi: 10.1074/jbc.RA120.015055

The DUF328 family member YaaA is a DNA-binding protein with a novel fold

Janani Prahlad 1, Yifeng Yuan 2, Jiusheng Lin 1, Chou-Wei Chang 4, Dirk Iwata-Reuyl 3, Yilun Liu 4, Valérie de Crécy-Lagard 2,5,*, Mark A Wilson 1,*
PMCID: PMC7549036  PMID: 32796037

Abstract

DUF328 family proteins are present in many prokaryotes; however, their molecular activities are unknown. The Escherichia coli DUF328 protein YaaA is a member of the OxyR regulon and is protective against oxidative stress. Because uncharacterized proteins involved in the prokaryotic oxidative stress response are rare, we sought to learn more about the DUF328 family. Using comparative genomics, we found a robust association between the DUF328 family and genes involved in DNA recombination and the oxidative stress response. In some proteins, DUF328 domains are fused to other domains involved in DNA binding, recombination, and repair. Cofitness analysis indicates that DUF328 family genes associate with recombination-mediated DNA repair pathways, particularly the RecFOR pathway. Purified recombinant YaaA binds to dsDNA, duplex DNA containing bubbles of unpaired nucleotides, and Holliday junction constructs in vitro with dissociation equilibrium constants of 200–300 nM. YaaA binds DNA with positive cooperativity, forming multiple shifted species in electrophoretic mobility shift assays. The 1.65-Å resolution X-ray crystal structure of YaaA reveals that the protein possesses a new fold that we name the cantaloupe fold. YaaA has a positively charged cleft and a helix-hairpin-helix DNA-binding motif found in other DNA repair enzymes. Our results demonstrate that YaaA is a new type of DNA-binding protein associated with the oxidative stress response and that this molecular function is likely conserved in other DUF328 family members.

Keywords: DNA binding, comparative genomics, X-ray crystallography, bacterial stress response, UPF0246, DNA-binding protein, oxidative stress, bioinformatics, stress response


Bacteria confront diverse environmental stresses and mount defensive responses to ensure survival and growth. The accumulation of reactive oxygen species is a common stressor, causing cellular oxidative stress and eliciting a robust and multifaceted response in bacteria. In Escherichia coli, one of the primary sensors for oxidative stress is the OxyR transcription factor, which regulates the expression of ∼20–40 genes in response to oxidation of its regulatory cysteine residues (1, 2). Hydrogen peroxide is the dominant OxyR effector and therefore most of the genes controlled by OxyR are involved in the peroxide stress response (3). Although many of these downstream peroxide-responsive proteins have been extensively characterized in bacteria, some are surprisingly poorly understood.

One peroxide-responsive protein whose function is unclear is E. coli YaaA (YaaAEc, locus b0006), a member of the DUF328/UPF0246 family. YaaAEc is up-regulated in response to increased H2O2 levels (1), and it decreases intracellular Fe2+ levels in E. coli by an unknown mechanism (4). Fe2+ is cytotoxic in the presence of hydrogen peroxide because of the production of highly reactive hydroxyl radicals via Fenton chemistry (5). Therefore, the tight control of Fe2+ is important during peroxide stress and helps explain why YaaAEc is under the control of the OxyR regulon.

Gene expression array analysis showed that the transcription level of yaaA in E. coli is reduced in stationary phase and in anaerobic conditions and increased during recovery in LB broth from a stationary phase inoculum (6). This observation suggests that YaaA is important for the exponential phase of aerobic growth. In addition, the transcription levels of yaaA, yaaJ (a putative transport protein gene next to yaaA), and ten genes responsible for iron acquisition (fhuA, fhuF, fiu, cirA, entC, entB, exbD, fecI, fepB, and fepD) were considerably lower during exponential phase growth of an rpoS mutant than in WT E. coli MG1655 (7), indicating that RpoS regulates yaaA expression in E. coli. The connection between iron, peroxide, and YaaA was bolstered by the recent report that a yaaA mutant of Klebsiella pneumoniae has impaired survival in the presence of H2O2 under iron-replete culture conditions, whereas WT and yaaA mutant K. pneumoniae survive oxidative stress equally well under low-iron conditions (8).

Bioinformatic analyses of the DUF328 family have indicated a link between these proteins and nucleic acid metabolism, particularly to the metabolism of the 7-deazaguanosine–modified nucleosides (9). A DUF328 domain is found in DpdC proteins involved in the formation of 2′-deoxy-7-amido-7-deazaguanosine (dADG) from 2′-deoxy-7-cyano-7-deazaguanosine (dPreQ0) in vivo (10). The 7-deazaguanosine derivatives dADG and dPreQ0 are recently discovered DNA modifications encoded by the dpd cluster found in a diverse set of bacteria. Of the 11 genes found in the Salmonella serovar Montevideo dpd cluster (dpdA–dpdK), dpdC probably encodes an enzyme that hydrolyzes dPreQ0 to dADG (10). E. coli K12 MG1655 has the queuosine-tRNA pathway but no dpd genes, suggesting that YaaAEc is likely to have a distinct function.

A further cellular connection between YaaAEc and DNA repair was reported by Liu et al. (4), who observed enhanced mutation rates in ΔyaaA E. coli and severe aerobic growth defects in ΔyaaA recA56 and ΔyaaA polA1 double mutant strains that retained intact oxidative stress response enzymes. RecA is a ssDNA-binding protein that plays an important role in DNA recombination and repair (11). The recA56 mutant is defective in ATP-regulated formation of nucleoprotein filaments and can partially impair recombination by WT RecA (12). PolA (DNA polymerase I) is an E. coli DNA polymerase that is involved primarily in DNA repair; the polA1 mutation results in increased mutagenesis and diminished base mismatch repair (13, 14). By contrast, comparable aerobic growth defects appeared in the ΔyaaA single mutant only when its oxidative stress response was also severely impaired (4). In aggregate, these results suggest a functional link between YaaAEc, oxidative stress, and DNA repair. Despite a growing body of data about the role of YaaA and other DUF328 proteins in bacterial oxidative stress defense, little is known about any DUF328 protein at the molecular level, including their structures or functions.

Here, we provide a direct molecular link between the DUF328 family, specifically YaaAEc, and both DNA maintenance and the oxidative stress response in bacteria. Bioinformatic analysis shows a strong association between DUF328 proteins and genes involved in DNA recombination and oxidative stress response. We find that YaaAEc directly binds to DNA in vitro, and the 1.65-Å resolution X-ray crystal structure of YaaAEc shows that the protein possesses a new fold with a positively charged cleft and a helix-hairpin-helix (HhH) DNA-binding motif found in other proteins that bind DNA in a nonsequence-specific manner. As the first member of the DUF328 family to be structurally characterized, YaaAEc establishes a new protein fold family that we propose be named the cantaloupe fold. Our results show that the DUF328 family comprises DNA-binding proteins that play important roles in DNA maintenance during oxidative stress, assigning a molecular function to this family of proteins.

Results

Comparative genomic analyses show that DUF328 genes are widespread and are fused or physically clustered with genes involved in oxidative stress and DNA repair

YaaAEc is a member of the DUF328 (IPR005583/H2O2_YaaD/PF03883) family. Analysis of the taxonomic distribution of members of this family shows that it is widespread in bacteria (Fig. 1), with homologs found in ∼45.0% (10,565/23,458) of the reference organisms in the Genome Taxonomy Database (GTDB) (15). The phyla with the highest proportion of genomes harboring DUF328 family members are Campylobacterota (∼87.4%, 222/254), Actinobacteriota (∼83.9%, 2617/3118), Proteobacteria (∼59.1%, 4512/7630), and Bacteroidota (∼55.3%, 1573/2843), where the values in parentheses give the fraction of reference organisms in each phylum that contain DUF328 members. In contrast, DUF328 members are much less widespread in archaea, with homologs found in only ∼1.3% (31/2,392) of the reference organisms in the GTDB (Fig. S1).
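The percentages above follow directly from the genome counts. As a minimal check, using only the counts quoted in this paragraph (the group labels and function name are illustrative), the fractions can be reproduced as:

```python
# Genomes with at least one DUF328 (PF03883) homolog / total reference genomes,
# taken from the counts quoted in the text.
counts = {
    "Bacteria (all)": (10_565, 23_458),
    "Campylobacterota": (222, 254),
    "Actinobacteriota": (2_617, 3_118),
    "Proteobacteria": (4_512, 7_630),
    "Bacteroidota": (1_573, 2_843),
    "Archaea (all)": (31, 2_392),
}


def percent_with_homolog(hits: int, total: int) -> float:
    """Percentage of genomes carrying a homolog, rounded to one decimal place."""
    return round(100.0 * hits / total, 1)


for group, (hits, total) in counts.items():
    print(f"{group}: {percent_with_homolog(hits, total)}% ({hits}/{total})")
```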

Figure 1.

Phylogenetic distribution of DUF328 proteins and representative genomes in bacteria. A, results of an AnnoTree query using Pfam family PF03883 (DUF328) in bacteria with resolution at the genus level. Branches are highlighted in blue for phyla that harbor members of the DUF328 family. The trees were generated in AnnoTree with the e-value cutoff of 0.00001 and adapted for visualization (RRID:SCR_018980). B, representative genome neighborhood diagrams of PF03883 (DUF328) family (red arrows). YaaJ: PF01235 (sodium:alanine symporter family), QueF: PF14489 (QueF-like protein), 2OG: PF13640 (2-oxoglutarate-Fe(II) oxygenase superfamily), DEAD: PF00270 (DEAD/DEAH box helicase), helicase: PF00271 (helicase conserved C-terminal domain), RecQ-like: PF16124 (RecQ zinc-binding), adh: PF00106 (short-chain dehydrogenase), AhpC: PF00578 (alkyl hydroperoxide reductase subunit C/ thiol-specific antioxidant family), Nif: PF01784 (NIF3 (NGG1p interacting factor 3)), zf: PF02591 (C4-type zinc ribbon domain).

Genes with related functions often cluster together in bacterial genomes. To capture gene neighborhood associations of the DUF328 family, we performed a sequence similarity network (SSN) analysis combined with genome neighborhood network (GNN) analysis (16, 17). At a stringent alignment score threshold of 90, YaaAEc (UniProt ID P0A8I3) falls into the largest cluster (cluster 1), which contains no sequence with an associated Gene Ontology term (Fig. S2), indicating that YaaAEc is not closely associated with any sequence of annotated function.

The GNN analysis of genes encoding proteins from the major DUF328/PF03883 SSN clusters shows extensive clustering with genes involved in the oxidative stress response (Figs. S3–S7). Of the nine extracted neighborhoods, four contain genes encoding alkyl hydroperoxide reductase/thiol-specific antioxidant proteins of the AhpC family (PF00578). Three contain genes encoding rubrerythrin, a nonheme iron-containing metalloprotein involved in oxidative stress tolerance in anaerobic bacteria (PF02915). Two contain genes encoding oxidative dealkylation repair proteins of the AlkB family (PF13640) and members of the DUF3501 family (PF12007), previously linked to oxidative stress (18). No other functional category is as highly represented among the DUF328 GNN proteins, although several RNA metabolism and translation proteins are also present (PF00270, PF04073, PF00587, PF00849, PF10150, PF14489, and PF14819).

GNN analysis links members of the DUF328 family to oxidative stress, and analysis of Rosetta stone–type protein domain fusions (19) indicates an association between DUF328 proteins and DNA maintenance and repair. Several DUF328 proteins are fused to domains from the GIY-YIG endonuclease superfamily (IPR000305) (Fig. 2). Nucleases of the GIY-YIG family are involved in DNA repair and recombination, transfer of mobile genetic elements, and restriction of incoming foreign DNA (20–22). The GIY-YIG proteins are so named because they contain a domain of typically ∼100 amino acids with two short motifs, “GIY” and “YIG”, in their N-terminal regions. Additionally, DUF328 domains are found fused to multiple recombinase domains, a Zn2+-β ribbon domain, a PadR helix-turn-helix motif, and an AlbA_2 DNA-binding domain (Fig. 2), underscoring the connection between DUF328 and DNA binding and repair. The gene fusion and neighborhood analyses corroborate the role of DUF328 proteins in oxidative stress first reported for YaaAEc (4) but add a link to DNA repair that was not obvious from previous studies.

Figure 2.

Schematics of examples of PF03883/IPR005583/DUF328 fusion proteins. Representatives were retrieved from Pfam and the Conserved Domain Architecture Retrieval Tool. H2O2_YaaD (DUF328): PF03883; GIY-YIG: PF01541; SpoVK: COG0464; Ser_Rec: Ser_Recombinase superfamily, cd00338; Rec: recombinase, pfam07508; Zn: recombinase zinc β ribbon domain, pfam13408; AlbA_2: putative DNA-binding domain, PF04326; HTH: helix-turn-helix domains. aa, amino acids.

DUF328 genes show strong cofitness with the RecFOR pathway for DNA repair

The joint contribution of two genes to organism survival can be measured by their cofitness, the Pearson correlation coefficient between the two genes' contributions to organismal fitness across different physiological conditions. Cofitness analysis derived from an extensive Tn-Seq analysis of 32 organisms in dozens of conditions (23) further supports an association between DUF328 genes, DNA recombination, and hydrogen peroxide detoxification. In E. coli, the top cofitness associations are with peptidoglycan synthesis and cell division genes (alr, ftsN, and amiA), with scores between 0.45 and 0.48, followed by a few DNA repair genes such as recG and uvrD (scores of 0.44 and 0.43, respectively). However, none of these scores is particularly high. The highest cofitness scores (>0.75) were found in other species, indicating that DUF328 genes likely function in the same pathway as the DNA replication and repair gene recF in Shewanella sp. ANA-3, as catalase/peroxidase genes in Acidovorax sp. GW101-3H11, and as genes of unknown function in Pseudomonas syringae pv. syringae B728a ΔmexB and Dechlorosoma suillum PS (Table 1). Strong conserved cofitness (cofitness >0.6 and ortholog cofitness >0.6) indicates functional relationships between DUF328 genes and genes encoding DNA repair and recombination proteins, such as recA and recN (Table 1). DUF328 genes from at least seven bacteria show strong conserved cofitness (ortholog cofitness >0.6 in Table 1 and Table S1) with components of the RecFOR pathway, which initiates recombination-mediated DNA repair by processing ssDNA gaps and loading RecA onto the recombinogenic ends (24–26). The highest ortholog cofitness scores (>0.75) include genes encoding the DNA replication and repair protein RecF in Caulobacter crescentus and Pseudomonas fluorescens FW300-N2E2 (Table S1).
Several other genes in the RecFOR pathway (recA, recJ, recN, recO, and recR) also display high cofitness with DUF328 genes, pointing to a strong joint contribution to organism survival (Table 1 and Table S1). The recurrent cofitness association with the exonuclease I gene (sbcB) further corroborates the association between DUF328 proteins and ssDNA involved in recombination. Prokaryotic exonuclease I possesses a 3′–5′ single-stranded DNA exonuclease activity that is stimulated by ssDNA-binding protein and plays an important role in DNA repair (27). Also recurring in the DUF328 cofitness analysis is catalase, which defends against H2O2 stress by converting H2O2 to H2O and O2, reinforcing the connection between DUF328 proteins and peroxide stress.
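Because cofitness is a Pearson correlation between two genes' fitness profiles, it can be sketched in a few lines. The sketch below assumes each gene is represented by a list of fitness values measured in the same set of experiments; the profiles are hypothetical numbers, not Fitness Browser data, and the Fitness Browser's own implementation may differ in details such as which experiments are included.

```python
import math
from typing import Sequence


def cofitness(fitness_a: Sequence[float], fitness_b: Sequence[float]) -> float:
    """Pearson correlation between two genes' fitness values over the same experiments."""
    if len(fitness_a) != len(fitness_b) or len(fitness_a) < 2:
        raise ValueError("need two equal-length profiles covering at least 2 experiments")
    n = len(fitness_a)
    mean_a = sum(fitness_a) / n
    mean_b = sum(fitness_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(fitness_a, fitness_b))
    ss_a = sum((a - mean_a) ** 2 for a in fitness_a)
    ss_b = sum((b - mean_b) ** 2 for b in fitness_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("a gene with constant fitness has undefined cofitness")
    return cov / math.sqrt(ss_a * ss_b)


# Hypothetical fitness profiles, one value per experiment (not real Fitness Browser data).
duf328_gene = [-0.1, -1.2, -0.3, -2.0, 0.1, -0.8]
repair_gene = [0.0, -1.0, -0.4, -1.7, 0.2, -0.6]
unrelated_gene = [0.3, -0.2, 0.1, 0.2, -0.3, 0.0]

print(f"DUF328 vs repair gene:    {cofitness(duf328_gene, repair_gene):.2f}")
print(f"DUF328 vs unrelated gene: {cofitness(duf328_gene, unrelated_gene):.2f}")
```

A gene whose fitness drops in the same experiments as the DUF328 gene scores close to 1, which is the pattern that the thresholds used for Table 1 are designed to capture.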

Table 1.

Genes that show strong and conserved cofitness with DUF328 genes

Data extracted from the Fitness Browser (http://fit.genomics.lbl.gov/cgi-bin/myFrontPage.cgi)

Organism | YaaA homolog | Cofitness hit | Description | Ortholog cofitness | Cofitness
Shewanella loihica PV-4 | Shew_1093 | Shew_1611 | sbcB, exonuclease I | 0.66 | 0.63
Shewanella sp. ANA-3 | Shewana3_3143 | Shewana3_1126 | recA, recombinase A | 0.63 | 0.72
Shewanella sp. ANA-3 | Shewana3_3143 | Shewana3_1104 | recombination and repair protein | 0.62 | 0.67
Shewanella sp. ANA-3 | Shewana3_3143 | Shewana3_2572 | sbcB, exonuclease I | 0.66 | 0.66
Shewanella sp. ANA-3 | Shewana3_3143 | Shewana3_0011 | recF, recombination protein F (RefSeq) | 0.57 | 0.75
Shewanella oneidensis MR-1 | SO3540 | SO1328 | transcriptional regulator, LysR family | 0.68 | 0.65
Shewanella oneidensis MR-1 | SO3540 | SO3430 | recA protein | 0.72 | 0.63
Pseudomonas syringae pv. syringae B728a ΔmexB | Psyr_1064 | Psyr_2850 | hypothetical protein | no | 0.8
Pseudomonas syringae pv. syringae B728a ΔmexB | Psyr_1064 | Psyr_3174 | uroporphyrinogen-III C-methyltransferase/precorrin-2 dehydrogenase | no | 0.78
Dechlorosoma suillum PS | Dsui_2322 | Dsui_2345 | hypothetical protein | no | 0.88
Dechlorosoma suillum PS | Dsui_2322 | Dsui_3354 | adenine specific DNA methylase Mod | no | 0.88
Caulobacter crescentus NA1000 | CCNA_03496 | CCNA_02062 | DNA repair protein recN | 0.67 | 0.62
Acidovorax sp. GW101-3H11 | Ac3H11_1593 | Ac3H11_784 | hydrogen peroxide-inducible genes activator | 0.65 | 0.68
Acidovorax sp. GW101-3H11 | Ac3H11_1593 | Ac3H11_2454 | exodeoxyribonuclease I (EC 3.1.11.1) | 0.66 | 0.66
Acidovorax sp. GW101-3H11 | Ac3H11_1593 | Ac3H11_452 | polyphosphate kinase (EC 2.7.4.1) | 0.49 | 0.8
Acidovorax sp. GW101-3H11 | Ac3H11_1593 | Ac3H11_135 | catalase (EC 1.11.1.6)/peroxidase (EC 1.11.1.7) | no | 0.79

Recombinant YaaAEc binds DNA with nanomolar affinity

Consistent with strong bioinformatic evidence connecting DUF328 family proteins to DNA maintenance and repair, YaaAEc is associated with a large amount of nucleic acid during purification of the recombinant protein from E. coli. We identified the associated nucleic acid as DNA because it was selectively degraded by DNase I. Ultimately, nearly all of the DNA was removed from YaaAEc by hydroxyapatite chromatography (see “Experimental procedures”), but strong anion exchange chromatography with a Q resin could not separate YaaAEc from the DNA. The persistence of DNA in association with YaaAEc suggests direct and tight binding.

YaaAEc binding to DNA was measured using EMSA with a variety of defined DNA structures. YaaAEc binds to double-stranded, bulge-containing, and Holliday junction DNA with comparable dissociation constants (KD) of ∼200–300 nM (Fig. 3). YaaAEc also binds ssDNA, although with lower apparent affinity (Fig. 3A). Unlike many other DNA-binding proteins, YaaAEc shows no strong preference for specific DNA structures. Multiple shifted DNA bands are observed at higher concentrations of YaaAEc in the EMSA, suggesting that these DNA constructs can bind multiple copies of YaaAEc (Fig. 3, A–C). The fraction of the DNA duplex with a 12-nt bubble bound by YaaAEc was well fitted by a cooperative binding model, giving KD = 265 nM and a Hill coefficient of ∼3.1 (Fig. 3D). All the EMSAs show evidence of positive cooperativity in YaaAEc binding, and several distinct bands are seen at higher concentrations of YaaAEc. Either multiple YaaAEc molecules can bind a single DNA molecule directly, or one YaaAEc binds to the DNA fragment and additional YaaAEc molecules interact with the bound YaaAEc in the YaaAEc-DNA complex. Our data do not discriminate between these two possibilities, although the Hill coefficient of 3.1 suggests a cooperative YaaAEc recruitment mechanism.
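The cooperative fit can be illustrated with the Hill equation, θ = [P]^n / (K^n + [P]^n), where θ is the fraction of DNA bound. As a hedged sketch, assuming that the reported KD plays the role of the half-saturation constant K and that free YaaAEc is approximately equal to the total YaaAEc added (DNA at trace concentration), the code below generates noise-free synthetic data from K = 265 nM and n = 3.1 and recovers both parameters from a linearized Hill plot. The Hill-plot regression is a simple substitute chosen for illustration, not necessarily the fitting procedure behind Fig. 3D; noisy EMSA data would normally be fitted by nonlinear least squares.

```python
import math
from typing import List, Sequence, Tuple


def fraction_bound(protein_nm: float, k_half_nm: float, hill_n: float) -> float:
    """Hill model: fraction of DNA bound at free protein concentration protein_nm (nM)."""
    ratio = (protein_nm / k_half_nm) ** hill_n
    return ratio / (1.0 + ratio)


def fit_hill_plot(concs_nm: Sequence[float], fractions: Sequence[float]) -> Tuple[float, float]:
    """Estimate (K_half in nM, Hill n) by least squares on ln(theta/(1-theta)) vs ln[P]."""
    xs: List[float] = []
    ys: List[float] = []
    for p, theta in zip(concs_nm, fractions):
        if 0.0 < theta < 1.0:  # the logit is undefined at 0 and 1, so skip those points
            xs.append(math.log(p))
            ys.append(math.log(theta / (1.0 - theta)))
    if len(xs) < 2:
        raise ValueError("need at least two partially bound points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # slope of the Hill plot = n
    intercept = mean_y - slope * mean_x   # intercept = -n * ln(K_half)
    return math.exp(-intercept / slope), slope


# Noise-free synthetic data generated from the reported parameters (illustration only).
concs = [50.0, 100.0, 200.0, 265.0, 400.0, 800.0]
theta = [fraction_bound(c, 265.0, 3.1) for c in concs]
k_fit, n_fit = fit_hill_plot(concs, theta)
print(f"K_half ~ {k_fit:.0f} nM, n ~ {n_fit:.2f}")

# Fold increase in [YaaA] needed to go from 10% to 90% bound is 81 ** (1 / n).
print(f"10% to 90% bound: {81 ** (1 / 3.1):.1f}-fold rise in [YaaA]")
```

The last line shows why cooperativity sharpens the binding transition: with n = 3.1, binding rises from 10% to 90% over roughly a 4.1-fold change in YaaAEc concentration, compared with an 81-fold change for noncooperative binding (n = 1).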

Figure 3.

YaaAEc binds to diverse DNA constructs with nanomolar affinity. A–C, EMSA of recombinant YaaAEc with various DNA constructs illustrated below each panel. The concentration of YaaAEc is shown at the top of each panel and the fraction of the total DNA signal that is shifted from the free position is quantified beneath each lane. D, the binding of YaaAEc to the 12-nt bubble duplex DNA from EMSA is fitted to a single-site, positively cooperative binding model. Data were measured in triplicate and verified for multiple preparations of YaaAEc. The Hill coefficient was fitted as 3.1.

The role of YaaAEc is PreQ0-independent

DUF328 domains are present in DpdC proteins that are predicted dPreQ0 nitrile hydratases (10), and DUF328 genes physically cluster with queF genes encoding PreQ0 reductase (Fig. 1B). A parsimonious prediction is that YaaAEc may be involved in the recognition and repair of PreQ0 that has been incorporated into DNA by mistake. Indeed, tRNA guanine transglycosylase (TGT), normally involved in the synthesis of queuosine in tRNA, can also insert the PreQ0 derivative PreQ1 into DNA in vitro if the thymine base is replaced with uracil (28). Uracil does occur in DNA and is usually removed by dedicated repair machinery (29). However, the addition of exogenous PreQ0 did not affect growth of the ΔyaaA mutant in LB (Fig. S8). In addition, the growth defect caused by the deletion of yaaA in the Hpx background (4) seemed to be alleviated by the addition of PreQ0 and exacerbated by the deletion of the queD gene involved in the PreQ0 synthesis pathway. Although the difficulty of working with the Hpx strain places this last result within the margin of error (Fig. S9), it is not consistent with a role for YaaA in repairing potential misincorporations of PreQ0 in DNA. Nevertheless, additional studies in which uracil levels are increased and PreQ0 levels are measured in DNA would be needed to rule out this hypothesis definitively.

YaaAEc possesses a new fold

We determined the X-ray crystal structure of YaaAEc to 1.65-Å resolution using single-wavelength anomalous diffraction (SAD) phasing of the selenomethionine (SeMet)-substituted protein. YaaAEc is a single-domain protein possessing a new fold comprising 12 α-helices and 14 β-strands with a core defined by a three-stranded parallel β-sheet (Fig. 4, A and B). Several of the secondary structural elements are short (e.g. αD, αG, αI, αJ, and β2, β3, β4, β8, β9) and may be differently classified by various secondary structure–detection algorithms. Overall, YaaAEc is wedge-shaped with an apical cleft, resembling a slice of cantaloupe (Fig. S10). Helices αB and αC compose a HhH DNA-binding motif (see below) that is positioned opposite a β-strand motif comprising β11–14. The β-strand motif has an unusual abundance of solvent-exposed aromatic amino acids and several lysine residues. The cleft region that lies between the HhH and β-strand motifs is ∼20 Å wide and is rich in basic residues, resulting in a highly positive electrostatic potential as calculated by the Adaptive Poisson-Boltzmann Solver (APBS) (30) (Fig. 4C). Residues in the cleft region display an approximate dyad symmetry (Fig. 4D) and are among the most highly conserved residues in the DUF328 family, including a 209KKARG213 motif that binds a chloride ion from the crystallization buffer. The positive electrostatic potential of the cleft, the rough dyad symmetry of several conserved residues in the region, and a width that matches the diameter of B-form DNA make this cleft a plausible contact surface for DNA.

Figure 4.

Three-dimensional structure of YaaAEc defines a new fold. A and B, two views of the ribbon diagram of YaaAEc with α-helices lettered and β-strands numbered. The views in (A) and (B) are related by the 90° rotation indicated by the arrow. YaaAEc defines a new protein fold class. C, the electrostatic potential surface for YaaAEc looking down into the apical cleft as calculated by APBS (see “Experimental procedures”). Electrostatic potential is colored from blue (+10 kT/e) to red (−10 kT/e), where T is 298 K. D, the same view into the apical cleft as (C) with key residues labeled. The green and gold residues compose two groups related by an approximate dyad axis that passes between residue pairs Glu130, Tyr175 and Lys9, Lys209.
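APBS reports electrostatic potentials in units of kT/e. For readers more used to volts, the conversion at 298 K is a one-line calculation, sketched here with the exact SI values of the Boltzmann constant and the elementary charge (the function name is illustrative):

```python
# Exact SI values (2019 redefinition) of the Boltzmann constant and elementary charge.
BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19


def thermal_voltage_volts(temperature_k: float) -> float:
    """Value of 1 kT/e in volts at the given absolute temperature."""
    return BOLTZMANN_J_PER_K * temperature_k / ELEMENTARY_CHARGE_C


one_kt_per_e = thermal_voltage_volts(298.0)
print(f"1 kT/e at 298 K  = {one_kt_per_e * 1e3:.1f} mV")
print(f"10 kT/e at 298 K = {10 * one_kt_per_e * 1e3:.0f} mV")
```

By this conversion, the ±10 kT/e color range in panel C corresponds to roughly ±257 mV.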

YaaAEc has multiple extended stretches of polypeptides adopting nonstandard structures from residues 6–22, 67–78, and 123–135, interrupted by a short β-strand from residues 10–12. These distinctive and atypical polypeptide structures define a broad area on the exterior of YaaAEc that includes part of the cleft and also penetrate into the core of the protein (Fig. 5A). These regions are not loops as typically defined because they are not confined to the surface of the protein and meander through the core of the protein structure. Unlike loops, these atypically structured residues make extensive contacts with neighboring residues and are well ordered, judging from both the quality of the electron density (Fig. S11) and their low refined atomic displacement parameters (ADPs). The majority of the peptide atoms in these nonstandard secondary structural regions make hydrogen bonds with solvent or nearby amino acid side chains rather than with other peptides, again differentiating them from peptide groups in standard secondary structural elements such as α-helices and β-strands. Among these unusual contacts, a chain of Tyr-mediated hydrogen bonds extends across the conserved core of the region. This Tyr-rich hydrogen bond network includes a highly conserved SGXYG motif (112SGLYG116 in YaaAEc) that is located at a sharp turn between β-strands in the core of YaaAEc (Fig. 5B) and makes extended contacts with residues 67–72 and 124–131. The high degree of conservation (Fig. S12) and clear structural importance of the SGXYG motif indicate that the surrounding unusually structured regions are likely conserved features of the DUF328 family.
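Motifs written with X for an arbitrary residue, such as SGXYG, translate directly into regular expressions. The sketch below reports motif occurrences with 1-based, inclusive residue numbering, matching the convention of 112SGLYG116 and 209KKARG213. The short sequence is a hypothetical toy fragment, not the YaaAEc sequence; in practice it would be replaced by the real sequence (for example, UniProt P0A8I3).

```python
import re
from typing import List, Tuple

# X in a motif means "any residue"; spell it out as the 20 standard amino acids.
ANY = "[ACDEFGHIKLMNPQRSTVWY]"
SGXYG = f"SG{ANY}YG"
KKARG = "KKARG"


def find_motif(sequence: str, pattern: str) -> List[Tuple[int, int, str]]:
    """Return (first, last, matched) for every hit, using 1-based inclusive residue numbers.

    The zero-width lookahead lets overlapping occurrences be reported as well.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", sequence):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1)) - 1, m.group(1)))
    return hits


toy_sequence = "MKLSGLYGAAEKKARGQ"  # hypothetical fragment, NOT the YaaA sequence
print(find_motif(toy_sequence, SGXYG))
print(find_motif(toy_sequence, KKARG))
```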

Figure 5.

YaaAEc possesses long stretches of nonstandard structures extending from the core to the periphery of the protein. A, a ribbon diagram of YaaAEc with the conserved SGXYG motif in yellow and the surrounding stretches of peptides adopting structures that are neither α-helix nor β-strand in magenta. B, a close-up view of this region, where the dotted lines show a network of hydrogen bonds that mediate interactions between the conserved SGXYG motif and the surrounding residues. Tyrosine residues are highly represented in this region.

YaaA contains a HhH DNA-binding motif

Although YaaAEc possesses a new fold, the Phyre2 fold recognition server (31) identifies an HhH motif from residues 35–66 that is found in several DNA-binding proteins (Fig. 6A). HhH motifs bind DNA in Hef-domain RuvA domain 2-like proteins (PDB 2AQ0, 1X2I, and 2BGW) (32–34), excinuclease ABC subunit C (PDB 1KFT) (35), DNA/RNA-binding 3-helical bundle, GerE-like (LuxR/UhpA family of transcriptional regulators) (PDB 1FSE) (36), DNA excision repair protein XPF-ERCC1 (PDB 6SXB) (37), and mitochondrial transcription elongation factor 2 (PDB 5OL9) (38), among others. In the structures that include bound DNA (6SXB and 2BGW), the HhH motif directly contacts the minor groove of DNA, suggesting that this is a possible DNA-binding mode for YaaAEc as well (Fig. 6B). A preference for binding the minor groove is consistent with the lack of sequence-specific DNA binding by HhH domains (39). The putative DNA-binding regions in the apical cleft of YaaAEc, including parts of the HhH motif, show the highest sequence conservation in the DUF328 family, strongly supporting the functional significance of these regions (Fig. S13).

Figure 6.

YaaAEc has a HhH DNA-binding motif. A, a ribbon diagram of YaaAEc with the region that is structurally conserved in other DNA-binding proteins colored magenta. The classical HhH motif is defined by αB and αC (labeled). B, a superposition of YaaAEc (gray, blue) with the HhH motif (yellow) and bound DNA from XPF-ERCC1 endonuclease (PDB 6SXB). This illustrates one potential DNA-binding mode for YaaAEc and other DUF328 proteins.

Discussion

In this study we show that DUF328 proteins, including the E. coli representative YaaAEc, are DNA-binding proteins involved in the oxidative stress response. Prior work from the Imlay group (4) showed that YaaAEc is important for regulating ferrous iron (Fe2+) levels in oxidatively stressed E. coli. Fe2+ is dangerous in the presence of H2O2 because Fenton chemistry can generate the highly reactive hydroxyl radical, which is indiscriminately destructive (40). The hydroxyl radical is thought to result in bacterial death predominantly through DNA damage (41). Although our results do not explain how YaaAEc regulates Fe2+ levels, they demonstrate that a key molecular activity of YaaAEc is DNA binding, indicating that it plays an important role in DNA maintenance and repair under oxidative stress conditions. This is consistent with prior observations that yaaA deletion resulted in a mutator phenotype and a filamentous E. coli cell morphology (4), which is a common cellular manifestation of extensive DNA damage in bacteria (42). Moreover, the multiple and strong comparative genomic connections between YaaAEc homologs and DNA repair suggest that this function is likely conserved throughout the DUF328 family.

Cofitness analysis reveals a strong connection between DUF328 proteins and the RecFOR pathway of single-stranded gap DNA repair. Although this indicates a connection between DUF328 proteins and DNA repair, it does not require that YaaAEc operate in the RecFOR pathway. Cofitness indicates that DUF328 and RecFOR genes make joint contributions to organism survival, which may be because they operate in the same pathway or, alternatively, because they operate in different but functionally intersecting pathways. The other major recombinational DNA repair pathway in bacteria is the RecBCD pathway, which primarily targets and repairs double-stranded breaks in DNA (43). Although genes in this pathway did not associate with DUF328 genes in our comparative genomics study, it is possible that YaaA is involved in this or other DNA-maintenance processes that partially overlap with the RecFOR pathway (44). A circumstantial argument against DUF328 proteins being direct participants in the RecFOR pathway is that this DNA repair pathway is present in archaea, whereas DUF328 proteins are rare in that domain. Moreover, defects in the RecFOR pathway in E. coli result in phenotypes that do not depend on oxidative stress, whereas a phenotype is seen in ΔyaaA E. coli only under oxidative stress conditions (4). However, the selective protective role of YaaAEc during oxidative stress may be driven by its increased expression under oxidative stress rather than by specificity for repair of oxidative DNA damage. Although the details of the role of DUF328 proteins in DNA protection during oxidative stress remain to be determined, cofitness analysis indicates a probable contribution from the RecFOR pathway in diverse bacteria.

The crystal structure of YaaAEc reveals a new fold for the DUF328 family (45). Given its resemblance to a slice of cantaloupe (Fig. S10), we propose that this be called the cantaloupe fold. This fold features a distinctive abundance of structured peptide stretches that are neither α-helix nor β-strand and an apical cleft demarcated on one end by a HhH DNA-binding domain and on the other by a β-strand motif. HhH domains are nonsequence-specific DNA-binding modules that are commonly found in proteins that digest, synthesize, or repair DNA (39). The apical cleft is highly basic, enriched in conserved residues, and a plausible site for DNA binding. It is unclear if DNA could simultaneously bind both regions and whether such bound DNA would be partially unwound or otherwise structurally perturbed. The minor groove-binding preference of HhH motifs suggests that at least some portion of the bound DNA should be a B-form double helix. The presence of two candidate DNA-binding sites may explain the presence of multiple shifted bands at higher concentrations of protein in the EMSA of YaaAEc-DNA complexes (Fig. 3). These multiple bands indicate a YaaAEc:DNA binding stoichiometry of greater than 1:1 at higher YaaAEc concentrations, consistent with multiple YaaAEc proteins binding cooperatively to the DNA. The Hill coefficient of 3.1 for YaaAEc binding indicates strong positively cooperative DNA binding, suggesting either that protein-protein interactions enhance YaaAEc affinity for DNA or that binding-induced perturbations to DNA structure recruit additional YaaAEc molecules. Determining how YaaAEc interacts with DNA will be an important future direction for elucidating the molecular basis of DNA recognition by this new protein fold.

DpdC proteins are members of the DUF328 family that possess nitrile hydratase activity involved in PreQ0 metabolism (9, 10). The involvement of DpdC proteins in PreQ0-containing DNA is broadly consistent with our results connecting YaaAEc to DNA repair, although we find that YaaAEc is not involved in queuosine metabolism in E. coli. Regardless, the nitrile hydratase activity of DpdC proteins demonstrates that the DUF328 cantaloupe fold can support enzymatic activity, leaving open the possibility that YaaAEc may have an undiscovered enzymatic function. An important avenue for future research will be to determine the mechanism by which YaaAEc (and other DUF328) proteins regulate Fe2+ levels to protect cells from oxidative stressors (4), including whether YaaAEc simply binds DNA or acts on it as a substrate. This report provides a molecular activity for YaaAEc and other DUF328 proteins that will inform additional research into the protective mechanisms of this new family and fold class of DNA-binding protein.

Experimental procedures

Bioinformatics analyses

Protein sequences were retrieved from the NCBI Protein Database using the following identifiers: YaaAEc, UniProt ID P0A8I3; DpdC, AJQ72467.1. Protein domain analysis was performed with HHpred in the MPI Bioinformatics Toolkit (46, 47) against the Pfam database (48) using default parameters.

A total of 319 reviewed sequences of the DUF328/IPR005583/PF03883 family were retrieved from InterPro (RRID:SCR_006695). A multiple sequence alignment was built using MAFFT (RRID:SCR_011811) (49), and WebLogo (50) was used to generate the sequence logo. Domain architecture was analyzed with the Conserved Domain Architecture Retrieval Tool (51).
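A sequence logo scales each alignment column by its information content, which for proteins is roughly log2(20) bits minus the Shannon entropy of the residue distribution in that column. The sketch below computes that quantity for a toy alignment (the sequences are hypothetical). It is not WebLogo's code: it omits WebLogo's small-sample correction and simply excludes gaps, both simplifications assumed here, so values for small alignments will be overestimates.

```python
import math
from collections import Counter


def column_information(alignment: list[str], alphabet_size: int = 20) -> list[float]:
    """Per-column information content in bits (no small-sample correction; gaps ignored)."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_bits = math.log2(alphabet_size)
    info = []
    for col in range(n_cols):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            info.append(0.0)  # all-gap column carries no residue information here
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(max_bits - entropy)
    return info


# Hypothetical three-sequence alignment: column 0 is invariant, column 1 is K/K/R.
print(column_information(["MKRL", "MKRI", "MRKL"]))
```

A fully conserved column reaches the maximum of log2(20) ≈ 4.32 bits, which is why invariant residues dominate the logo.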

SSNs were generated with the Enzyme Function Initiative suite of web tools (16, 17) and visualized using Cytoscape (52). The following parameters were used to generate the DUF328/IPR005583/PF03883 SSNs. For the whole-family SSN, the “FASTA” input method (option C) was used with YaaAEc and IPR005583 as input, UniRef90, a minimum length filter of 100, and a maximum length filter of 500. The alignment score threshold was 90. With UniRef90, sequences that share 90% or more identity are collapsed into a single node to reduce visual complexity. The resulting SSN was analyzed with the Enzyme Function Initiative Genome Neighborhood Tool to obtain genome neighborhood diagrams.
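In an SSN, an edge is kept only when the pairwise alignment score between two nodes meets the chosen threshold (90 here), and clusters are the connected components of the resulting graph. The sketch below shows that thresholding and clustering step with a union-find structure. It assumes alignment scores are already computed (EFI-EST derives them from all-by-all BLAST, which is not reproduced here), treats a score equal to the threshold as passing, and uses hypothetical node names.

```python
from collections import defaultdict


def ssn_clusters(nodes, edges, threshold):
    """Group SSN nodes into clusters joined by edges at or above an alignment-score threshold.

    nodes: iterable of node IDs (e.g., UniRef90 cluster representatives)
    edges: iterable of (node_a, node_b, alignment_score) tuples with precomputed scores
    """
    parent = {node: node for node in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    # Largest clusters first, as SSN clusters are usually inspected.
    return sorted(clusters.values(), key=len, reverse=True)


edges = [("A", "B", 120), ("B", "C", 95), ("C", "D", 60), ("D", "E", 91)]
print(ssn_clusters(["A", "B", "C", "D", "E"], edges, threshold=90))
```

Raising the threshold removes weaker edges and splits clusters, which is how SSNs are typically tuned to separate putative isofunctional groups.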

Fitness data were retrieved from the Fitness Browser (RRID:SCR_018981) (23, 53) using YaaAEc as the query sequence. Genes that show strong cofitness (cofitness >0.75) and genes that show strong conserved cofitness (cofitness >0.6 and ortholog cofitness >0.6) with DUF328 genes are listed in Table 1. Genes with ortholog cofitness >0.6 and a product of cofitness and ortholog cofitness >0.3 with DUF328 genes are listed in Table S1. The AnnoTree v1.2.0 tool (54) was used to analyze the distribution of PF03883 family members in the reference species set from GTDB bacteria release RS89 (55).
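The cofitness criteria above can be expressed as a small filter. This is a sketch only: the gene names and values are hypothetical, and assigning genes that meet the Table 1 criteria exclusively to Table 1 is an assumption, since the text does not say whether the two tables overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CofitHit:
    gene: str
    cofitness: float
    ortholog_cofitness: Optional[float] = None  # None if no ortholog pair was scored


def assign_table(hit: CofitHit) -> Optional[str]:
    """Return 'Table 1', 'Table S1', or None using the thresholds stated in the text."""
    ortho = hit.ortholog_cofitness
    strong = hit.cofitness > 0.75
    strong_conserved = ortho is not None and hit.cofitness > 0.6 and ortho > 0.6
    if strong or strong_conserved:
        return "Table 1"
    if ortho is not None and ortho > 0.6 and hit.cofitness * ortho > 0.3:
        return "Table S1"
    return None


# Hypothetical hits against a DUF328 gene.
for hit in [CofitHit("geneA", 0.80), CofitHit("geneC", 0.55, 0.70), CofitHit("geneD", 0.40, 0.70)]:
    print(hit.gene, assign_table(hit))
```

The product criterion for Table S1 admits genes with moderate cofitness in one organism, provided the association is reproduced in orthologs.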

Bacterial strains

All strains used in this study are listed in Table 2. E. coli strains were routinely grown in LB medium (tryptone 10 g/liter, yeast extract 5 g/liter, and NaCl 5 g/liter) at 37 °C. When antibiotic selection was required, media were supplemented with 100 μg/ml ampicillin, 25 μg/ml chloramphenicol, or 50 μg/ml kanamycin. PreQ0 was purchased from Ark Pharm.

Table 2. Strains used in this study

| Strain name | Genotype/relevant characteristics | Reference/source |
|---|---|---|
| E. coli MG1655 | WT | (Liu, Y. et al., 2011)/Imlay laboratory UIUC |
| ΔyaaA | As MG1655 plus Δ(yaaA1::cat)1 | (Liu, Y. et al., 2011)/Imlay laboratory UIUC |
| ΔqueD | As MG1655 plus ΔqueD::cat | This study |
| Δtgt | As MG1655 plus Δtgt::cat | H. Mori Collection |
| ΔqueD ΔyaaA | As ΔyaaA plus ΔqueD::cat | This study |
| Δtgt ΔyaaA | As ΔyaaA plus Δtgt::cat | This study |
| Hpx | As MG1655 plus Δ(ahpC-ahpF′) kan::′ahpF Δ(katG17::Tn10)1 Δ(katE12::Tn10)1 | (Liu, Y. et al., 2011)/Imlay laboratory UIUC |
| Hpx ΔyaaA | As Hpx plus Δ(yaaA1::cat)1 | (Liu, Y. et al., 2011)/Imlay laboratory UIUC |
| Hpx ΔqueD | As Hpx plus ΔqueD::cat | This study |
| Hpx ΔyaaA ΔqueD | As Hpx ΔyaaA plus ΔqueD::cat | This study |

Growth studies were adapted from those previously described (4). Overnight anaerobic cultures were diluted into anaerobic LB and grown for four to five generations to early log phase (optical density at 600 nm (A600) of 0.15–0.20). The cultures were then diluted into aerobic LB of the same composition to an A600 of ≈0.005. All cultures were grown at 37 °C. LB medium was prepared 1 day before culturing and stored in the dark to avoid photochemical generation of H2O2. Immediately after autoclaving, the medium was transferred to an anaerobic chamber (BACTRON anaerobic chamber) and stored under an atmosphere of 5% CO2, 10% H2, and 85% N2 overnight or longer before use.
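The culture handling above rests on two pieces of arithmetic: the number of doublings between two optical densities and the C1V1 = C2V2 dilution used to reset cultures to A600 ≈ 0.005. The sketch below assumes exponential growth and that A600 is proportional to cell density, which holds only within the linear range of the spectrophotometer; the 50-ml culture volume in the example is hypothetical.

```python
import math


def generations(od_start: float, od_end: float) -> float:
    """Number of doublings to grow from od_start to od_end (exponential growth assumed)."""
    if od_start <= 0 or od_end <= 0:
        raise ValueError("optical densities must be positive")
    return math.log2(od_end / od_start)


def dilution_volume(od_culture: float, od_target: float, final_volume_ml: float) -> float:
    """Volume of culture (ml) needed in a final volume of final_volume_ml to reach od_target (C1*V1 = C2*V2)."""
    if od_target > od_culture:
        raise ValueError("target OD must not exceed the culture OD")
    return od_target * final_volume_ml / od_culture


# Diluting an early-log culture at A600 = 0.15 to A600 = 0.005 in a hypothetical 50-ml culture.
print(dilution_volume(0.15, 0.005, 50.0))  # about 1.67 ml of culture
print(generations(0.01, 0.16))  # four doublings
```

By the same arithmetic, reaching A600 0.15–0.20 after four to five generations implies an anaerobic starting A600 of roughly 0.005–0.0125.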

Expression and purification of YaaAEc

The YaaAEc gene (yaaA; b0006) was PCR-amplified from E. coli XL1-Blue genomic DNA using Platinum II Taq DNA polymerase (Thermo Fisher Scientific) and primers that introduced 5′ NdeI and 3′ XhoI restriction sites. The gene was subcloned between the NdeI and XhoI sites of pET15b (Novagen) and sequence-verified by dideoxy DNA sequencing. The validated YaaAEc-pET15b construct was transformed by heat shock into chemically competent E. coli BL21(DE3) (Novagen) for protein expression. This construct expresses YaaAEc with a thrombin-cleavable N-terminal hexahistidine tag, although the tag proved difficult to remove with thrombin and was retained in the final purified protein. BL21(DE3) cells containing YaaAEc-pET15b were grown in LB medium with 100 μg/ml ampicillin at 37 °C with shaking at 270 rpm to an A600 of 0.2–0.3, at which point the culture was transferred to 20 °C and incubated with shaking at 150 rpm for an additional 2 h. YaaAEc expression was induced by adding isopropyl β-d-1-thiogalactopyranoside (Calbiochem) to a final concentration of 0.2 mM, and the culture was incubated at 20 °C with shaking overnight. Chloramphenicol was added to a final concentration of 100 μg/ml 2 h before harvest to enhance protein solubility (56). Cells were harvested by centrifugation, and cell pellets were frozen in liquid N2 and stored at −80 °C.

Recombinant hexahistidine-tagged YaaAEc was purified using nickel-nitrilotriacetic acid metal affinity chromatography as previously described (57). Briefly, cells were lysed by adding lysozyme to a final concentration of 1 mg/ml followed by sonication. The crude lysate was clarified by centrifugation at 12,000 × g, and the supernatant was incubated with HIS-Select nickel-nitrilotriacetic acid resin (Sigma-Aldrich), washed with wash buffer (25 mM HEPES, pH 7.5, 300 mM NaCl, and 25 mM imidazole), and eluted with elution buffer (25 mM HEPES, pH 7.5, 300 mM NaCl, and 250 mM imidazole). Fractions containing YaaAEc were identified by Coomassie-stained SDS-PAGE, and the purest fractions were pooled. Although a thrombin cleavage site lies downstream of the N-terminal hexahistidine tag, incubation of the purified protein with thrombin did not efficiently cleave the tag. The final protein therefore retains the tag and the thrombin cleavage site, adding the N-terminal sequence MGSSHHHHHHSSGLVPRGSH before the first methionine of YaaAEc.
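Because the tag was retained, it contributes to the observed mass of the purified protein. The sketch below estimates that contribution by summing average residue masses (amino acid mass minus water) for the 20-residue extension. The mass values are standard average residue masses from common reference tables, and the result is approximate.

```python
# Average residue masses (Da), i.e. free amino acid mass minus water, for residues in the tag.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "L": 113.1594, "M": 131.1926, "H": 137.1411, "R": 156.1875,
}


def added_mass(sequence: str) -> float:
    """Mass (Da) an in-frame peptide adds to a protein: sum of residue masses, no extra water."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence)


# N-terminal extension retained on the purified protein (from the text).
TAG = "MGSSHHHHHHSSGLVPRGSH"
print(round(added_mass(TAG), 1))  # roughly 2.16 kDa
```

A shift of about 2.2 kDa relative to untagged YaaAEc is therefore expected on SDS-PAGE or in mass spectrometry.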

YaaAEc co-purified with large amounts of nucleic acid, which was identified as DNA based on its sensitivity to DNase I, as assessed by GelRed (Biotium)-stained agarose gel electrophoresis. The contaminating DNA was present at ∼0.1 mg/mg of protein, based on the A260/A280 ratio, and could not be effectively removed by passage over a High Q anion exchange column (Bio-Rad). Most of the DNA could be removed by hydroxyapatite chromatography (ceramic hydroxyapatite Type II resin, Bio-Rad) with gradient elution from 20 to 500 mM KPO4 buffer, pH 7.2, over 10 column volumes. A small amount of residual nucleic acid remained associated with YaaAEc even after hydroxyapatite chromatography, as determined by GelRed-stained agarose gel electrophoresis of the purified protein. After hydroxyapatite chromatography, YaaAEc was dialyzed into storage buffer (25 mM Tris-HCl, pH 8.8, and 100 mM KCl) and concentrated to 28 mg/ml (ε280 = 25,900 M−1 cm−1) at 4 °C using a spin concentrator with a 10-kDa molecular weight cutoff regenerated cellulose membrane (Millipore). Before storage, EDTA was added to a final concentration of 10 mM to enhance protein stability, and the protein was flash-frozen in 50–200-μl aliquots in liquid N2 and stored at −80 °C until needed.
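The stated concentration follows from the Beer-Lambert law, c = A280 / (ε280 × l), with ε280 = 25,900 M−1 cm−1. The sketch below converts an absorbance reading to molar and mg/ml concentrations. The molecular weight must be supplied by the user; the 30,000 Da in the example is a placeholder, not the measured mass of tagged YaaAEc. Residual DNA also absorbs at 280 nm, so any remaining contaminant would inflate the apparent protein concentration.

```python
def molar_concentration(a280: float, epsilon_m_cm: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l), returned in mol/liter."""
    return a280 / (epsilon_m_cm * path_cm)


def mg_per_ml(a280: float, epsilon_m_cm: float, molecular_weight_da: float, path_cm: float = 1.0) -> float:
    """Convert A280 to mg/ml: mol/liter * g/mol = g/liter, which equals mg/ml."""
    return molar_concentration(a280, epsilon_m_cm, path_cm) * molecular_weight_da


EPSILON_280 = 25_900  # M^-1 cm^-1, from the text
PLACEHOLDER_MW = 30_000  # Da, hypothetical value for illustration only

# A hypothetical reading of A280 = 1.0 in a 1-cm cuvette.
print(molar_concentration(1.0, EPSILON_280))
print(mg_per_ml(1.0, EPSILON_280, PLACEHOLDER_MW))
```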

YaaAEc DNA-binding assay

Synthetic Holliday junction (X-0), single-stranded, and double-stranded oligonucleotide substrates were generated as previously described (58, 59). DNA-binding assays were performed by incubating YaaAEc with 4 pg of 32P-labeled oligonucleotide substrate in DNA-binding buffer (30 mM HEPES, pH 7.5, 1 mM DTT, and 100 μg/ml BSA) on ice for 15 min. Protein-DNA complexes were analyzed on 5% native polyacrylamide gels. To compare YaaAEc binding to circular DNA, blunt-end linear DNA, and linear DNA with single-stranded overhangs, 0.5 μg of pUC19 plasmid (Thermo Fisher Scientific, SD00661), either undigested or digested with SmaI or SalI, was incubated with the indicated amount of YaaAEc in DNA-binding buffer on ice for 30 min before protein-DNA complexes were separated on 0.6% agarose gels. Band intensities were quantified with ImageJ (60) to determine the relative amounts of free and YaaAEc-bound DNA. These fractional binding values were plotted as a function of total YaaAEc concentration and fit in Prism (GraphPad) to a single-site binding model with positive cooperativity to determine the KD and the Hill coefficient. DNA binding was assayed with multiple independent preparations of recombinant YaaAEc to ensure consistency.
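The cooperative binding model used here is the Hill equation, θ = [P]^n / (KD^n + [P]^n). Prism fits it by nonlinear least squares. The sketch below swaps in a simpler technique, linear regression on a Hill plot (log10(θ/(1 − θ)) against log10[P]), which weights noisy points near 0 or 1 differently and is meant only for intuition. The example data are synthetic and noise-free, generated with KD = 250 nM and n = 3.1 to match the reported range; they are not measured values. Like the original fit, the model treats total YaaAEc as free YaaAEc, which is reasonable only when the DNA concentration is well below KD.

```python
import math


def hill_fraction(conc: float, kd: float, n: float) -> float:
    """Fraction of DNA bound under a Hill model with dissociation constant kd and Hill coefficient n."""
    return conc**n / (kd**n + conc**n)


def fit_hill_linear(concs, fractions):
    """Estimate (kd, n) from a Hill plot: log10(f/(1-f)) = n*log10(P) - n*log10(kd).

    Points with fraction 0 or 1 are dropped because the logit is undefined there.
    """
    xs, ys = [], []
    for c, f in zip(concs, fractions):
        if 0.0 < f < 1.0 and c > 0:
            xs.append(math.log10(c))
            ys.append(math.log10(f / (1.0 - f)))
    if len(xs) < 2:
        raise ValueError("need at least two points with 0 < fraction < 1")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx  # slope of the Hill plot is the Hill coefficient
    intercept = mean_y - n * mean_x  # intercept equals -n * log10(kd)
    kd = 10 ** (-intercept / n)
    return kd, n


# Synthetic, noise-free titration (nM) generated from the model itself.
concs = [50, 100, 200, 300, 400, 600, 1000]
fractions = [hill_fraction(c, 250.0, 3.1) for c in concs]
print(fit_hill_linear(concs, fractions))
```

With real EMSA data, a nonlinear fit such as Prism's is preferable because band-quantification noise near full or zero binding is strongly amplified by the logit transform.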

X-ray crystallographic structure determination of YaaAEc

Crystallization conditions for purified hexahistidine-tagged YaaAEc at 28 mg/ml in storage buffer were screened using commercial sparse-matrix screens in 96-well sitting-drop plates. Protein and reservoir solutions were dispensed in 0.5-µl drops using a Gryphon liquid-handling robot (Art Robbins Instruments). Initial needle-shaped crystals were optimized by manual screening using sitting-drop vapor diffusion, and improved crystals grew in 100 mM NaCl, 100 mM sodium citrate, pH 4.6, 100 mM Na2HPO4, 75 mM NaH2PO4, and 15% PEG 8000. Further optimization with the Hampton additive screen identified 3% benzamidine-HCl as improving crystal morphology and size. In the final structure, an ordered benzamidine molecule makes hydrogen bonds to two Pro73 residues related by crystallographic symmetry, and its phenyl ring participates in cation-π interactions with two symmetry-related Arg77 residues, which likely explains why this additive aided the growth of diffraction-quality crystals.

Because YaaAEc has no homologs of known structure, SeMet SAD phasing was used to calculate initial electron density maps. The pET15b-YaaAEc expression construct was transformed by heat shock into chemically competent cells of the methionine auxotroph E. coli B834(DE3) (Novagen). YaaAEc was expressed in M9 minimal medium supplemented with 42 mg/liter of each l-amino acid except methionine and cysteine, 125 mg/liter each of adenine, guanosine, thymine, and uracil, 4 mg/liter thiamin, 4 mg/liter d-biotin, and 30 mg/liter l-selenomethionine (Acros Organics). SeMet-YaaAEc was purified as described above and crystallized under the same conditions as the native protein. Diffraction-quality SeMet-YaaAEc crystals were cryoprotected by serial transfer and brief soaking in reservoir solution supplemented with ethylene glycol in 5% increments to a final concentration of 15%. Crystals were mounted in nylon loops and cryocooled by rapid immersion in liquid nitrogen.

YaaAEc crystallized in space group P2₁2₁2₁ with two protein chains in the asymmetric unit. Diffraction data extending to 1.65-Å resolution were collected from a 350 × 100 × 100-μm SeMet-YaaAEc crystal at beamline 7-1 of the Stanford Synchrotron Radiation Lightsource using the oscillation method. The X-ray wavelength was tuned to the selenium K-edge (0.9788 Å) to maximize the anomalous signal from the six SeMet residues in YaaAEc. Inverse-beam geometry was not used. Data were processed using HKL2000 (61), and significant anomalous signal extended to 2.45-Å resolution, based on a CCanom cutoff of 0.15 (62) as determined by Aimless (63) in the CCP4 suite (64). See Table S2 for data statistics.
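CCanom is the correlation coefficient between anomalous differences estimated independently from two random halves of the data, computed per resolution shell; the anomalous signal is considered significant to the resolution at which this correlation falls below a cutoff (0.15 here). The sketch below applies that idea with a plain Pearson correlation and takes the last shell before the first drop below the cutoff. It does not reproduce Aimless's calculation (random splitting, weighting, or its exact cutoff logic), and the shell data are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no measurable correlation
    return sxy / math.sqrt(sxx * syy)


def anomalous_limit(shells, cutoff=0.15):
    """Return d_min of the last shell before CCanom first drops below cutoff, or None.

    shells: list of (d_min_angstrom, half1_diffs, half2_diffs), ordered from low to high resolution.
    """
    limit = None
    for d_min, half1, half2 in shells:
        if pearson(half1, half2) < cutoff:
            break
        limit = d_min
    return limit


# Invented half-dataset anomalous differences for three shells.
shells = [
    (4.0, [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2]),
    (2.45, [1.0, -1.0, 2.0, -2.0], [0.8, -0.7, 1.5, -1.9]),
    (2.0, [1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]),
]
print(anomalous_limit(shells))
```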

SAD phasing with unmerged input reflections and local scaling was performed in PHENIX (65). The figure of merit for the initial experimental SAD phases was 0.39; these phases were improved by density modification before autobuilding in PHENIX (66, 67). The initial autobuilt model was manually improved in Coot (68), including addition of ordered waters. Five residues of the uncleaved N-terminal tag were ordered in the crystal and were included in the model. Refinement was performed in PHENIX using a maximum-likelihood target function based on anomalous amplitudes (i.e. Bijvoet mates were kept separate), optimization of the X-ray/stereochemistry and ADP weights, and translation-libration-screw treatment of the ADPs (66). Benzamidine was modeled into unambiguous electron density and mediates a key crystal-packing contact. Final model validation was performed using Coot (68) and MolProbity (69). The final model has excellent stereochemical and clashscore statistics, with an overall MolProbity score of 0.95 (100th percentile). See Table S2 for model statistics.

YaaAEc structural analysis and display

The surface electrostatic potential of YaaAEc was calculated with APBS (30, 70), using PDB2PQR (71) to assign atomic partial charges and radii. Default values were used for the solvent dielectric constant (78), protein dielectric constant (2), probe radius (1.4 Å), and temperature (298.15 K). Structural figures were made with UCSF Chimera (72) and POVScript+ (73). Sequence conservation was mapped onto the YaaAEc structure using the ConSurf server (74).
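The electrostatics calculation can be reproduced from an APBS input deck. The sketch below writes one using the dielectric constants, probe radius, and temperature stated above. Everything else is an assumption: the file names are hypothetical, the grid dimensions and box lengths are placeholders that must be sized to the protein, the remaining keywords (mg-auto, lpbe, bcfl sdh, srfm smol, chgm spl2, sdens, swin) are typical choices in decks generated alongside PDB2PQR rather than values reported here, and no mobile ions are defined, which means zero ionic strength.

```python
def apbs_input(pqr_path: str, grid=(97, 97, 97),
               coarse_len=(80.0, 80.0, 80.0), fine_len=(60.0, 60.0, 60.0)) -> str:
    """Build an APBS mg-auto input deck using the parameter values stated in the text.

    Grid dimensions and box lengths (in angstroms) are placeholders only.
    """
    def fmt(values):
        return " ".join(str(v) for v in values)

    return "\n".join([
        "read",
        f"    mol pqr {pqr_path}",
        "end",
        "elec name yaaa",
        "    mg-auto",
        f"    dime {fmt(grid)}",
        f"    cglen {fmt(coarse_len)}",
        f"    fglen {fmt(fine_len)}",
        "    cgcent mol 1",
        "    fgcent mol 1",
        "    mol 1",
        "    lpbe",
        "    bcfl sdh",
        "    pdie 2.0",      # protein dielectric constant (text)
        "    sdie 78.0",     # solvent dielectric constant (text)
        "    srfm smol",
        "    chgm spl2",
        "    sdens 10.0",
        "    srad 1.4",      # probe radius in angstroms (text)
        "    swin 0.3",
        "    temp 298.15",   # temperature in kelvin (text)
        "    calcenergy no",
        "    calcforce no",
        "    write pot dx yaaa_potential",
        "end",
        "quit",
    ])


print(apbs_input("yaaa.pqr"))
```

The resulting potential map (yaaa_potential.dx) can then be used to color the molecular surface in a viewer such as UCSF Chimera.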

Data availability

Data for Fig. 1, Fig. 2, and Fig. S1 are available from Dr. Valérie de Crécy-Lagard, University of Florida, vcrecy@ufl.edu. Data for Fig. 3, Table 1, Figs. S3–S7, Fig. S12, and Table S1 are contained in their entirety in the article. Data for the PF03883 sequence similarity network and data for growth curves are available at University of Florida Digital Collections: Fig. S2 (https://ufdcimages.uflib.ufl.edu/IR/00/01/12/21/00001/PF03883_SSN.xlsx), Fig. S8 (https://ufdcimages.uflib.ufl.edu/IR/00/01/12/21/00001/Data%20for%20FigureS8.xlsx), and Fig. S9 (https://ufdcimages.uflib.ufl.edu/IR/00/01/12/21/00001/Data%20for%20FigureS9.xlsx). X-ray crystallographic structure factor data and coordinates (Figs. 4–6, Fig. S10, Fig. S11, and Fig. S13) are available for download from the PDB (5CAJ). Raw diffraction data are available by request from Dr. Mark Wilson, University of Nebraska, mwilson13@unl.edu.

Supplementary Material

Supporting Information

Acknowledgments

We thank Professor James Imlay and Dr. Yuanyuan Liu (University of Illinois) for the gift of the YaaA deletion and the Hpx catalase- and peroxidase-deficient E. coli strains and Carol Cook for photography.

This article contains supporting information.

Author contributions—J. P., Y. Y., J. L., V. dC.-L., and M. A. W. data curation; J. P., Y. Y., Y. L., V. dC.-L., and M. A. W. formal analysis; J. P., Y. Y., C.-W. C., V. dC.-L., and M. A. W. validation; J. P., Y. Y., J. L., C.-W. C., D. I.-R., Y. L., V. dC.-L., and M. A. W. investigation; J. P., Y. Y., Y. L., V. dC.-L., and M. A. W. methodology; J. P., Y. Y., C.-W. C., Y. L., V. dC.-L., and M. A. W. writing-original draft; J. P., Y. Y., J. L., C.-W. C., D. I.-R., Y. L., V. dC.-L., and M. A. W. writing-review and editing; Y. Y., V. dC.-L., and M. A. W. visualization; Y. L., V. dC.-L., and M. A. W. supervision; Y. L., V. dC.-L., and M. A. W. funding acquisition; V. dC.-L. and M. A. W. conceptualization; V. dC.-L. and M. A. W. resources; V. dC.-L. and M. A. W. project administration.

Funding and additional information—Use of the Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, is supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences under Contract DE-AC02-76SF00515. The Stanford Synchrotron Radiation Lightsource Structural Molecular Biology Program is supported by the Department of Energy Office of Biological and Environmental Research and by the National Institutes of Health, NIGMS Grant P41GM103393. Portions of this research were also funded by National Institutes of Health Grants R01GM70641 (to V. dC.-L.) and R01GM092999 (to M. A. W.). Molecular graphics and analyses were performed with UCSF Chimera, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from NIH Grant P41GM103311. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of NIGMS or the National Institutes of Health.

Conflict of interest—The authors declare that they have no conflicts of interest with the contents of this article.

Abbreviations—The abbreviations used are: dADG, 2′-deoxy-7-amido-7-deazaguanosine; dPreQ0, 2′-deoxy-7-cyano-7-deazaguanosine; HhH, helix-hairpin-helix; GTDB, Genome Taxonomy Database; SSN, sequence similarity network; GGN, genome neighborhood network; TGT, tRNA guanine transglycosylase; SAD, single-wavelength anomalous diffraction; SeMet, selenomethionine; APBS, Adaptive Poisson-Boltzmann Solver; ADP, atomic displacement parameter.

References

  • 1. Seo S. W., Kim D., Szubin R., and Palsson B. O. (2015) Genome-wide reconstruction of OxyR and SoxRS transcriptional regulatory networks under oxidative stress in Escherichia coli K-12 MG1655. Cell Rep. 12, 1289–1299 10.1016/j.celrep.2015.07.043
  • 2. Zheng M., Wang X., Templeton L. J., Smulski D. R., LaRossa R. A., and Storz G. (2001) DNA microarray-mediated transcriptional profiling of the Escherichia coli response to hydrogen peroxide. J. Bacteriol. 183, 4562–4570 10.1128/JB.183.15.4562-4570.2001
  • 3. Dubbs J. M., and Mongkolsuk S. (2012) Peroxide-sensing transcriptional regulators in bacteria. J. Bacteriol. 194, 5495–5503 10.1128/JB.00304-12
  • 4. Liu Y., Bauer S. C., and Imlay J. A. (2011) The YaaA protein of the Escherichia coli OxyR regulon lessens hydrogen peroxide toxicity by diminishing the amount of intracellular unincorporated iron. J. Bacteriol. 193, 2186–2196 10.1128/JB.00001-11
  • 5. Fenton H. J. H. (1894) Oxidation of tartaric acid in presence of iron. J. Chem. Soc., Transac. 65, 899–911 10.1039/CT8946500899
  • 6. Sangurdekar D. P., Srienc F., and Khodursky A. B. (2006) A classification based framework for quantitative description of large-scale microarray data. Genome Biol. 7, R32 10.1186/gb-2006-7-4-r32
  • 7. Dong T., Kirchhof M. G., and Schellhorn H. E. (2008) RpoS regulation of gene expression during exponential growth of Escherichia coli K12. Mol. Genet. Genomics 279, 267–277 10.1007/s00438-007-0311-4
  • 8. Paczosa M. K., Silver R. J., McCabe A. L., Tai A. K., McLeish C. H., Lazinski D. W., and Mecsas J. (2020) Transposon mutagenesis screen of Klebsiella pneumoniae identifies multiple genes important for resisting antimicrobial activities of neutrophils in mice. Infect. Immun. 88, 10.1128/IAI.00034-20
  • 9. Thiaville J. J., Kellner S. M., Yuan Y., Hutinet G., Thiaville P. C., Jumpathong W., Mohapatra S., Brochier-Armanet C., Letarov A. V., Hillebrand R., Malik C. K., Rizzo C. J., Dedon P. C., and de Crécy-Lagard V. (2016) Novel genomic island modifies DNA with 7-deazaguanine derivatives. Proc. Natl. Acad. Sci. U. S. A. 113, E1452–E1459 10.1073/pnas.1518570113
  • 10. Yuan Y., Hutinet G., Valera J. G., Hu J., Hillebrand R., Gustafson A., Iwata-Reuyl D., Dedon P. C., and de Crécy-Lagard V. (2018) Identification of the minimal bacterial 2′-deoxy-7-amido-7-deazaguanine synthesis machinery. Mol. Microbiol. 110, 469–483 10.1111/mmi.14113
  • 11. Cox M. M. (1999) Recombinational DNA repair in bacteria and the RecA protein. Prog. Nucleic Acid Res. Mol. Biol. 63, 311–366 10.1016/s0079-6603(08)60726-6
  • 12. Lauder S. D., and Kowalczykowski S. C. (1993) Negative co-dominant inhibition of recA protein function. Biochemical properties of the recA1, recA13 and recA56 proteins and the effect of recA56 protein on the activities of the wild-type recA protein function in vitro. J. Mol. Biol. 234, 72–86 10.1006/jmbi.1993.1564
  • 13. Gross J., and Gross M. (1969) Genetic analysis of an E. coli strain with a mutation affecting DNA polymerase. Nature 224, 1166–1168 10.1038/2241166a0
  • 14. Fix D. F., Burns P. A., and Glickman B. W. (1987) DNA sequence analysis of spontaneous mutation in a PolA1 strain of Escherichia coli indicates sequence-specific effects. Mol. Gen. Genet. 207, 267–272 10.1007/BF00331588
  • 15. Oren A., da Costa M. S., Garrity G. M., Rainey F. A., Rosselló-Móra R., Schink B., Sutcliffe I., Trujillo M. E., and Whitman W. B. (2015) Proposal to include the rank of phylum in the International Code of Nomenclature of Prokaryotes. Int. J. Syst. Evol. Microbiol. 65, 4284–4287 10.1099/ijsem.0.000664
  • 16. Gerlt J. A., Bouvier J. T., Davidson D. B., Imker H. J., Sadkhin B., Slater D. R., and Whalen K. L. (2015) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): a web tool for generating protein sequence similarity networks. Biochim. Biophys. Acta 1854, 1019–1037 10.1016/j.bbapap.2015.04.015
  • 17. Zallot R., Oberg N. O., and Gerlt J. A. (2018) 'Democratized' genomic enzymology web tools for functional assignment. Curr. Opin. Chem. Biol. 47, 77–85 10.1016/j.cbpa.2018.09.009
  • 18. Cardenas J. P., Quatrini R., and Holmes D. S. (2016) Aerobic lineage of the oxidative stress response protein rubrerythrin emerged in an ancient microaerobic, (hyper)thermophilic environment. Front. Microbiol. 7, 1822 10.3389/fmicb.2016.01822
  • 19. Date S. V. (2008) The Rosetta stone method. Methods Mol. Biol. 453, 169–180 10.1007/978-1-60327-429-6_7
  • 20. Kowalski J. C., Belfort M., Stapleton M. A., Holpert M., Dansereau J. T., Pietrokovski S., Baxter S. M., and Derbyshire V. (1999) Configuration of the catalytic GIY-YIG domain of intron endonuclease I-TevI: coincidence of computational and molecular findings. Nucleic Acids Res. 27, 2115–2125 10.1093/nar/27.10.2115
  • 21. Truglio J. J., Rhau B., Croteau D. L., Wang L., Skorvaga M., Karakas E., DellaVecchia M. J., Wang H., Van Houten B., and Kisker C. (2005) Structural insights into the first incision reaction during nucleotide excision repair. EMBO J. 24, 885–894 10.1038/sj.emboj.7600568
  • 22. Van Roey P., Meehan L., Kowalski J. C., Belfort M., and Derbyshire V. (2002) Catalytic domain structure and hypothesis for function of GIY-YIG intron endonuclease I-TevI. Nat. Struct. Biol. 9, 806–811 10.1038/nsb853
  • 23. Price M. N., Wetmore K. M., Waters R. J., Callaghan M., Ray J., Liu H., Kuehl J. V., Melnyk R. A., Lamson J. S., Suh Y., Carlson H. K., Esquivel Z., Sadeeshkumar H., Chakraborty R., Zane G. M., et al. (2018) Mutant phenotypes for thousands of bacterial genes of unknown function. Nature 557, 503–509 10.1038/s41586-018-0124-0
  • 24. Horii Z., and Clark A. J. (1973) Genetic analysis of the recF pathway to genetic recombination in Escherichia coli K12: isolation and characterization of mutants. J. Mol. Biol. 80, 327–344 10.1016/0022-2836(73)90176-9
  • 25. Kuzminov A. (1999) Recombinational repair of DNA damage in Escherichia coli and bacteriophage lambda. Microbiol. Mol. Biol. Rev. 63, 751–813
  • 26. Lloyd R. G., and Thomas A. (1983) On the nature of the RecBC and RecF pathways of conjugal recombination in Escherichia coli. Mol. Gen. Genet. 190, 156–161 10.1007/BF00330339
  • 27. Lovett S. T. (2011) The DNA exonucleases of Escherichia coli. EcoSal Plus 4, 10.1128/ecosalplus.4.4.7
  • 28. Nonekowski S. T., Kung F. L., and Garcia G. A. (2002) The Escherichia coli tRNA-guanine transglycosylase can recognize and modify DNA. J. Biol. Chem. 277, 7178–7182 10.1074/jbc.M111077200
  • 29. el-Hajj H. H., Zhang H., and Weiss B. (1988) Lethality of a dut (deoxyuridine triphosphatase) mutation in Escherichia coli. J. Bacteriol. 170, 1069–1075 10.1128/jb.170.3.1069-1075.1988
  • 30. Jurrus E., Engel D., Star K., Monson K., Brandi J., Felberg L. E., Brookes D. H., Wilson L., Chen J., Liles K., Chun M., Li P., Gohara D. W., Dolinsky T., Konecny R., et al. (2018) Improvements to the APBS biomolecular solvation software suite. Protein Sci. 27, 112–128 10.1002/pro.3280
  • 31. Kelley L. A., Mezulis S., Yates C. M., Wass M. N., and Sternberg M. J. (2015) The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 10, 845–858 10.1038/nprot.2015.053
  • 32. Das D., Tripsianes K., Jaspers N. G., Hoeijmakers J. H., Kaptein R., Boelens R., and Folkers G. E. (2008) The HhH domain of the human DNA repair protein XPF forms stable homodimers. Proteins 70, 1551–1563 10.1002/prot.21635
  • 33. Nishino T., Komori K., Ishino Y., and Morikawa K. (2005) Structural and functional analyses of an archaeal XPF/Rad1/Mus81 nuclease: asymmetric DNA binding and cleavage mechanisms. Structure 13, 1183–1192 10.1016/j.str.2005.04.024
  • 34. Newman M., Murray-Rust J., Lally J., Rudolf J., Fadden A., Knowles P. P., White M. F., and McDonald N. Q. (2005) Structure of an XPF endonuclease with and without DNA suggests a model for substrate recognition. EMBO J. 24, 895–905 10.1038/sj.emboj.7600581
  • 35. Singh S., Folkers G. E., Bonvin A. M., Boelens R., Wechselberger R., Niztayev A., and Kaptein R. (2002) Solution structure and DNA-binding properties of the C-terminal domain of UvrC from E.coli. EMBO J. 21, 6257–6266 10.1093/emboj/cdf627
  • 36. Ducros V. M., Lewis R. J., Verma C. S., Dodson E. J., Leonard G., Turkenburg J. P., Murshudov G. N., Wilkinson A. J., and Brannigan J. A. (2001) Crystal structure of GerE, the ultimate transcriptional regulator of spore formation in Bacillus subtilis. J. Mol. Biol. 306, 759–771 10.1006/jmbi.2001.4443
  • 37. Jones M., Beuron F., Borg A., Nans A., Earl C. P., Briggs D. C., Snijders A. P., Bowles M., Morris E. P., Linch M., and McDonald N. Q. (2020) Cryo-EM structures of the XPF-ERCC1 endonuclease reveal how DNA-junction engagement disrupts an auto-inhibited conformation. Nat. Commun. 11, 1120 10.1038/s41467-020-14856-2
  • 38. Hillen H. S., Parshin A. V., Agaronyan K., Morozov Y. I., Graber J. J., Chernev A., Schwinghammer K., Urlaub H., Anikin M., Cramer P., and Temiakov D. (2017) Mechanism of transcription anti-termination in human mitochondria. Cell 171, 1082–1093.e13 10.1016/j.cell.2017.09.035
  • 39. Doherty A. J., Serpell L. C., and Ponting C. P. (1996) The helix-hairpin-helix DNA-binding motif: a structural basis for non-sequence-specific recognition of DNA. Nucleic Acids Res. 24, 2488–2497 10.1093/nar/24.13.2488 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Winterbourn C. C. (1995) Toxicity of iron and hydrogen peroxide: the Fenton reaction. Toxicol. Lett. 82–83, 969–974 10.1016/0378-4274(95)03532-X [DOI] [PubMed] [Google Scholar]
  • 41. Imlay J. A. (2003) Pathways of oxidative damage. Annu. Rev. Microbiol. 57, 395–418 10.1146/annurev.micro.57.030502.090938 [DOI] [PubMed] [Google Scholar]
  • 42. Hill T. M., Sharma B., Valjavec-Gratian M., and Smith J. (1997) sfi-independent filamentation in Escherichia coli is lexA dependent and requires DNA damage for induction. J. Bacteriol. 179, 1931–1939 10.1128/jb.179.6.1931-1939.1997 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Dillingham M. S., and Kowalczykowski S. C. (2008) RecBCD enzyme and the repair of double-stranded DNA breaks. Microbiol. Mol. Biol. Rev. 72, 642–671 10.1128/MMBR.00020-08 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Pagès V. (2016) Single-strand gap repair involves both RecF and RecBCD pathways. Curr. Genet. 62, 519–521 10.1007/s00294-016-0575-5 [DOI] [PubMed] [Google Scholar]
  • 45. Kryshtafovych A., Moult J., Baslé A., Burgin A., Craig T. K., Edwards R. A., Fass D., Hartmann M. D., Korycinski M., Lewis R. J., Lorimer D., Lupas A. N., Newman J., Peat T. S., Piepenbrink K. H., et al. (2016) Some of the most interesting CASP11 targets through the eyes of their authors. Proteins 84, 34–50 10.1002/prot.24942 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Zimmermann L., Stephens A., Nam S. Z., Rau D., Kübler J., Lozajic M., Gabler F., Söding J., Lupas A. N., and Alva V. (2018) A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core. J. Mol. Biol. 430, 2237–2243 10.1016/j.jmb.2017.12.007 [DOI] [PubMed] [Google Scholar]
  • 47. Soding J., Biegert A., and Lupas A. N. (2005) The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33, W244–W248 10.1093/nar/gki408 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. El-Gebali S., Mistry J., Bateman A., Eddy S. R., Luciani A., Potter S. C., Qureshi M., Richardson L. J., Salazar G. A., Smart A., Sonnhammer E. L. L., Hirsh L., Paladin L., Piovesan D., Tosatto S. C. E., et al. (2019) The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 10.1093/nar/gky995 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Katoh K., Rozewicki J., and Yamada K. D. (2019) MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform. 20, 1160–1166 10.1093/bib/bbx108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Crooks G. E., Hon G., Chandonia J. M., and Brenner S. E. (2004) WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 10.1101/gr.849004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Geer L. Y., Domrachev M., Lipman D. J., and Bryant S. H. (2002) CDART: protein homology by domain architecture. Genome Res. 12, 1619–1623 10.1101/gr.278202 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Shannon P., Markiel A., Ozier O., Baliga N. S., Wang J. T., Ramage D., Amin N., Schwikowski B., and Ideker T. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 10.1101/gr.1239303 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Wetmore K. M., Price M. N., Waters R. J., Lamson J. S., He J., Hoover C. A., Blow M. J., Bristow J., Butland G., Arkin A. P., and Deutschbauer A. (2015) Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons. mBio 6, e00306–e00315 10.1128/mBio.00306-15 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. Mendler K., Chen H., Parks D. H., Lobb B., Hug L. A., and Doxey A. C. (2019) AnnoTree: visualization and exploration of a functionally annotated microbial tree of life. Nucleic Acids Res. 47, 4442–4448 10.1093/nar/gkz246
  • 55. Parks D. H., Chuvochina M., Chaumeil P.-A., Rinke C., Mussig A. J., and Hugenholtz P. (2019) Selection of representative genomes for 24,706 bacterial and archaeal species clusters provide a complete genome-based taxonomy. BioRxiv 10.1101/771964
  • 56. Carrió M. M., and Villaverde A. (2001) Protein aggregation as bacterial inclusion bodies is reversible. FEBS Lett. 489, 29–33 10.1016/S0014-5793(01)02073-7
  • 57. Lakshminarasimhan M., Maldonado M. T., Zhou W., Fink A. L., and Wilson M. A. (2008) Structural impact of three Parkinsonism-associated missense mutations on human DJ-1. Biochemistry 47, 1381–1392 10.1021/bi701189c
  • 58. Liu Y., Masson J. Y., Shah R., O'Regan P., and West S. C. (2004) RAD51C is required for Holliday junction processing in mammalian cells. Science 303, 243–246 10.1126/science.1093037
  • 59. Xu X., and Liu Y. (2009) Dual DNA unwinding activities of the Rothmund-Thomson syndrome protein, RECQ4. EMBO J. 28, 568–577 10.1038/emboj.2009.13
  • 60. Schneider C. A., Rasband W. S., and Eliceiri K. W. (2012) NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 10.1038/nmeth.2089
  • 61. Otwinowski Z., and Minor W. (1997) Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326 10.1016/S0076-6879(97)76066-X
  • 62. Karplus P. A., and Diederichs K. (2012) Linking crystallographic model and data quality. Science 336, 1030–1033 10.1126/science.1218231
  • 63. Evans P. R., and Murshudov G. N. (2013) How good are my data and what is the resolution? Acta Crystallogr. D Biol. Crystallogr. 69, 1204–1214 10.1107/S0907444913000061
  • 64. Winn M. D., Ballard C. C., Cowtan K. D., Dodson E. J., Emsley P., Evans P. R., Keegan R. M., Krissinel E. B., Leslie A. G., McCoy A., McNicholas S. J., Murshudov G. N., Pannu N. S., Potterton E. A., Powell H. R., et al. (2011) Overview of the CCP4 suite and current developments. Acta Crystallogr. D Biol. Crystallogr. 67, 235–242 10.1107/S0907444910045749
  • 65. Wang J. W., Chen J. R., Gu Y. X., Zheng C. D., Jiang F., Fan H. F., Terwilliger T. C., and Hao Q. (2004) SAD phasing by combination of direct methods with the SOLVE/RESOLVE procedure. Acta Crystallogr. D Biol. Crystallogr. 60, 1244–1253 10.1107/S0907444904010674
  • 66. Liebschner D., Afonine P. V., Baker M. L., Bunkóczi G., Chen V. B., Croll T. I., Hintze B., Hung L. W., Jain S., McCoy A. J., Moriarty N. W., Oeffner R. D., Poon B. K., Prisant M. G., Read R. J., et al. (2019) Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D Struct. Biol. 75, 861–877 10.1107/S2059798319011471
  • 67. Terwilliger T. C., Grosse-Kunstleve R. W., Afonine P. V., Moriarty N. W., Zwart P. H., Hung L. W., Read R. J., and Adams P. D. (2008) Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. Acta Crystallogr. D Biol. Crystallogr. 64, 61–69 10.1107/S090744490705024X
  • 68. Emsley P., Lohkamp B., Scott W. G., and Cowtan K. (2010) Features and development of Coot. Acta Crystallogr. D Biol. Crystallogr. 66, 486–501 10.1107/S0907444910007493
  • 69. Williams C. J., Headd J. J., Moriarty N. W., Prisant M. G., Videau L. L., Deis L. N., Verma V., Keedy D. A., Hintze B. J., Chen V. B., Jain S., Lewis S. M., Arendall W. B. 3rd, Snoeyink J., Adams P. D., et al. (2018) MolProbity: more and better reference data for improved all-atom structure validation. Protein Sci. 27, 293–315 10.1002/pro.3330
  • 70. Baker N. A., Sept D., Joseph S., Holst M. J., and McCammon J. A. (2001) Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl. Acad. Sci. U. S. A. 98, 10037–10041 10.1073/pnas.181342398
  • 71. Dolinsky T. J., Nielsen J. E., McCammon J. A., and Baker N. A. (2004) PDB2PQR: an automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations. Nucleic Acids Res. 32, W665–W667 10.1093/nar/gkh381
  • 72. Pettersen E. F., Goddard T. D., Huang C. C., Couch G. S., Greenblatt D. M., Meng E. C., and Ferrin T. E. (2004) UCSF Chimera–a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 10.1002/jcc.20084
  • 73. Fenn T. D., Ringe D., and Petsko G. A. (2003) POVScript+: a program for model and data visualization using persistence of vision ray-tracing. J. Appl. Crystallogr. 36, 944–947 10.1107/S0021889803006721
  • 74. Ashkenazy H., Abadi S., Martz E., Chay O., Mayrose I., Pupko T., and Ben-Tal N. (2016) ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic Acids Res. 44, W344–W350 10.1093/nar/gkw408

Associated Data


Supplementary Materials

Supporting Information

Data Availability Statement

Data for Fig. 1, Fig. 2, and Fig. S1 are available from Dr. Valérie de Crécy-Lagard, University of Florida, vcrecy@ufl.edu. Data for Fig. 3, Table 1, Figs. S3–S7, Fig. S12, and Table S1 are contained in their entirety in the article. Data for the PF03883 sequence similarity network and for the growth curves are available at University of Florida Digital Collections: Fig. S2 (https://ufdcimages.uflib.ufl.edu/IR/00/01/12/21/00001/PF03883_SSN.xlsx), Fig. S8 (https://ufdcimages.uflib.ufl.edu/IR/00/01/12/21/00001/Data%20for%20FigureS8.xlsx), and Fig. S9 (https://ufdcimages.uflib.ufl.edu/IR/00/01/12/21/00001/Data%20for%20FigureS9.xlsx). X-ray crystallographic structure factor data and coordinates (Figs. 4–6, Fig. S10, Fig. S11, and Fig. S13) are available for download from the PDB under accession code 5CAJ. Raw diffraction data are available by request from Dr. Mark Wilson, University of Nebraska, mwilson13@unl.edu.
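For readers who want to retrieve the deposited model and structure factors programmatically, the following is a minimal sketch, assuming the public RCSB PDB file-download service and its mmCIF naming pattern (`<ID>.cif` for coordinates, `<ID>-sf.cif` for structure factors). The helper names `pdb_urls` and `fetch` are hypothetical and not part of the authors' work; only the accession code 5CAJ comes from the statement above. Raw diffraction images are not deposited and still have to be requested from the authors.

```python
"""Sketch: build (and optionally fetch) RCSB download URLs for a PDB entry such as 5CAJ.

Assumption: the RCSB file service layout https://files.rcsb.org/download/<ID>.cif
(coordinates) and <ID>-sf.cif (structure factors). Helper names are hypothetical.
"""
import urllib.request
from pathlib import Path

RCSB_DOWNLOAD = "https://files.rcsb.org/download"


def pdb_urls(pdb_id: str) -> dict[str, str]:
    """Return download URLs for the mmCIF model and structure factors of an entry."""
    code = pdb_id.strip().upper()
    # Classic PDB accession codes are four alphanumeric characters.
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    return {
        "coordinates": f"{RCSB_DOWNLOAD}/{code}.cif",
        "structure_factors": f"{RCSB_DOWNLOAD}/{code}-sf.cif",
    }


def fetch(pdb_id: str, out_dir: str = ".") -> list[Path]:
    """Download both files into out_dir (requires network access)."""
    paths = []
    for url in pdb_urls(pdb_id).values():
        dest = Path(out_dir) / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, dest)
        paths.append(dest)
    return paths


if __name__ == "__main__":
    # Print the URLs only; call fetch("5CAJ") to actually download the files.
    for kind, url in pdb_urls("5caj").items():
        print(kind, url)
```

Separating URL construction from downloading keeps the ID validation testable offline; the resulting mmCIF files can then be opened in the visualization and refinement tools cited above, such as Coot or Phenix.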


Articles from The Journal of Biological Chemistry are provided here courtesy of American Society for Biochemistry and Molecular Biology
