Abstract
Colibactin is a secondary metabolite produced by certain strains of bacteria found in the human gut. The presence of colibactin-producing bacteria has been linked to colorectal cancer in humans. Colibactin was first discovered in 2006, but because it is produced in small quantities and is unstable, it has yet to be isolated from bacterial cultures. Here we summarize advances in the field since ~2017 that have led to the identification of the structure of colibactin as a heterodimer containing two DNA-reactive electrophilic cyclopropane residues. Colibactin has been shown to form interstrand cross-links by alkylation of adenine residues on opposing strands of DNA. The structure of colibactin contains two thiazole rings separated by a two-carbon linker that is thought to exist as an α-aminoketone following completion of the biosynthetic pathway. Synthetic studies have now established that this α-aminoketone is unstable toward aerobic oxidation; the resulting oxidation products are in turn unstable toward nucleophilic cleavage under mild conditions. These data provide a simple molecular-level explanation for colibactin’s instability and potentially also explain the observation that cell-to-cell contact is required for genotoxic effects.
Graphical Abstract.

Our bodies play host to microbial residents that exert a broad range of effects on our health and disease states. Within the gut, certain strains of E. coli and other Enterobacteriaceae possess the clb or pks biosynthetic gene cluster (BGC) and produce the genotoxin colibactin.1,2 Bacteria that produce colibactin induce DNA damage in eukaryotic cells and tumor formation in mouse models of intestinal inflammation.2–6 Additionally, colibactin-producing bacteria have been correlated with the clinical progression of colorectal cancer (CRC) in humans, and they are known to cause oncogenic driver mutations found in human CRC samples.3,6–8 Thus, there are strong data supporting a role for colibactin genotoxicity in human CRC progression.
The data implicating colibactin in human colorectal tumorigenesis have motivated extensive structural and pharmacological studies of colibactin and other clb metabolites. Because colibactin has, to our knowledge, eluded direct isolation from the producing organism, insights into its structure and mechanism of action have been derived from an array of allied approaches. These include biosynthetic and genetic studies, enzymology, large-scale fermentations and isolations, chemical synthesis, stabilization of colibactin through DNA alkylation events, and mass spectrometry in conjunction with stable isotope labeling. These diverse lines of inquiry constitute a global campaign that has led to a general understanding of nearly all the steps in colibactin’s biosynthesis, provided a deeper understanding of the biological phenotypes induced by clb+ bacteria, and very recently led to the identification of colibactin itself. Herein we report a detailed summary of the advances within the colibactin field over the past three years. Interested readers are directed to earlier reviews for advances prior to ca. 2017.9,10
Summary of knowledge:
Researchers have been unable to directly isolate colibactin from wild-type bacteria possessing an intact clb pathway. Consequently, the field has focused on large-scale fermentations of genetically modified strains. The underlying hypothesis has been that deactivation of individual enzymes in the BGC will promote the accumulation of incomplete clb metabolites that may be amenable to isolation. Characterization of these structures could, in turn, provide insight into the structure of colibactin and the biosynthetic function(s) of the modified enzymes. Colibactin peptidase (ClbP), a pathway-dedicated serine protease that removes a terminal N-myristoyl-D-asparagine residue from clb products, has been a frequent target of deactivation. clbP mutant strains have provided numerous “precolibactin” metabolites (1–9), which bear an intact N-myristoyl-D-asparagine (Figure 1). However, with the exception of precolibactin 886 (8) and precolibactin 969 (9), none of these isolates were found to be cytotoxic. Precolibactin 886 (8) exhibited IC50 values of 22.3 and 34.0 μM against HCT-116 and HeLa cells, respectively. The potency of precolibactin 969 (9) is comparable and is discussed below.
Figure 1.

Structures of predicted, isolated, and synthesized precolibactins. The isolation of metabolite 1 was guided by comparative metabolomics and targeted structural network analyses.11 Metabolite 2 was isolated independently by the Crawford, Müller, and Balskus laboratories (listed in order of submission date). These reports appeared between January and May of 2015.12–14 Metabolite 3 and precolibactin B (4) were both isolated by Qian and co-workers and disclosed in June of 2015.15 Precolibactin A (5) was deduced by chemical synthesis in 2016.12,16 Precolibactin C (6) was predicted by Qian and co-workers in 2015 and isolated by Balskus and co-workers in 2016.15,17 Precolibactin 795a (7) was isolated by Qian, Zhang, and co-workers in 2019.18 Precolibactin 886 (8) was isolated by Qian, Moore, and co-workers in 2016. Its structure was confirmed by total synthesis by Herzon and co-workers in 2019.19,20 Precolibactin 969 (9) was isolated by Qian, Zhang, and co-workers in 2019.18
Mutation of clbP was thought simply to prevent N-deacylation and thereby promote the accumulation of stable precolibactins. Through chemical synthesis, however, it was shown that this genetic modification had a greater effect on metabolite structure than anticipated. Biosynthetic studies indicated that linear intermediates such as 10 (Figure 2) are off-loaded from the assembly line.21 Accordingly, the linear intermediate 10, bearing the natural N-myristoyl-D-Asn residue, and an analog bearing a removable tert-butoxycarbonyl substituent (not shown) were synthesized. The biosynthetic intermediate 10 was found to undergo double dehydrative cyclization to precolibactin B (4) under mildly basic conditions. However, removal of the tert-butoxycarbonyl substituent prior to double cyclodehydration led to the formation of the vicinal 3,4-dihydropyrrole 4-azaspiro[2.4]hept-6-en-5-one 11 (red in 11), a scaffold that had been proposed and detected by Crawford,12 but never before isolated. This residue possesses an electrophilic cyclopropane that is capable of alkylating DNA by ring-opening addition, a mechanism of genotoxicity found in other classes of cyclopropane-containing genotoxins, such as the illudins and duocarmycins.22 Thus, a model to explain clb genotoxicity was developed wherein ClbP-mediated deacylation promotes a cyclization pathway that leads to a DNA-reactive intermediate.12,14,23 This mechanistic model contextualizes ClbP function in the biosynthesis of colibactin24 and rationalizes why inactivation of ClbP results in loss of genotoxicity in clb+ E. coli.2 Furthermore, structures such as 11 were found to be substrates of the cyclopropane hydrolase ClbS, an enzyme that protects clb+ bacteria from autotoxicity,25,26 further supporting the relevance of the electrophilic cyclopropane to genotoxicity. This model also establishes that isolated precolibactins are biosynthetic derailment products deriving from the persistence of the N-myristoyl-D-Asn residue.
Figure 2.

A. In the absence of functional ClbP, linear biosynthetic products such as 10 cyclize to non-genotoxic pyridone-containing structures, such as precolibactin B (4). B. Synthetic studies that model the N-myristoyl-D-asparagine deacylation event in wild-type bacteria revealed a cyclization cascade that creates the DNA-reactive electrophile 11. The electrophile 11 alkylates DNA via addition of adenine to the cyclopropane.27 The site of alkylation (N3) was determined by NMR analysis.28
Colibactin induces DNA interstrand cross-links:
Building upon the existing proposal12,14,23 that colibactin was capable of covalently alkylating DNA, in 2018 Nougayrède and co-workers sought to define the precise mode of DNA damage.29 Because HeLa cells infected with clb+ E. coli suffered DNA damage,2 it was reasoned that the cells could be protected from colibactin by adding extracellular DNA to the medium, effectively using this DNA as a “colibactin sponge”. Consistent with this, a dose-dependent decrease in DNA damage (monitored by phosphorylation of histone H2AX, γH2AX, a DNA damage marker) was observed as increasing amounts of exogenous DNA were added to co-cultures of HeLa cells and clb+ E. coli. Isolation of the extracellular DNA and analysis by denaturing gel electrophoresis revealed that interstrand cross-links (ICLs) had formed in the DNA. This raised the question of whether the cross-linking agent was a shunt metabolite or colibactin itself.
To answer this question, the researchers systematically mutated key genes in the clb gene cluster and evaluated the effects of these mutations on ICL formation. Mutation of any of these genes abolished ICL formation. The genes examined included clbA (post-translational activation of the nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) enzymes), clbI (cyclopropane incorporation), clbQ (thioesterase-mediated off-loading of biosynthetic intermediates), and clbP (peptidase-mediated deacylation of precolibactins to colibactins, vide supra). These mutation experiments, along with Oswald’s earlier determination that all biosynthetic enzymes in the gene cluster are required for genotoxic effects,2 implicate the full product of the clb BGC, colibactin, as the ICL-forming agent. These studies also showed that biosynthetic intermediates are incapable of inducing DNA cross-links;29 however, clb+ E. coli mutated in the amidase clbL retained exogenous DNA damage activity but lacked cross-linking activity and genotoxicity in cells.30 Finally, adding ClbS to the DNA–clb+ E. coli co-cultures blocked ICL formation, demonstrating that the ICLs were induced by the colibactin pharmacophore.
To probe for ICL formation in tissue culture, the researchers evaluated the activation of ICL repair following infection with clb+ E. coli. ICLs are resolved by the Fanconi anemia (FA) pathway during S phase.31 Repair is initiated when two replication forks converge on an ICL and stall, followed by replisome disassembly and activation of ataxia-telangiectasia and Rad3-related (ATR) kinase. ATR activation was observed in HeLa cells incubated with clb+ E. coli, and colibactin-induced ATR signaling was further confirmed by the phosphorylation of both Chk1 and replication protein A-32 (RPA). Furthermore, the cell cycle was found to stall prior to mitosis. Treatment of infected HeLa cells with VE-821, an ATR inhibitor,32,33 rescued cell cycle progression; however, cell viability decreased and anaphase was never reached. These results suggest that colibactin induces an ATR-dependent ICL repair response and cell cycle arrest prior to mitosis and that, in the absence of ICL repair, cells die via mitotic catastrophe.
ICLs prevent the unwinding of DNA and thus block replication fork advancement. This obstruction activates the Fanconi anemia protein D2 (FANCD2), which then binds to the stalled replication fork alongside γH2AX prior to excision of the cross-link by other nuclear repair proteins. To determine whether clb+ E. coli induce lesions that require the ICL repair machinery, the authors probed for activation of FANCD2 and its sub-nuclear colocalization with γH2AX by immunohistochemical imaging. HeLa cells incubated with clb+ E. coli exhibited a significant increase in FANCD2/γH2AX foci, suggesting activation of ICL repair, as well as increased recruitment of 53BP1. Treatment with an ATR inhibitor resulted in loss of FANCD2/γH2AX foci, thereby linking this response to ATR activation.
If colibactin were indeed inducing ICLs in HeLa cells, one would expect cells deficient in ICL repair to be sensitized to the producing bacteria. To test this hypothesis, the researchers depleted FANCD2 using small interfering RNA (siRNA) and infected these cells with clb+ E. coli. Indeed, FANCD2-deficient HeLa cells formed fewer colonies in a clonogenic survival assay, consistent with the formation of DNA lesions that require an ICL repair mechanism to maintain cell viability. Analysis of genomic DNA by denaturing electrophoresis indicated that the DNA had accumulated ICLs. The addition of ClbS to the co-cultures rescued the DNA damage phenotype, indicating that ICL formation derived from colibactin.
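Clonogenic survival is usually reported as a surviving fraction, normalized to the plating efficiency of a matched untreated control, so that any effect of the knockdown itself on colony formation is factored out. The Python sketch below uses that standard definition; the function names and every colony count are hypothetical and are not data from the study.

```python
"""Clonogenic survival: plating efficiency (PE) and surviving fraction (SF).

PE = colonies / cells seeded          (uninfected control wells)
SF = (colonies / cells seeded) / PE   (infected wells, matched control)
All counts below are hypothetical, for illustration only.
"""


def plating_efficiency(colonies: int, seeded: int) -> float:
    """Fraction of seeded cells that form colonies without treatment."""
    return colonies / seeded


def surviving_fraction(colonies: int, seeded: int, pe_control: float) -> float:
    """Colony formation after treatment, normalized to the matched control PE."""
    return (colonies / seeded) / pe_control


if __name__ == "__main__":
    # Hypothetical control-siRNA cells: uninfected vs. infected.
    pe_ctrl = plating_efficiency(colonies=120, seeded=200)
    sf_ctrl = surviving_fraction(colonies=60, seeded=200, pe_control=pe_ctrl)
    # Hypothetical FANCD2-siRNA cells, normalized to their own uninfected PE.
    pe_kd = plating_efficiency(colonies=100, seeded=200)
    sf_kd = surviving_fraction(colonies=20, seeded=200, pe_control=pe_kd)
    print(f"SF control siRNA: {sf_ctrl:.2f}")
    print(f"SF FANCD2 siRNA:  {sf_kd:.2f}")
```

Normalizing each line to its own uninfected control is the design choice that matters here: a lower surviving fraction for the knockdown, as in this toy example, then reflects sensitization to the bacteria rather than a general growth defect.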
These studies provided “the first report of a direct induction of DNA ICLs in the mammalian genome by a bacterial infection”. Additionally, and germane to the research below, this study established that: 1. Colibactin cross-links exogenous DNA; 2. The addition of extracellular DNA or ClbS to HeLa cells infected with clb+ E. coli decreases cellular genotoxicity; 3. clb+ E. coli induce activation of ICL signaling and repair pathways, a response similar to that elicited by other known ICL-inducing agents, such as cisplatin; 4. Removal or inhibition of these ICL repair pathways sensitizes cells to clb+ E. coli; 5. The biosynthesis of the genotoxin underlying these observations depends upon all of the biosynthetic enzymes in the clb BGC (stated differently, the underlying genotoxin is colibactin itself, rather than a shunt metabolite).
Detection and characterization of covalent DNA adducts:
The report outlined above established that wild-type clb+ E. coli induce DNA cross-links and that this DNA damage is the dominant phenotype driving genotoxicity in eukaryotic cells. The mutagenesis experiments in those studies revealed that these DNA cross-links were formed by, and therefore contained, the mature colibactin that had resisted discovery since 2006. It stands to reason, then, that analysis of these DNA cross-links might reveal the structure of colibactin. This reasoning shifted the way clb metabolites were pursued, from large-scale fermentations of genetically modified strains to analysis of colibactin–DNA adducts. The resulting research led to key findings from multiple groups that, in aggregate, filled in the missing pieces of colibactin’s structure and mode of action.28,34
One strategy employed isotopic labeling of colibactin, trapping of colibactin and its isotopologs by DNA cross-linking, and high-resolution LC-MS/MS analysis of the resulting DNA–colibactin adducts.34,35 To achieve this, ICLs were generated by co-incubation of clb+ E. coli with exogenous DNA. Isotopologs of colibactin containing various numbers of carbon-13 and nitrogen-15 atoms were produced either by culturing auxotrophic strains deficient in the biosynthesis of key amino acids in medium enriched with the corresponding heavy amino acid (partial labeling), or by culturing E. coli BW25113 in M9 minimal medium lacking amino acids and supplemented with 13C-labeled D-glucose or 15N-labeled ammonium chloride (universal labeling). High-resolution LC-MS/MS analysis of the resulting isotopologs provided characteristic mass shifts that identified which signals contained clb metabolites and how many of the labeled amino acids were incorporated into the corresponding structure.
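The mass-shift logic can be made concrete. Relative to the light isotope, each 13C adds about 1.00335 Da and each 15N about 0.99703 Da, and the observed m/z shift is the neutral mass shift divided by the ion's charge. The Python sketch below uses standard isotope masses; the atom counts in the example are hypothetical and are not taken from the colibactin data. In practice, the number of labeled atoms retained from each precursor depends on the biosynthetic chemistry and has to be worked out residue by residue.

```python
"""Expected m/z shifts for 13C/15N isotopologs (monoisotopic masses, Da)."""

# Heavy-minus-light isotope mass differences.
DELTA_13C = 13.0033548 - 12.0        # ~1.0033548 Da per 13C atom
DELTA_15N = 15.0001089 - 14.0030740  # ~0.9970349 Da per 15N atom


def mass_shift(n_13c: int, n_15n: int) -> float:
    """Neutral mass shift for n_13c heavy carbons and n_15n heavy nitrogens."""
    return n_13c * DELTA_13C + n_15n * DELTA_15N


def mz_shift(n_13c: int, n_15n: int, charge: int = 1) -> float:
    """Shift in m/z for an ion carrying `charge` charges (e.g. [M+2H]2+)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mass_shift(n_13c, n_15n) / charge


if __name__ == "__main__":
    # Hypothetical isotopolog: six extra 13C and two extra 15N atoms.
    for z in (1, 2):
        print(f"z={z}: +{mz_shift(6, 2, charge=z):.4f}")
```

For a doubly protonated ion, such as the species used to detect the bis(adenine) adduct, the same labeling pattern appears at half the m/z spacing, which is one way singly and doubly charged signals can be told apart.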
Using this approach, two groups worked independently to characterize the DNA ICLs formed by clb+ E. coli. The Herzon and Crawford laboratories first reported the discovery of a colibactin–adenine adduct (Figure 3A). The structure of this adduct was proposed as 13 based on tandem MS analysis of the isotopologs of 13, as well as the related ions 14–16 (Figure 3B). Product 13 derives from opening of the cyclopropane by adenine, followed by aerobic oxidation of the ring-opened product. Prior studies of ClbS established that the products deriving from cyclopropane opening were susceptible to aerobic oxidation.26 Because the structure was characterized only by MS, the site of attachment to adenine could not be determined at that time. However, NMR characterization of the isolated adenine adduct (vide infra) later revealed that the compound exists as the alternative tautomeric structure 13a.28
Figure 3.

A. Overall workflow for MS studies leading to detection of the adenine adduct 13. The timing of isomerization remains unknown. B. Structure of the additional ions 14−16 detected.
Shortly thereafter, the Balskus laboratory isolated and characterized the same adduct from a large-scale incubation of the synthetic analogs 17 and 18 with DNA (Figure 4A), using LC-MS3 DNA adductomics.28 The synthetic analogs 17 and 18 were prepared according to the method of Herzon and Crawford.23 The isolated adduct was characterized by 2D NMR analysis, which corroborated the work shown in Figure 3A and established that alkylation occurs at N3 of adenine and that the isolated adduct exists as the tautomer 13a (rather than 13, Figure 3A). Because C4 of the lactam ring in the cyclopropane ring-opened product has been shown to be electrophilic,26 13 is likely the direct product, which could in principle undergo additional alkylations or isomerize to the stable tautomer 13a. The adducts 13a and 19 were formed as a 1:1 mixture of diastereomers resulting from non-stereoselective oxidation of the ring-opened product. Regardless of the precise order of events, the same adduct 13a was detected in gnotobiotic mice colonized with clb+ E. coli (Figure 4B), a finding that provided a critical molecular connection between colibactin-induced DNA damage and the genotoxicity of clb+ E. coli in both cellular and animal models. Specifically, the Balskus laboratory detected 13/13a in genomic DNA isolated from HeLa cells infected with clb+ E. coli, and in DNA isolated from the colonic epithelial cells of germ-free C57BL/6J mice after two weeks of exposure to clb+ E. coli. The in vivo detection of 13/13a suggests that these adducts could serve as biomarkers for colibactin-mediated DNA damage.
Figure 4.

A. Workflow for discovery of the colibactin–adenine adduct 13a using calf thymus DNA (ctDNA) and the synthetic colibactin analogs 17 and 18 by Balskus and co-workers. PLE, pig liver esterase. B. The workflow for detecting colibactin–adenine adducts from gnotobiotic mice infected with clb+ E. coli.
Following these reports, the two research groups independently discovered and reported the full colibactin bis(adenine) adducts using similar workflows and biosynthetic considerations. The Herzon and Crawford laboratories proposed the complete colibactin bis(adenine) adduct as 20 (Figure 5A), supported by extensive tandem MS structural characterization.34,36 The hydrate 21 was also detected. Although neither 20 nor 21 was isolated, owing to their low abundance and instability, extensive isotope labeling was applied to support the structural assignments (see Figure 5). Consecutive losses of two neutral adenine bases were detected in the tandem MS traces, confirming that the adducts contained two adenines covalently linked to the colibactin backbone. The carbon–carbon bond in the central diketone is labile to oxidative hydrolysis,20 resulting in the left-hand fragment 13/13a (Figures 3 and 4) and the novel right-hand fragment 23. The left-hand fragment 13/13a was more abundant than the right-hand fragment 23, likely because 13/13a can also form from shunt metabolites in the pathway and not only from decomposition of the bis(adenine) adduct 20. This explains why 13/13a was the first nucleoside adduct to be identified and characterized. The α-aminoketone 22 was also identified. The detection of 22 is significant because it helps explain why mutation of clbL abrogates the cross-linking phenotype.30 ClbL is proposed to facilitate the heterodimerization of colibactin and install the second cyclopropane-containing fragment required to form DNA ICLs (see Figure 7A).
Figure 5.

A. Structure of the colibactin–bis(adenine) adduct 20 and its hydrate 21, along with other colibactin–adenine adducts 22 and 23 discovered by the Herzon and Crawford laboratories. B. The α-aminoketone 24 corresponds to the structure of colibactin based on biosynthetic logic. This material was not detectable in bacterial cultures. Instead, the products of aerobic oxidation and hydrolysis, 25 and 26, were observed. C. Structure of precolibactin 1489 (27).
Figure 7.

A. Proposed heterodimerization mechanisms resulting in precolibactin-1491 (32) or the related intermediates 36 and 37. B. Final steps in the biosynthesis of precolibactin 1491 (32) that either generate colibactin (25) or precolibactin 1489 (27) in clbP mutants.34,38
Based on the structural analysis of 20 and 21, the Herzon and Crawford laboratories proposed the pseudodimeric α-aminoketone 24 as the structure of colibactin, a structure consistent with a biosynthetic pathway that invokes the complete BGC (Figure 5B). While this metabolite could not be detected, the diketone 25 and its corresponding hydrate 26, which derive from aerobic oxidation of the α-aminoketone residue in 24 followed by hydrolysis, were found in the bacterial extracts. The production of 25 increased 8.5-fold in cultures of a clbS resistance mutant, which facilitated its accumulation and characterization. Extensive isotopic labeling and tandem MS studies were used to confirm the structure. While the expected linear precolibactin could not be detected, its non-stereoselective cyclization product, precolibactin 1489 (27), was detected (Figure 5C). As with the bis(adenine) adducts 20 and 21, 25 and 26 were not formed in quantities sufficient for isolation.
To confirm the structural assignment, the Herzon and Crawford laboratories synthesized the diketone colibactin (25). Synthetic colibactin (25) was indistinguishable from natural material by LC-MS co-injection and tandem MS. In addition, synthetic colibactin (25) induced ICLs in a dose-dependent fashion. Tandem MS analysis of these ICLs revealed that they were indistinguishable from those formed in bacterial cultures, thereby providing strong evidence for the structure proposal.
Concurrently, in a publication detailing the in vitro activity of the amidase ClbL, the Balskus laboratory also detected the bis(adenine) adduct 20. By studying the substrate scope of ClbL, they proposed that this amidase prefers the α-aminoketone generated by ClbO and the β-ketothioester generated by ClbI (see the Balskus mechanism in Figure 7A). Based on this observation, they proposed a heterodimerization event that would yield a product capable of forming the bis(adenine) adduct 20. The structure of 20 was supported by methionine and cysteine labeling experiments.37,38 Key to the discovery of 20 by both groups was the identification of doubly charged ions resulting from two-fold protonation of the bis(adenine) adduct.
As noted above, the α-aminoketone 24 is the expected structure of colibactin based on biosynthetic logic and characterization of the DNA cross-links. However, this structure was not detectable in the bacterial extracts. Synthetic studies have established that this functional group readily undergoes oxidation to an α-ketoimine, which is itself susceptible to hydrolysis.20 Thus, an unresolved issue is the nature of this two-carbon linker at the time of the genotoxic event. On electrostatic grounds, one would expect the aminoketone, whose amine can be protonated, to have a higher affinity for the polyanionic DNA backbone than the neutral diketone; however, the rate of oxidation in vivo or ex vivo has not been established.
Biosynthesis of colibactin:
The clb gene cluster consists of 19 genes (clbA to clbS) that encode four nonribosomal peptide synthetase (NRPS) proteins (ClbE, H, J, N), four polyketide synthase (PKS) enzymes (ClbC, G, I, O), two hybrid NRPS–PKS enzymes (ClbB, K), and additional accessory, regulatory, and resistance proteins. With the exception of the PKS ClbO and the amidase ClbL, the proposed functions of these enzymes have been summarized in previous reviews.9 Figure 6 provides a simplified representation of the pathway, emphasizing the steps that precede formation of the pseudodimeric structure of colibactin. Below we discuss recently proposed roles for ClbO, ClbL, and ClbQ in the final steps of colibactin’s biosynthesis.
Figure 6.

Overview of colibactin biosynthesis preceding heterodimerization.34 For simplicity, the intermediates in the early steps of the pathway are not shown.
Two different mechanisms have been proposed to rationalize the formation of colibactin’s heterodimeric structure (Figure 7).34,38 The Crawford and Herzon laboratories proposed that advanced intermediates, such as ClbJ-bound 28, are hydrolyzed by the promiscuous thioesterase ClbQ19,21,39 to produce the carboxylic acid 29 (Figure 7A). Loss of glycine by ClbL-mediated amide bond cleavage would form 30. In biochemical studies, we have shown that ClbL can cleave amide bonds to release ammonia (free Asn → Asp); however, its hydrolytic activity appears to be promiscuous in vitro (unpublished). Nevertheless, formation of the resulting acyl-ClbL enzyme intermediate 30 could facilitate heterodimerization by addition of the ClbO-tethered α-amino-β-ketothioester 31. After off-loading of the heterodimer from ClbO by hydrolysis and decarboxylation, precolibactin 1491 (32) is thought to be produced as the fully mature precursor to colibactin.
The Balskus laboratory proposed that off-loading and decarboxylation of the ClbO-bound α-amino-β-ketothioester product 33 would form the aminomethyl ketone 34 (Figure 7A).38 The functional group at C37 (amine, imine, or ketone) was not explicitly specified. ClbL-mediated amidation between 34 and the ClbI-tethered intermediate 35 would then generate 36. In support of this proposal, the Balskus laboratory prepared a ClbL-overexpressing E. coli DH10B BACpksΔclbP strain and characterized a ClbL-dependent metabolite that contained a methylamino ketone. This finding prompted the researchers to show that biosynthetic intermediates containing thioesters and α-aminoketones are suitable substrates for ClbL. They also reconstituted the activity of ClbL in vitro; amide bond formation was observed after 16 h of incubation at 23 °C. This pathway leads to the same products but does not invoke ClbQ.
In the final steps of the biosynthesis, processing of precolibactin-1491 (32) by ClbP would precede four-fold cyclodehydration to generate 24 (Figure 7B). However, as discussed above, neither the α-aminoketone 24 nor precolibactin-1491 (32) could be detected. Instead, precolibactin-1489 (27), the product of oxidation of precolibactin-1491 (32) to an α-iminoketone, was identified by tandem MS and isotope labeling experiments. Based on the structures of related precolibactins (vide supra), it was proposed that precolibactin-1491 (32) exists in a macrocyclic form. These data suggested that the α-amino group of 24 readily oxidizes and that the oxidized product is susceptible to hydrolysis, generating colibactin (25). Using multiplex automated genome engineering (MAGE), the Herzon and Crawford laboratories showed that the biosynthesis of precolibactin-1489 (27) requires every previously unaccounted-for biosynthetic enzyme in the clb BGC.34
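The proposed functional-group changes can be checked against the nominal masses in the names precolibactin-1491 and precolibactin-1489. The Python sketch below tallies only the elemental differences for each step (oxidation of the α-aminoketone to the α-iminoketone, then hydrolysis of the imine to the ketone), using standard monoisotopic masses. It is a bookkeeping aid, not a model of the pathway: it deliberately ignores the ClbP deacylation and the cyclodehydrations that also separate 32 from 25.

```python
"""Monoisotopic mass changes for the proposed alpha-aminoketone chemistry."""

MONO = {"H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def delta(gain: dict, loss: dict) -> float:
    """Mass change (Da) when the atoms in `gain` are added and `loss` removed."""
    added = sum(MONO[el] * n for el, n in gain.items())
    removed = sum(MONO[el] * n for el, n in loss.items())
    return added - removed


# alpha-aminoketone -> alpha-iminoketone: oxidation, loss of H2.
OXIDATION = delta(gain={}, loss={"H": 2})
# alpha-iminoketone -> 1,2-diketone: hydrolysis, +H2O and -NH3.
HYDROLYSIS = delta(gain={"H": 2, "O": 1}, loss={"N": 1, "H": 3})

if __name__ == "__main__":
    print(f"oxidation  {OXIDATION:+.4f} Da")   # -2.0157, cf. 1491 -> 1489
    print(f"hydrolysis {HYDROLYSIS:+.4f} Da")  # +0.9840
    print(f"net        {OXIDATION + HYDROLYSIS:+.4f} Da")  # -1.0316
```

The nominal loss of 2 Da on oxidation matches the 1491 → 1489 difference, and the net change for the two steps (−NH3 + O) is a small negative shift that high-resolution MS can distinguish from other near-isobaric modifications.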
Instability of the thiazole spacing unit within colibactin:
As discussed above, the expected biosynthetic product from the clb pathway contains an α-aminoketone between two thiazole rings (see 24, Figure 5), but to the best of our knowledge this residue has never been isolated or observed, suggesting that it is unstable under the conditions of fermentation and analysis. Attempts to synthesize the α-aminoketone support this notion.20 Specifically, oxidation of the carbamate-protected 1,2-aminoalcohol 38 with the Dess–Martin periodinane formed the α-aminoketone 39 (Figure 8A). Upon exposure to air, this product was converted to the N-acyl hemiaminal 40.20 Alternatively, the α-ketoimine 41 could be prepared by two-fold oxidation of the corresponding 1,2-aminoalcohol (not shown; Figure 8B). As expected, the imine was unstable toward hydrolysis and was readily converted to the 1,2-diketone 44 upon aqueous work-up or exposure to silica gel. Thus, the inability to detect the α-aminoketone 24 may arise from the instability of this functional group toward aerobic oxidation.
Figure 8.

A. The α-aminoketone 39 is susceptible to spontaneous aerobic oxidation to the hemiaminal 40. B. The α-ketoimine 41 and the 1,2-diketone 44 undergo nucleophilic cleavage in solutions of basic methanol.20 C. Proposed mechanism for nucleophilic cleavage of the 1,2-diketone 44; an analogous mechanism is expected for the α-ketoimine 41.
In addition, reactivity studies of the model α-ketoimine 41 and 1,2-diketone 44 established that the C36–C37 bond of the α-ketoimine or diketone is susceptible to cleavage under mildly nucleophilic conditions (Figure 8B). Thus, dissolution of 41 or 44 in methanol containing sodium bicarbonate produced the cleavage products 42 and 43, or 45 and 46, respectively, in 40–45% yield. A potential mechanism for this pathway is shown in Figure 8C.
This instability provides a simple explanation for the difficulties encountered in isolating fully functionalized (pre)colibactins and accounts for the low titers obtained when isolating colibactins in which the α-ketoimine is protected within a macrocycle. Additionally, this aerobic instability and reactivity may account for the lack of genotoxicity observed with clb+ E. coli supernatants or when clb+ E. coli are separated from HeLa cells by a 2 μm membrane. Given the prior observation that cell-to-cell contact is required for toxicity,2 the prevailing hypothesis is that a transport mechanism may traffic colibactin to the host cell nucleus. However, given the instability of the α-aminoketone toward oxidation, and of the resulting oxidation products toward nucleophilic cleavage, an alternative and simpler explanation might be that these reactive functional groups impart a short half-life to colibactin. Rapid delivery of the compound by the producing organism would then be required, which in turn requires the cells to be in close proximity to one another.
Macrocyclic precolibactins induce DNA double-strand breaks:
Isolation of a clb metabolite that replicates the DNA damage induced by clb+ E. coli has been the holy grail in this area since 2006, when clb-associated DNA damage was first observed.2 In 2019, Qian and Zhang reported the discovery of a number of novel clb metabolites, some of which were capable of inducing DNA double-strand breaks (DSBs) in a copper(II)-dependent manner.18
By carrying out a 2,000 L fermentation of a ΔclbPΔclbQΔclbS strain, the researchers were able to procure 50 μg of a novel clb metabolite termed precolibactin 969 (9, Figures 1 and 9). NMR and high-resolution mass spectrometry (HRMS) analysis indicated that this new metabolite contained the same macrocycle as precolibactin 886 (8, Figure 1) but contained an additional C3HNO2 fragment, which was recognized as deriving from an aminomalonate residue. The same fragment was observed in a second isolated metabolite, precolibactin 795a (7, Figure 1), which corresponds to the addition of C3HNO2 to precolibactin B (4, Figure 1). Gene inactivation and isotope-labeling experiments confirmed the role of ClbO in formation of this unique 5-hydroxyoxazole residue. Macrocyclic precolibactins were found to be unstable in the presence of trace metals, an effect that could be mitigated by adding metal chelators (EDTA or Chelex-100) to the fermentation medium. Treatment of precolibactin 969 (9) with clbP-expressing E. coli gave the expected deacylated product “colibactin 645” (47), which was observed by HRMS and tandem MS analysis of the bacterial cultures. Although it was never isolated from bacterial culture, colibactin 645 (47) could be detected by MS in wild-type strains.
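The fragment assignments above can be sanity-checked with simple arithmetic. The Python sketch below assumes, for illustration, that the numerals in the trivial names (886, 969, 645) are nominal masses, and that ClbP releases N-myristoyl-D-asparagine (C18H34N2O4) by amide hydrolysis, for a net loss of C18H32N2O3 from the precolibactin. The formula parser is a hypothetical helper that handles only unbracketed formulas.

```python
"""Nominal and monoisotopic masses for simple (unbracketed) formulas."""
import re

MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16}


def parse_formula(formula: str) -> dict:
    """Count atoms in a formula such as 'C3HNO2' (no parentheses supported)."""
    counts: dict = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def formula_mass(formula: str, table: dict) -> float:
    """Sum atomic masses from `table` over the parsed formula."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())


# Aminomalonate-derived fragment (precolibactin 886 -> 969).
FRAGMENT = "C3HNO2"
# Net loss on ClbP cleavage: N-myristoyl-D-Asn (C18H34N2O4) minus H2O.
ACYL_LOSS = "C18H32N2O3"

if __name__ == "__main__":
    print(formula_mass(FRAGMENT, NOMINAL), 969 - 886)    # 83 83
    print(formula_mass(ACYL_LOSS, NOMINAL), 969 - 645)   # 324 324
    print(f"{formula_mass(FRAGMENT, MONO):.4f}")          # 83.0007
```

Both nominal differences line up with the proposed elemental changes; the monoisotopic value is what an HRMS assignment of the C3HNO2 fragment would actually be matched against.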
Figure 9.

Maturation of precolibactin 969 (9) to colibactin 645 (47) when exposed to clbP+ E. coli and then induction of DNA DSBs in eukaryotic cells in the presence of CuCl2.
The researchers next investigated the DNA-damaging activity of colibactin 645 (47), using well-established assays40,41 to probe for the formation of DNA SSBs and DSBs in cell-free and tissue culture experiments. DNA DSBs were observed when pBR322 plasmid DNA was treated with Cu(II) and either precolibactin 969 (9) or colibactin 645 (47), suggesting DSB formation by a copper-dependent pathway. Consistent with this, DSB induction was abrogated in the presence of the Cu(I) chelator neocuproine. A Freifelder–Trumbo analysis indicated that the observed DSBs arise from a single binding event rather than from an accumulation of SSBs. DSBs were also formed when β-mercaptoethanol was employed, which was taken to indicate that the macrocyclic clb metabolites form a complex with copper that is capable of inducing DSBs. A complex between precolibactin 886 (8) and copper was detected by HRMS, and an association constant (Ka) of 4,120 M−1 for Cu(II) binding was reported. Based on these data, the authors suggested that the macrocycle chelates copper and that this chelation is essential for genotoxicity. However, precolibactin 886 (8) was less active than precolibactin 969 (9) in cell-free experiments, suggesting that the 5-hydroxyoxazole fragment also plays a role in DNA damage.
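The short Python sketch below makes the two quantitative arguments in this paragraph concrete. It rests on three assumptions, none of which are values from the study. First, it uses the Freifelder–Trumbo relation n2 = n1^2(2h + 1)/(4L) for DSBs that arise by chance from SSBs on opposite strands within h base pairs. Second, it takes an illustrative window of h = 16 bp. Third, it assumes simple 1:1 binding, with free Cu(II) approximately equal to total Cu(II). The function names are hypothetical, and the example inputs (1 SSB and 0.1 DSB per plasmid) are invented for illustration.

```python
"""Hedged sketches for the Freifelder-Trumbo and Cu(II)-binding arguments.

Assumptions (not from the original study): DSBs from random nicks follow
n2 = n1**2 * (2h + 1) / (4L); h = 16 bp is illustrative; binding is 1:1 with
free Cu(II) approximately equal to total Cu(II).
"""

PBR322_BP = 4361  # length of plasmid pBR322 in base pairs


def expected_random_dsbs(n1: float, length_bp: int = PBR322_BP, h: int = 16) -> float:
    """Expected DSBs per molecule if DSBs arise only from randomly placed SSBs
    on opposite strands lying within h bp of each other."""
    return n1 ** 2 * (2 * h + 1) / (4 * length_bp)


def dsb_excess(n1: float, n2_observed: float, length_bp: int = PBR322_BP, h: int = 16) -> float:
    """Observed DSBs divided by those expected from random SSB coincidence.
    Values much greater than 1 point to direct (single-event) DSB formation."""
    return n2_observed / expected_random_dsbs(n1, length_bp, h)


def fraction_bound(ka_per_molar: float, free_ligand_molar: float) -> float:
    """Fraction of metabolite bound for 1:1 binding: Ka[L] / (1 + Ka[L])."""
    x = ka_per_molar * free_ligand_molar
    return x / (1 + x)


if __name__ == "__main__":
    KA_CU = 4120.0  # M^-1, value reported for precolibactin 886 binding Cu(II)
    print(f"half-saturation [Cu(II)] = {1 / KA_CU * 1e6:.0f} uM")
    for cu_um in (10, 100, 1000):
        print(cu_um, "uM Cu(II): fraction bound", round(fraction_bound(KA_CU, cu_um * 1e-6), 3))
    # Hypothetical inputs: 1 SSB and 0.1 DSB per plasmid on average.
    print("expected random DSBs:", round(expected_random_dsbs(1.0), 5))
    print("observed/expected:", round(dsb_excess(1.0, 0.1), 1))
```

If DSBs came only from coincident nicks, n2 would scale with the square of n1. A ratio far above 1, or a linear dose dependence of n2, is what points to direct DSB formation. With the reported Ka, half-saturation would require roughly 240 μM free Cu(II) under the 1:1 binding assumption.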
DNA damage induced by macrocyclic clb metabolites was then investigated in tissue culture by monitoring the phosphorylation of histone H2AX (γH2AX) and its colocalization with p53 binding protein 1 (53BP1).40 Cells treated with colibactin 645 (47) accumulated 53BP1 and γH2AX foci, and these foci colocalized. γH2AX formation and 53BP1 translocation were not observed in cells treated with precolibactin 969 (9), suggesting that ClbP deacylation is required for toxicity. Copper- and dose-dependent DSB formation by colibactin 645 (47) was observed in a neutral comet unwinding assay, and sequestering the copper salts abrogated this DNA damage. The researchers also observed a decrease in DNA damage when clb+ E. coli were treated with copper-sequestering agents, suggesting that the metabolite released by the bacteria also requires copper to induce DNA damage.
In summary, the authors proposed a model for the genotoxicity of colibactin 645 (47) that involves complexation of copper(II), reduction to copper(I), and generation of an active colibactin–copper chelate that forms reactive oxygen species, which induce DNA DSBs. In their studies, the macrocycle proved to be required for DNA DSB formation, and the 5-hydroxyoxazole greatly increased reactivity toward DNA.
While these results are interesting, they fail to account for key observations in the field. First, precolibactin 969 (9) was isolated from a mutant lacking functional ClbQ, but this enzyme has been shown in two independent studies to be required for genotoxic effects.2,29 Second, this mechanism of action does not address the studies described above, which established that clb+ E. coli induce ICLs in linearized plasmid DNA and in cellular genomic DNA.29 Third, this mode of action does not account for the cyclopropane hydrolase activity of the self-resistance enzyme ClbS.25,26 Fourth, the functional consequences of ClbP deacylation are ambiguous within this model. The authors speculate that removal of the terminal N-myristoyl-D-Asn residue may promote cellular uptake and DNA binding of colibactin 645 (47), which would explain why ClbP is required for genotoxicity. However, this explanation does not account for the toxicity of precolibactin 886 (8), which retains the N-myristoyl-D-Asn residue. Additional experiments to probe for differences in cellular uptake (such as a Caco-2 permeability assay) or DNA binding (such as thermal melting (Tm) studies), and experiments to probe for self-resistance to colibactin 645 (47) (such as a ClbS rescue assay), would be welcome. Precolibactin 886 (8) is likely an off-pathway intermediate; biomimetic synthetic access to this diastereomeric macrocyclization product required cyclization on the HPLC column.20 Perhaps precolibactin 969 (9) is similarly a minor product formed when the clb pathway is diverted by mutation of clbP.
clb+ E. coli induce a mutational signature found in colorectal cancer patients:
A causal link between the clb BGC and human colorectal cancer was recently established.8 Using microinjection, researchers inoculated the lumen of human intestinal organoids with clb+ E. coli. As anticipated, these organoids accumulated DNA ICLs and DSBs. To model prolonged exposure to colibactin within the human colon, clb+ E. coli were repeatedly microinjected into single-cell-derived organoids over five months. Single cells taken from these exposed organoids were used to grow subclonal organoids, which were then analyzed by whole-genome sequencing. This revealed a unique single-base substitution pattern induced by clb+ E. coli relative to controls: T > N substitutions within ATA, ATT, or TTT sequences (middle position mutated). A unique clb-dependent indel signature, characterized by single T deletions at T homopolymers, was also found. Together, these mutations form a clb fingerprint that identifies cells that have been exposed to colibactin. These mutations further indicated that colibactin alkylated adenine and that these adducts were then repaired by transcription-coupled nucleotide excision repair. These data support the conclusion that DNA alkylation and cross-linking by colibactin (25), via an adenine–colibactin–adenine ICL, is the dominant form of DNA damage inflicted by clb+ E. coli. The clb mutational signature was also enriched in colorectal cancer metastases. Overall, these data identify a colibactin-induced mutational signature and correlate this signature with tumor formation in the human colon, providing the first clinical link between colibactin genotoxicity and human CRC development.
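The strand convention behind the substitution motif can be illustrated with a hypothetical Python helper. It flags substitutions that match T > N at the middle base of ATA, ATT, or TTT. The helper assumes that each call is given as a reference trinucleotide plus an alternate base. It also assumes that purine-centred calls (middle base A) are reverse-complemented onto the T strand, as is conventional for substitution signatures. It is not the analysis pipeline used in the study.

```python
"""Hypothetical helper for the clb single-base substitution motif.

Flags T>N changes at the middle base of ATA, ATT or TTT. Calls reported on
the opposite strand (middle base A) are reverse-complemented first, following
the usual pyrimidine-centred convention for substitution signatures.
"""

COMPLEMENT = str.maketrans("ACGT", "TGCA")
CLB_CONTEXTS = {"ATA", "ATT", "TTT"}
BASES = {"A", "C", "G", "T"}


def revcomp(seq: str) -> str:
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def matches_clb_sbs(ref_trinuc: str, alt_base: str) -> bool:
    """True if the substitution fits the clb motif (any alternate base)."""
    ref_trinuc, alt_base = ref_trinuc.upper(), alt_base.upper()
    if len(ref_trinuc) != 3 or alt_base not in BASES or alt_base == ref_trinuc[1]:
        return False
    if ref_trinuc[1] == "A":
        # e.g. A>N in TAT on one strand is T>N in ATA on the other strand
        ref_trinuc = revcomp(ref_trinuc)
    return ref_trinuc in CLB_CONTEXTS


if __name__ == "__main__":
    calls = [("ATA", "C"), ("TAT", "G"), ("AAA", "T"), ("GTA", "C"), ("ACA", "T")]
    for ref, alt in calls:
        print(ref, ">", alt, matches_clb_sbs(ref, alt))
```

The indel component (single T deletions at T homopolymers) would need a separate check on the deleted base and the run length. That check is omitted here.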
Conclusions:
The past three years of research have provided key insights into colibactin’s activity, structure, and biosynthesis. In-depth studies of clb+ E. coli genotoxicity show that DNA alkylation and cross-linking account for the dominant cellular phenotypes mediated by colibactin, and that induction of these ICLs accounts for the previously observed DNA DSBs. The discovery of colibactin-induced ICLs inspired new approaches to elucidating the structure of colibactin that culminated in detection of the mature clb metabolite. The structure of colibactin was confirmed by chemical synthesis, and the DNA cross-links induced by synthetic colibactin were found to be identical to those induced by clb+ E. coli. Finally, the discovery of a clb mutational signature linked to colorectal cancer in humans provided direct evidence that DNA alkylation by colibactin contributes to tumorigenesis in human patients.
However, certain details remain unresolved. The finding that colibactin 645 (47) directly induces DNA DSBs requires further investigation. One litmus test for determining which metabolites contribute to clb genotoxicity may be a ClbS-mediated rescue assay. ClbS has been observed to abrogate clb+ E. coli genotoxicity when added to co-incubations of clb+ E. coli and HeLa cells. Additionally, ClbS has been observed to degrade compounds containing the electrophilic warhead found in colibactin (25), but not pyridone-containing metabolites such as 4. Thus, studying the activity of ClbS toward precolibactin 886 (8), precolibactin 969 (9), and colibactin 645 (47) may clarify the relevance of DSB-inducing clb metabolites compared with metabolites that induce ICLs that can evolve into DSBs. Moreover, now that a clb-associated mutational signature is known, candidate clb metabolites can be tested for their ability to induce this signature, thereby validating the metabolite(s) responsible for damaging host DNA. Because the clb pathway produces multiple metabolites, investigating its suspected polypharmacology will refine our understanding of clb+ E. coli pathogenicity. In addition, the ring-opened cyclopropane warhead forms an organic peroxide and may present an additional electrophilic site at C4 of the lactam ring, which could participate in the genotoxic process. Finally, as discussed above, the nature of the two-carbon spacer between the thiazoles of colibactin at the time of the genotoxic event, and its potential role as an additional electrophilic residue, have not been established.
What is clear is that significant advances in understanding colibactin’s impact on human health have been recorded over the last three years. Further interdisciplinary investigations will complete our understanding of this system. Additionally, the approach outlined here suggests strategies to characterize the structure and function of metabolites that are beyond the reaches of the canonical “grind and find” paradigm.
Acknowledgements:
Financial support from the National Institutes of Health (R01CA215553, to S.B.H. and J.M.C., and the Chemistry Biology Interface Training Program, T32GM067543, to K.M.W.), the Swiss National Science Foundation (P2EZP2_187928 to A.T.), the National Research Foundation of Korea (2019R1A6A3A12033304, to C.S.K.), and Yale University is gratefully acknowledged.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
References:
1. Putze J, Hennequin C, Nougayrède JP, et al. Genetic structure and distribution of the colibactin genomic island among members of the family Enterobacteriaceae. Infect Immun. 2009;77:4696–4703. doi: 10.1128/IAI.00522-09
2. Nougayrède J-P, Homburg S, Taieb F, et al. Escherichia coli induces DNA double-strand breaks in eukaryotic cells. Science. 2006;313:848–851. doi: 10.1126/science.1127059
3. Arthur JC, Perez-Chanona E, Mühlbauer M, et al. Intestinal inflammation targets cancer-inducing activity of the microbiota. Science. 2012;338(6103):120–123. doi: 10.1126/science.1224820
4. Tomkovich S, Yang Y, Winglee K, et al. Locoregional effects of microbiota in a preclinical model of colon carcinogenesis. Cancer Res. 2017;77:2620–2632. doi: 10.1158/0008-5472.CAN-16-3472
5. Cougnoux A, Dalmasso G, Martinez R, et al. Bacterial genotoxin colibactin promotes colon tumour growth by inducing a senescence-associated secretory phenotype. Gut. 2014;63(12):1932–1942. doi: 10.1136/gutjnl-2013-305257
6. Bonnet M, Buc E, Sauvanet P, et al. Colonization of the human gut by E. coli and colorectal cancer risk. Clin Cancer Res. 2014;20:859–867. doi: 10.1158/1078-0432.CCR-13-1343
7. Buc E, Dubois D, Sauvanet P, et al. High prevalence of mucosa-associated E. coli producing cyclomodulin and genotoxin in colon cancer. PLoS One. 2013;8:e56964. doi: 10.1371/journal.pone.0056964
8. Pleguezuelos-Manzano C, Puschhof J, Huber AR, et al. Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli. Nature. 2020;580:269–273. doi: 10.1038/s41586-020-2080-8
9. Faïs T, Delmas J, Barnich N, Bonnet R, Dalmasso G. Colibactin: More than a new bacterial toxin. Toxins (Basel). 2018;10:151–167. doi: 10.3390/toxins10040151
10. Healy AR, Herzon SB. Molecular basis of gut microbiome-associated colorectal cancer: a synthetic perspective. J Am Chem Soc. 2017;139:14817–14824. doi: 10.1021/jacs.7b07807
11. Vizcaino MI, Engel P, Trautman E, Crawford JM. Comparative metabolomics and structural characterizations illuminate colibactin pathway-dependent small molecules. J Am Chem Soc. 2014;136:9244–9247. doi: 10.1021/ja503450q
12. Vizcaino MI, Crawford JM. The colibactin warhead crosslinks DNA. Nat Chem. 2015;7:411–417. doi: 10.1038/nchem.2221
13. Bian X, Plaza A, Zhang Y, Müller R. Two more pieces of the colibactin genotoxin puzzle from Escherichia coli show incorporation of an unusual 1-aminocyclopropanecarboxylic acid moiety. Chem Sci. 2015;6:3154–3160. doi: 10.1039/c5sc00101c
14. Brotherton CA, Wilson M, Byrd G, Balskus EP. Isolation of a metabolite from the pks island provides insights into colibactin biosynthesis and activity. Org Lett. 2015;17:1545–1548. doi: 10.1021/acs.orglett.5b00432
15. Li ZR, Li Y, Lai JYH, et al. Critical intermediates reveal new biosynthetic events in the enigmatic colibactin pathway. ChemBioChem. 2015;16:1715–1719. doi: 10.1002/cbic.201500239
16. Healy AR, Vizcaino MI, Crawford JM, Herzon SB. Convergent and modular synthesis of candidate precolibactins. Structural revision of precolibactin A. J Am Chem Soc. 2016;138:5426–5432. doi: 10.1021/jacs.6b02276
17. Zha L, Wilson MR, Brotherton CA, Balskus EP. Characterization of polyketide synthase machinery from the pks island facilitates isolation of a candidate precolibactin. ACS Chem Biol. 2016;11:1287–1295. doi: 10.1021/acschembio.6b00014
18. Li Z-R, Li J, Cai W, et al. Macrocyclic colibactin induces DNA double-strand breaks via copper-mediated oxidative cleavage. Nat Chem. 2019;11:880–889. doi: 10.1038/s41557-019-0317-7
19. Li Z-R, Li J, Gu J-P, et al. Divergent biosynthesis yields a cytotoxic aminomalonate-containing precolibactin. Nat Chem Biol. 2016;12:773–775. doi: 10.1038/nchembio.2157
20. Healy AR, Wernke KM, Kim CS, Lees NR, Crawford JM, Herzon SB. Synthesis and reactivity of precolibactin 886. Nat Chem. 2019;11:890–898. doi: 10.1038/s41557-019-0338-2
21. Trautman EP, Healy AR, Shine EE, Herzon SB, Crawford JM. Domain-targeted metabolomics delineates the heterocycle assembly steps of colibactin biosynthesis. J Am Chem Soc. 2017;139:4195–4201. doi: 10.1021/jacs.7b00659
22. Boger DL, Garbaccio RM. Shape-dependent catalysis: Insights into the source of catalysis for the CC-1065 and duocarmycin DNA alkylation reaction. Acc Chem Res. 1999;32:1043–1052. doi: 10.1021/ar9800946
23. Healy AR, Nikolayevskiy H, Patel JR, Crawford JM, Herzon SB. A mechanistic model for colibactin-induced genotoxicity. J Am Chem Soc. 2016;138:15563–15570. doi: 10.1021/jacs.6b10354
24. Brotherton CA, Balskus EP. A prodrug resistance mechanism is involved in colibactin biosynthesis and cytotoxicity. J Am Chem Soc. 2013;135:3359–3362. doi: 10.1021/ja312154m
25. Bossuet-Greif N, Dubois D, Petit C, et al. Escherichia coli ClbS is a colibactin resistance protein. Mol Microbiol. 2016;99:897–908. doi: 10.1111/mmi.13272
26. Tripathi P, Shine EE, Healy AR, et al. ClbS is a cyclopropane hydrolase that confers colibactin resistance. J Am Chem Soc. 2017;139:17719–17722. doi: 10.1021/jacs.7b09971
27. Xue M, Shine E, Wang W, Crawford JM, Herzon SB. Characterization of natural colibactin-nucleobase adducts by tandem mass spectrometry and isotopic labeling. Support for DNA alkylation by cyclopropane ring opening. Biochemistry. 2018;57:6391–6394. doi: 10.1021/acs.biochem.8b01023
28. Wilson MR, Jiang Y, Villalta PW, et al. The human gut bacterial genotoxin colibactin alkylates DNA. Science. 2019;363:eaar7785. doi: 10.1126/science.aar7785
29. Bossuet-Greif N, Vignard J, Taieb F, et al. The colibactin genotoxin generates DNA interstrand cross-links in infected cells. mBio. 2018;9:e02393. doi: 10.1128/mBio.02393-17
30. Shine EE, Xue M, Patel JR, et al. Model colibactins exhibit human cell genotoxicity in the absence of host bacteria. ACS Chem Biol. 2018;13:3286–3293. doi: 10.1021/acschembio.8b00714
31. Clauson C, Scharer OD, Niedernhofer L. Advances in understanding the complex mechanisms of DNA interstrand cross-link repair. Cold Spring Harb Perspect Biol. 2013;5:a012732. doi: 10.1101/cshperspect.a012732
32. Charrier J-D, Durrant SJ, Golec JMC, et al. Discovery of potent and selective inhibitors of ataxia telangiectasia mutated and Rad3 related (ATR) protein kinase as potential anticancer agents. J Med Chem. 2011;54:2320–2330. doi: 10.1021/jm101488z
33. Reaper PM, Griffiths MR, Long JM, et al. Selective killing of ATM- or p53-deficient cancer cells through inhibition of ATR. Nat Chem Biol. 2011;7:428–430. doi: 10.1038/nchembio.573
34. Xue M, Kim CS, Healy AR, et al. Structure elucidation of colibactin and its DNA cross-links. Science. 2019;365:eaax2685. doi: 10.1126/science.aax2685
35. Xue M, Shine E, Wang W, Crawford JM, Herzon SB. Characterization of natural colibactin-nucleobase adducts by tandem mass spectrometry and isotopic labeling. Support for DNA alkylation by cyclopropane ring opening. Biochemistry. 2018;57:6391–6394. doi: 10.1021/acs.biochem.8b01023
36. Xue M, Kim CS, Healy AR, et al. Structure elucidation of colibactin. bioRxiv. 2019;March 12. doi: 10.1101/574053
37. Jiang Y, Stornetta A, Villalta PW, et al. Reactivity of an unusual amidase may explain colibactin’s DNA cross-linking activity. bioRxiv. 2019;March 4. doi: 10.1021/jacs.9b02453
38. Jiang Y, Stornetta A, Villalta PW, et al. Reactivity of an unusual amidase may explain colibactin’s DNA cross-linking activity. J Am Chem Soc. 2019;141:11489–11496. doi: 10.1021/jacs.9b02453
39. Guntaka NS, Healy AR, Crawford JM, Herzon SB, Bruner SD. Structure and functional analysis of ClbQ, an unusual intermediate-releasing thioesterase from the colibactin biosynthetic pathway. ACS Chem Biol. 2017;12:2598–2608. doi: 10.1021/acschembio.7b00479
40. Colis LC, Woo CM, Hegan DC, Li Z, Glazer PM, Herzon SB. The cytotoxicity of (−)-lomaiviticin A arises from induction of double-strand breaks in DNA. Nat Chem. 2014;6:504–510. doi: 10.1038/nchem.1944
41. Woo CM, Li Z, Paulson EK, Herzon SB. Structural basis for DNA cleavage by the potent antiproliferative agent (–)-lomaiviticin A. Proc Natl Acad Sci U S A. 2016;113:2851–2856. doi: 10.1073/pnas.1519846113
