The intricate relationship of G-Quadruplexes and bacterial pathogenicity islands

Bo Lyu; Qisheng Song

doi:10.7554/eLife.91985

eLife. 2024 Feb 23;12:RP91985.


Editor: Bavesh D Kana

PMCID: PMC10942614 PMID: 38391174

Abstract

The dynamic interplay between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) is a promising area of research with implications for understanding the molecular mechanisms underlying pathogenicity. This study analyzed a large-scale dataset from 89 reported pathogenic bacterial strains to investigate potential interactions between G4 structures and PAIs. G4 structures exhibited an uneven, non-random distribution within PAIs and were consistently conserved within the same pathogenic strains. Additionally, the number and frequency of G4 structures were positively correlated with GC content across different genomic features, including the genome, promoters, genes, tRNA, and rRNA regions, indicating a potential relationship between G4 structures and GC-rich regions of the genome. The observed differences in GC content between PAIs and the core genome further highlight the distinct nature of PAIs and point to underlying factors such as DNA topology. High-confidence G4 structures were identified within regulatory regions of Escherichia coli; these may modulate the efficiency or specificity of DNA integration events within PAIs. Collectively, these findings lay the groundwork for future research into the molecular mechanisms and functional implications of G4-PAI interactions, advancing our understanding of bacterial pathogenicity and the role of G4 structures in pathogenic diseases.

Research organism: E. coli

Introduction

The discovery of the DNA double helix by Watson and Crick in 1953 revolutionized our understanding of genetics and laid the foundation for modern molecular biology (Watson and Crick, 1953). Nonetheless, DNA continues to surprise us even today. One such feature is the guanine (G)-quadruplex (G4), a non-canonical structure that departs from the conventional double helix (Rhodes and Lipps, 2015; Spiegel et al., 2020). The basic unit of a G4 is the G-tetrad, a planar arrangement of four guanine bases held together by Hoogsteen hydrogen bonds around a central cation; two or more tetrads stack on top of one another to form the quadruplex. The stacked tetrads are connected by loop regions, which can vary in length and sequence, adding further structural complexity (Figure 1). Because nucleic acids are directional, the four strands may all run in the same 5' to 3' direction (‘parallel’) or in different directions (‘antiparallel’). G4 regions can be very stable in vitro, particularly in the presence of K⁺ (Stegle et al., 2009). G4 structures are often found in genomic regions with crucial regulatory functions, such as telomeres, promoters, and enhancers (Rhodes and Lipps, 2015; Huppert, 2010). These structures play a role in various biological processes, including gene expression, DNA replication, and telomere maintenance (Rhodes and Lipps, 2015; Zybailov et al., 2013). Further research into G4 structures is likely to uncover new insights into their functions and to facilitate the development of innovative technologies.

Figure 1. — (A) Schematic representation of a guanine tetrad stabilized by Hoogsteen base pairing and a positively charged central ion, illustrating the key elements of G4 structures. (B) Structural heterogeneity of G4 structures. G4 structures exhibit polymorphism and can be categorized into different families, such as parallel or antiparallel, based on the orientation of the DNA strands. They can fold either intramolecularly or intermolecularly, leading to diverse structural configurations. (C) General sequence formula for G4, highlighting the repeated occurrence of guanine-rich sequences that form G4 structures. (D) Regulatory roles of G4 in transcription. G4 can regulate transcription by blocking RNA polymerase from binding to promoter sequences or by aiding single-stranded DNA (ssDNA) formation, thereby enhancing transcription. (E) General structure of pathogenicity islands (PAI). PAIs are characteristic regions of DNA found within the genomes of pathogenic bacteria, distinguishing them from nonpathogenic strains of the same or related species. Repeat sequences are DNA segments duplicated within the PAI and can serve as recognition sites for various enzymes involved in the integration and excision of the PAI from the bacterial chromosome. tRNA genes act as anchor points for the insertion of foreign DNA acquired through horizontal gene transfer. Virulence genes encode proteins or factors that play crucial roles in the virulence and pathogenicity of the bacterium, contributing to adhesion, invasion, immune evasion, toxin production, or other pathogenic mechanisms. Insertion elements include transposons, bacteriophages, or plasmids, enabling the PAI to be transferred between bacterial cells and potentially disseminated to different strains or species.
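Panel C summarizes the G4 sequence formula as repeated guanine runs separated by loops. As an illustration only, the Python sketch below scans a sequence for one commonly used formulation of that formula: four runs of at least three G separated by loops of 1 to 7 nt. The loop limits and the function name `find_g4_motifs` are assumptions of this sketch, and this pattern-matching approach is distinct from the G4Hunter scoring used in this study.

```python
import re

# Assumed canonical formulation: four runs of >= 3 G separated by 1-7 nt loops.
# A G4 on the opposite strand appears on this strand as the C-rich mirror motif.
G_MOTIF = re.compile(r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}")
C_MOTIF = re.compile(r"C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}[ACGTN]{1,7}C{3,}")


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) hits; 0-based, end-exclusive.

    finditer reports non-overlapping matches, so overlapping candidate
    motifs on the same strand collapse into the leftmost match.
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G_MOTIF.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in C_MOTIF.finditer(seq)]
    return sorted(hits)


print(find_g4_motifs("TTGGGAGGGAGGGAGGGTT"))  # [(2, 17, '+')]
```

A regular expression only checks sequence composition; it says nothing about folding stability, which is one reason score-based predictors are often preferred for genome-wide surveys.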

PAIs are genomic regions that contribute to the virulence and pathogenic potential of various microorganisms (Schmidt and Hensel, 2004; Groisman and Ochman, 1996). They are distinct segments of the bacterial genome with characteristics that set them apart from the rest of the DNA (Hacker and Kaper, 2000). They are often large, ranging from tens to hundreds of kilobases, and can be integrated into the chromosome or exist as extra-chromosomal elements, such as plasmids. PAIs are frequently located close to tRNA genes, suggesting a putative mechanism in which tRNA genes act as anchor points for the integration of foreign DNA acquired through horizontal gene transfer (Figure 1E). One notable feature is their GC content, which tends to deviate from the genome average in various organisms, such as Streptomyces (Kers et al., 2005), Salmonella (Kombade and Kaur, 2021), and Yersinia (Carniel, 1999). PAIs typically contain clusters of genes involved in pathogenesis, including those encoding secretion systems (e.g. the LEE (locus of enterocyte effacement) in Escherichia coli), superantigens (e.g. SaPI1 and SaPI2 in Staphylococcus aureus), and enterotoxins (e.g. the she PAI in Shigella flexneri). PAIs can be acquired through the transfer of mobile genetic elements, such as plasmids, phages, or integrative and conjugative elements (ICEs), facilitating the incorporation of pathogenicity-associated genes into the recipient genome (Schmidt and Hensel, 2004; Syvanen, 2012; Chen et al., 2015). An open question is why PAIs often differ in base composition (G+C content) from the core genome. The underlying reasons remain unknown, although the preservation of a genus- or species-specific base composition is a noteworthy characteristic of bacteria (Schmidt and Hensel, 2004). Schmidt and Hensel proposed that factors such as DNA topology and codon usage in the virulence regions could contribute to the preservation of this distinct base composition (Schmidt and Hensel, 2004). The availability of genome sequences from pathogenic bacteria and their non-pathogenic counterparts offers an exceptional opportunity to explore structural variation and the underlying mechanisms within PAIs.

Growing evidence shows that G4 structures colocalize strikingly with functional regions of the genome, and their high conservation across different species suggests selective pressure to maintain these sequences at specific genomic regions (e.g. genomic islands, resistance islands, CpG islands, and PAIs) (Rhodes and Lipps, 2015; Frees et al., 2014; König et al., 2010). Interactions between G4 structures and pathogens have been suggested, although this field of study is still in its infancy. Some studies have observed G4-forming sequences within bacterial genomes (Yadav et al., 2021; Harris and Merrick, 2015). G4 structures are formed by G-rich DNA sequences, and their stability is influenced by G+C content and the arrangement of G-tetrads. Notably, PAIs often exhibit an altered GC content, which could affect the propensity for G4 formation within these regions. As documented in eukaryotes (Rhodes and Lipps, 2015; Varshney et al., 2020), G4 structures in PAIs might modulate the accessibility of transcription factors, DNA-binding proteins, or RNA polymerase, thereby influencing the expression of virulence-associated genes (Cahoon and Seifert, 2009). G4 formation within PAIs may thus serve as an additional layer of regulation that fine-tunes the expression of genes critical for pathogenesis. Investigating G4 structures within PAIs may therefore open new avenues for therapeutic strategies aimed at disrupting the regulatory mechanisms of pathogenicity-associated genes.

Results

Genomic information, PAI patterns, and the presence of G4 structures in 89 reported pathogenic strains

A dataset of PAIs was compiled from 89 reported pathogenic bacterial strains, encompassing 222 distinct types of PAIs. Pathogens carrying similar PAIs clustered closely on the phylogenetic tree, as exemplified by the LEE in E. coli strains (Figure 2A). Additional genomic information, including genome length (bp), G+C content (%), rRNA density, tRNA density, and PAI length (bp), is also shown and displayed conserved patterns within the same species (Figure 2A; Supplementary file 1a). PAIs commonly exhibit mosaic-like patterns, exemplified by distinct PAIs such as the FPI in Francisella tularensis, SaPIbov in Staphylococcus aureus, and the Hrp PAI in Xanthomonas campestris (Figure 2B). Many PAIs were associated with tRNA genes, such as the tRNA^Thr, tRNA^Phe, and tRNA^Gly insertion sites in E. coli strains (Figure 2B; Supplementary file 1b). PAIs were located in similar genomic regions across different pathogens or strains, indicating a non-random and functionally clustered distribution. Using the G4Hunter algorithm, the study identified a total of 225,376 putative G4 sequences in these 89 pathogenic genomes (Supplementary file 1a). The heatmap further showed that the number of G4 structures varied among the pathogen genomes (Figure 2C).

Figure 2. — (A) Phylogenetic analysis of pathogen genomes based on 89 bacterial strains, showing the evolutionary relationships among species. Additional genomic information, including genome size, GC content, rRNA density, tRNA density, and PAI length, is provided. The same color indicates the same species. (B) Genomic location of specific PAIs in bacterial genomes, divided into ten regions. PAIs are represented by green triangles, and their names are indicated. The tRNA insertion sites are also marked. (C) Heatmap illustrating the relative abundance of G4 structures in bacterial genomes, divided into ten regions. Red indicates a higher relative abundance, while blue indicates a lower relative abundance. (D & E) Correlation analysis between the number of G4 structures, the frequency of G4 structures, and GC content in various genomic features, including the whole genome, genes, promoters, rRNA, and tRNA. R-squared and p-values were derived through linear regression analysis performed in GraphPad Prism.

Figure 2—figure supplement 1. — Correlation between the number and frequency of G4 structures and GC content across genomic features (whole genome, genes, promoters, rRNA, and tRNA) at G4Hunter score thresholds of 1.4 and 1.6.

Interaction between PAIs and G4 structures in different genomic features

The analysis of G4 structures across all pathogen species revealed a positive correlation between the number of G4 structures and GC content in various genomic features, including the whole genome, gene, promoter, rRNA, and tRNA regions (Figure 2D). The frequency of G4 structures, measured as the number of predicted G4-forming sequences per 1000 base pairs (bp), was also positively correlated with GC content across the analyzed genomic elements (Figure 2E). Raising the G4Hunter score threshold to 1.4 or 1.6 consistently supported the positive correlations between the number and frequency of G4 structures and GC content across diverse genomic features (Figure 2—figure supplement 1). Additionally, when the genome datasets were classified into five GC-content ranges, GC content was significantly higher in the genome than in the corresponding PAI regions (Figure 3A–E). However, the frequency of G4 structures within PAIs followed a distinct pattern, particularly in the ranges with GC content below 30% and above 60%.
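The GC-range comparison described above can be sketched as follows. This is a minimal Python illustration, not the study's pipeline: it assumes GC content is computed over unambiguous bases only, that the five ranges are half-open intervals, and that each genome-PAI pair is binned by the genome's GC content. All helper names are hypothetical.

```python
from collections import defaultdict

# Five GC-content ranges as in Figure 3A-E; half-open [lo, hi) bins are an assumption.
GC_RANGES = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]


def gc_content(seq):
    """Percentage of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_range(gc):
    """Label such as '30-40%', or None if gc lies outside the five ranges."""
    for lo, hi in GC_RANGES:
        if lo <= gc < hi:
            return f"{lo}-{hi}%"
    return None


def group_by_genome_gc(pairs):
    """Bin (genome_gc, pai_gc) pairs by the GC range of the genome."""
    groups = defaultdict(list)
    for genome_gc, pai_gc in pairs:
        label = gc_range(genome_gc)
        if label is not None:
            groups[label].append((genome_gc, pai_gc))
    return dict(groups)
```

Each group could then be passed to the genome-versus-PAI significance test described in the Methods.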

Figure 3. — (A–E) Comparison of GC content (left panel) and G4 frequency (right panel) between the genome and PAIs, categorized into five ranges (20–30%, 30–40%, 40–50%, 50–60%, and 60–70%). */**/***/**** indicates a significant difference (p<0.05/0.01/0.001/0.0001). (F) Evolutionary relatedness of 10 types of PAIs (grouped into six main categories) in E. coli strains. (G & H) Examples of G4 structures within PAIs in E. coli strains. The gray bar represents the virulence region, the red box a virulence gene, the blue box an insertion site region or repeat, the green box an integrase, the purple triangle a tRNA insertion site, and the yellow triangle an effector. (I & J) Functional annotation analysis of G4-covered genes within PAIs in two E. coli strains, including biological process (BP), cellular component (CC), and molecular function (MF) categories. (K) Hypotheses on the origin of G4 structures within PAIs via horizontal gene transfer mechanisms (conjugation, transduction, and transformation).

Putative functions of G4 structures in PAIs

Using E. coli as an example, the study investigated the potential regulatory role and function of genes covered by G4 structures in PAIs. Different E. coli strains carry at least ten types of PAIs; one of the best known is the LEE (Figure 3F), which harbors genes responsible for attaching and effacing lesions (Franzin and Sircili, 2015; Jores et al., 2004). A putatively stable G4 (G4Hunter score 1.6) was identified at position 37,085 in the LEE PAI of E. coli str. O103:H2 12009 (Figure 3G), between an IS element and a tRNA insertion site. In bacterial genomes, tRNA regions generally contain a higher G4 frequency than transfer-messenger RNA (tmRNA) and rRNA regions (Bartas et al., 2019). The proximity of this G4 to a tRNA region suggests a potential regulatory role in the tRNA gene or in the upstream and downstream genes responsible for LEE virulence. A second putative G4 sequence, with a score of 1.381, was found at position 12,457 in PAI II of E. coli str. CFT073, providing further evidence of G4s in PAI regions (Figure 3H). Functional enrichment analysis was conducted to explore the putative functions of G4-covered genes in the two E. coli strains (Supplementary file 1c and d). Genes covered by G4 structures were predominantly involved in genetic information processing, including DNA binding, DNA integration, and nucleic acid metabolism (Figure 3I & J).

Discussion

The non-random distribution of G4 structures within PAIs across different bacterial species suggests a potential regulatory role in bacterial pathogenicity. The conservation of G4 structures within the same pathogenic strains points to a possibly conserved function in regulating pathogenic traits. These findings are consistent with previous reports that G4 structures are unevenly distributed in eukaryotic and prokaryotic genomes and are conserved within evolutionary groups (Bartas et al., 2019; Du et al., 2009; Puig Lombardi et al., 2019). To understand the origin of G4 structures within PAIs, we hypothesized that these G4 sequences could be acquired through three horizontal gene transfer mechanisms: conjugation, transformation, and transduction (Figure 3K). These mechanisms mediate the exchange of genetic material between organisms. Given the presence of G4 sequences within the PAIs, it is plausible that these sequences are transferred together with the PAIs through these mechanisms. Additionally, the presence of G4 structures within promoter, rRNA, and tRNA regions may have functional implications for the regulation of DNA replication, ribosome biogenesis, protein synthesis, and other RNA-related processes (Zybailov et al., 2013; Ivanov et al., 2014; Mestre-Fos et al., 2019). Across evolution, G4 structures appear to be more frequent in regulatory regions, such as the tRNA region, than in other genes, which may enable fine control of gene expression in signal transduction pathways (Wu et al., 2021).

The study found that the genomic regions surrounding the PAIs (i.e. the core genome) tend to have a higher GC content than PAI regions, consistent with the observation that PAIs often differ in base composition from the core genome (Schmidt and Hensel, 2004). We had expected this variation to be explained by the presence of G4 sequences within the PAIs, but the results were surprising: the frequency of G4 structures followed a distinct pattern across different regions of the PAIs. This differential distribution suggests that (i) specific genomic segments within the PAIs may be more prone to differences in G4 formation; (ii) the variation in base composition between the core genome and PAIs is only partially correlated with the presence of G4 structures; (iii) in most situations, the frequency of G4 structures in PAIs remains as stable as in the core genome; and (iv) alternatively, other factors, such as the i-motif (the C-rich counterpart of the G4) and CpG islands, may act together with G4s and contribute to the variation in base composition (Deaton and Bird, 2011; Sushmita, 2020).

Enrichment analysis indicated that these G4-covered genes are predominantly involved in genetic information processing, encompassing DNA binding, DNA integration, and nucleic acid metabolism. This suggests that G4 structures may play a regulatory role in these essential cellular processes, especially gene expression and DNA-related functions. For instance, G4 structures in the promoter regions of certain transcription factors may influence their binding affinity to DNA and thereby affect downstream gene expression (Niu et al., 2018; Xiang et al., 2022). PAIs and other mobile genetic elements frequently rely on DNA integration mechanisms mediated by integrases, recombinases, or transposases to transfer or incorporate genetic material into the bacterial genome (Arkhipova and Rice, 2016; Wozniak and Waldor, 2010). One compelling example is a study that identified a 16-base pair cis-acting G4 sequence near the pilin locus in Neisseria gonorrhoeae and demonstrated its pivotal role in antigenic variation by directing recombination to a specific chromosomal locus (Cahoon and Seifert, 2009; Cahoon and Seifert, 2013). Disrupting this G4 structure impeded pilin antigenic variation and recombination, highlighting its significance in immune evasion. In addition, the distance between G4 structures and the start of a gene (e.g. the transcription start site, TSS) should be considered when analyzing promoter regions, as it is pivotal for understanding their regulatory impact on gene expression (Huppert, 2010). Proximity to the TSS influences interactions with regulatory elements, potentially modulating the binding of transcription factors and RNA polymerase; G4 structures closer to the TSS may act as direct impediments to transcription initiation. Accounting for these spatial relationships would provide crucial insights into the functional roles of G4 structures in promoters.
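To make the TSS-distance consideration concrete, the following minimal Python sketch computes a signed, strand-aware distance between a predicted G4 interval and a TSS. The function name, the 0-based end-exclusive coordinates, and the sign convention (negative upstream, positive downstream, zero when the G4 overlaps the TSS) are assumptions; where only gene annotations are available, the annotated gene start could serve as a proxy for the TSS.

```python
def distance_to_tss(g4_start, g4_end, tss, strand):
    """Signed distance (nt) from a TSS to the nearest base of a G4 interval.

    Coordinates are 0-based; the G4 spans [g4_start, g4_end) and `tss` is the
    first transcribed base. The sign follows the direction of transcription:
    negative = upstream, positive = downstream, 0 = the G4 overlaps the TSS.
    """
    if g4_start >= g4_end:
        raise ValueError("empty G4 interval")
    if g4_start <= tss < g4_end:
        return 0
    if strand == "+":
        # On the plus strand, upstream lies at lower coordinates.
        return g4_end - 1 - tss if g4_end <= tss else g4_start - tss
    if strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        return tss - g4_start if g4_start > tss else tss - (g4_end - 1)
    raise ValueError("strand must be '+' or '-'")
```

For a plus-strand gene with a TSS at position 100, a G4 spanning [80, 90) lies 11 nt upstream, so the function returns -11.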

Overall, the conserved evolutionary relatedness of PAIs, the detection of stable G4 structures at specific genomic positions, and the enrichment of G4-covered genes in genetic information processing collectively support the hypothesis that G4 structures may have regulatory functions in key biological processes in pathogens. However, certain limitations could affect the interpretation of the results. One such limitation is the reliance on genome sequences obtained from external laboratories and datasets, which introduces uncertainty regarding their accuracy and completeness. Furthermore, the dynamic nature of bacterial genomes, including genetic rearrangements and horizontal gene transfer events, can complicate the accurate assembly and annotation of genome sequences. Lastly, recent evidence suggests that the stability of G4 structures is important for their function (Jara-Espejo and Line, 2020). Exploring the relationship between G4 stability and function is therefore a valuable topic that could reveal how G4 structures contribute to cellular processes and potentially offer new avenues for therapeutic interventions or molecular engineering. To overcome these constraints, collaboration among research teams and participation in data-sharing efforts will be important to ensure access to high-quality genome data for comprehensive analyses. Moreover, the results should be interpreted with caution and refined through validation experiments and collaborative efforts.

Methods

Selection and extraction of DNA sequences

A total of 89 genomes corresponding to the pathogens listed in the Pathogenicity Island Database (PAIDB) were included in the study. Complete bacterial genomic DNA sequences and their corresponding annotation files in .gff and .fna formats were obtained from the Genome database of the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/genome). To ensure the reliability and completeness of the dataset, only completely assembled genomes were included in the analysis. To avoid redundancy and incomplete sequences, one representative genome was selected for each species or strain. The selection of representative genomes was based on a careful examination of the supplementary material (Supplementary file 1a) accompanying the study. TBtools II (Toolbox for Biologists, RRID:SCR_023018, v2.042) (https://cj-chen.github.io/tbtools), a versatile bioinformatics tool with extensive applications in both eukaryotes and prokaryotes (Chen et al., 2020; Chen et al., 2023), was used to extract gene regions, promoters (2 kb upstream of genes), tRNA regions, and rRNA regions from the selected genomes. PAI regions were downloaded according to the information documented in PAIDB (Supplementary file 1a and b). Default thresholds and parameters were applied during extraction to maintain consistency across all genomes.
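The promoter definition above (2 kb upstream of each gene) depends on strand orientation. The Python sketch below shows one way to derive such regions from GFF3 gene records; it is not the TBtools implementation. It assumes 1-based inclusive GFF coordinates, clips regions at the sequence ends rather than wrapping around circular chromosomes, and returns minus-strand promoters as reverse complements. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def iter_genes(gff_lines):
    """Yield (seqid, start, end, strand) for 'gene' records of a GFF3 file."""
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "gene":
            yield cols[0], int(cols[3]), int(cols[4]), cols[6]


def promoter_interval(start, end, strand, seq_len, upstream=2000):
    """1-based inclusive (p_start, p_end) upstream of a gene, or None if empty."""
    if strand == "+":
        p_start, p_end = max(1, start - upstream), start - 1
    elif strand == "-":
        p_start, p_end = end + 1, min(seq_len, end + upstream)
    else:
        raise ValueError("strand must be '+' or '-'")
    return (p_start, p_end) if p_start <= p_end else None


def promoter_sequence(genome, start, end, strand, upstream=2000):
    """Promoter sequence in the gene's 5'->3' orientation ('' if none)."""
    interval = promoter_interval(start, end, strand, len(genome), upstream)
    if interval is None:
        return ""
    p_start, p_end = interval
    region = genome[p_start - 1:p_end]  # 1-based inclusive -> Python slice
    return region if strand == "+" else region.translate(COMPLEMENT)[::-1]
```

Clipping at the sequence ends is a simplification; for circular bacterial chromosomes, a gene near position 1 would have part of its upstream region at the end of the sequence.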

Data processing and detection of G4 structures in genomic features

The G4Hunter algorithm, a widely used tool for G4 prediction, was used to identify G4 motifs in the genomic sequences (Brázda et al., 2019). G4Hunter was run with a window size of 25 nt and a G4 score threshold of 1.2 to identify potential G4 sequences (Bartas et al., 2019; Brázda et al., 2020). Thresholds of 1.4 and 1.6 were additionally used to cross-verify the results. The study quantified the number of predicted G4-forming sequences within different genomic features, including the whole genome, gene, promoter, tRNA, rRNA, and PAI regions. The density of G4 motifs was calculated by dividing the number of G4 sequences by the total length of the genome (expressed per 1000 bp), and the length ratio of G4 motifs was calculated by dividing the total length of the G4 sequences by the total length of the genome.
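G4Hunter assigns each base a score and averages it over a sliding window. The Python sketch below follows the scoring scheme described for G4Hunter (each G in a run of n consecutive Gs scores min(n, 4), each C scores the negative of the same value, and A/T score 0) with the window of 25 nt and threshold of 1.2 used here. It is a simplified sketch rather than the published tool: it merges overlapping windows of the same sign but omits any further post-processing the tool applies to merged regions. The density and length-ratio helpers mirror the definitions in the text; all function names are hypothetical.

```python
def base_scores(seq):
    """Per-base scores: G in a run of n -> min(n, 4); C -> -min(n, 4); others -> 0."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1  # extend the run of identical bases
        if seq[i] in "GC":
            run = min(j - i, 4)
            scores[i:j] = [run if seq[i] == "G" else -run] * (j - i)
        i = j
    return scores


def g4hunter_windows(seq, window=25, threshold=1.2):
    """Sliding windows (start, end, mean) whose |mean score| >= threshold."""
    scores = base_scores(seq)
    hits = []
    if len(scores) < window:
        return hits
    total = sum(scores[:window])
    for start in range(len(scores) - window + 1):
        if start > 0:
            # Slide by one base: add the new right base, drop the old left base.
            total += scores[start + window - 1] - scores[start - 1]
        mean = total / window
        if abs(mean) >= threshold:
            hits.append((start, start + window, mean))
    return hits


def merge_windows(hits):
    """Merge overlapping same-sign windows into (start, end, strand) regions."""
    merged = []
    for start, end, mean in hits:
        strand = "+" if mean > 0 else "-"
        if merged and merged[-1][2] == strand and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end, strand])
    return [tuple(region) for region in merged]


def density_per_kb(n_g4, total_len):
    """Number of G4 sequences per 1000 bp."""
    return 1000.0 * n_g4 / total_len


def length_ratio(regions, total_len):
    """Fraction of the sequence covered by G4 regions."""
    return sum(end - start for start, end, _ in regions) / total_len
```

A positive mean marks a G-rich (plus-strand) candidate and a negative mean a C-rich one, i.e. a candidate on the complementary strand.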

Relationship between G4 structures and PAIs

A heatmap generated with the R package ‘pheatmap’ was used to show the distribution of G4 motifs across each genome divided into ten regions, matching the ten-region layout used for PAI locations. The correlation between the number of G4 structures and GC content was analyzed across various genomic elements, including the whole genome, gene, promoter, rRNA, and tRNA regions. The coefficient of determination (R²) was used to assess the goodness of fit, and the significance of the correlation was evaluated with p-values and 95% confidence intervals. A receiver operating characteristic (ROC) analysis, yielding an area under the curve greater than 0.90, was then used to quantify sensitivity and specificity. GC content in the genome regions and the corresponding PAI regions was compared and classified into different ranges to explore variation in base composition. GraphPad Prism (V.5.02, GraphPad Software, Inc) was used to conduct normality and lognormality tests. The Kolmogorov–Smirnov (K-S) test and F-test were used to assess normality and equality of variances, respectively, and Student’s t-test was used to identify significant differences.
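A minimal Python sketch of two of these steps follows: counting G4 positions in ten equal-width genome regions (the layout used for the heatmap) and fitting an ordinary least-squares line with its R². It assumes 0-based positions and equal-width regions; p-values and confidence intervals, which the study obtained in GraphPad Prism, are not computed here. The function names are hypothetical.

```python
def bin_positions(positions, genome_len, n_bins=10):
    """Count 0-based positions falling in n_bins equal-width genome regions."""
    counts = [0] * n_bins
    for pos in positions:
        if not 0 <= pos < genome_len:
            raise ValueError(f"position {pos} outside genome")
        counts[pos * n_bins // genome_len] += 1
    return counts


def linear_fit(x, y):
    """Ordinary least squares y = slope * x + intercept, plus R^2."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy)  # equals the squared Pearson correlation
    return slope, intercept, r_squared
```

Here x would hold GC content (%) per genome or feature and y the corresponding G4 count or frequency.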

Phylogenetic tree construction

The exact Taxonomy ID (taxid) for each analyzed group was obtained from the NCBI Taxonomy Database using the Taxonomy Browser. The Neighbor-Joining (NJ) method was employed to construct the phylogenetic trees for the analyzed groups. The phylogenetic trees were generated using MEGA11 software (https://www.megasoftware.net/), which offers robust algorithms and comprehensive tools for phylogenetic analysis. To assess the reliability and statistical support of the phylogenetic tree branches, bootstrap analysis was performed. One thousand bootstrap replicates were used to estimate the confidence levels of the branching patterns in the phylogenetic trees. The phylogenetic trees, along with the bootstrap support values, were displayed and visualized using the Interactive Tree of Life (ITOL) platform (https://itol.embl.de/).
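For readers unfamiliar with the Neighbor-Joining method, the sketch below implements the textbook NJ algorithm in Python on a precomputed pairwise distance matrix and returns a Newick string. It is not MEGA11's implementation and does not perform bootstrap resampling; it assumes a symmetric distance matrix is already available, and joining the final two clusters at the midpoint of their connecting edge is a presentation choice for the Newick output. The function name is hypothetical.

```python
def neighbor_joining(names, dist):
    """Textbook Neighbor-Joining on a symmetric distance matrix; returns Newick."""
    if len(names) < 2 or len(dist) != len(names):
        raise ValueError("need at least two taxa and a matching matrix")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Pick the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - R(i) - R(j).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new node to every remaining node.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        joined = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        nodes = [nodes[k] for k in keep] + [joined]
    half = d[0][1] / 2  # place the root at the midpoint of the last edge
    return f"({nodes[0]}:{half:.4g},{nodes[1]}:{half:.4g});"
```

With non-additive distances, NJ can yield negative branch lengths; this sketch reports them unchanged.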

Gene functional annotation

The gene sequences covered by G4 structures within PAIs were subjected to gene ontology (GO) annotation (https://geneontology.org/). The gene sequences were first translated into protein sequences using the Expasy online toolkit (https://web.expasy.org/translate/), which converts a DNA nucleotide sequence into its corresponding amino acid sequence based on the standard genetic code. GO terms were then assigned to the protein sequences based on their predicted functions in the biological process (BP), molecular function (MF), and cellular component (CC) categories. Fisher’s exact test was used to determine the statistical significance of the enrichment results, with lower p-values indicating stronger overrepresentation of a given GO term.
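Enrichment of a GO term among G4-covered genes can be framed as a one-sided Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution. The Python sketch below computes that tail with exact integer arithmetic. The choice of background (for example, all annotated genes in the genome) and the function name are assumptions, and no multiple-testing correction is applied.

```python
from math import comb


def go_term_enrichment_p(k, n, big_k, big_n):
    """One-sided Fisher's exact test P(X >= k) via the hypergeometric upper tail.

    k      G4-covered genes annotated with the GO term
    n      G4-covered genes in total
    big_k  background genes annotated with the GO term
    big_n  background genes in total
    """
    if not (0 <= k <= n <= big_n and 0 <= big_k <= big_n):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k or more annotated genes by chance.
    tail = sum(
        comb(big_k, x) * comb(big_n - big_k, n - x)
        for x in range(k, min(n, big_k) + 1)
    )
    return tail / comb(big_n, n)
```

For example, if both of 2 G4-covered genes carry a term that annotates 2 of 4 background genes, the one-sided p-value is 1/6.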

Statistics and reproducibility

All genomic data utilized in this study, including the species-specific datasets, were obtained from publicly available sources. Statistical analyses, such as the Student’s t-test, Wilcoxon test, correlation test, and linear regression analysis, were performed using GraphPad Prism software. The samples used in the statistical analyses corresponded to the genomic data, PAIs, or specific genes under investigation.

Acknowledgements

The authors sincerely thank Dr. Sung Ho Yoon and his colleagues for their dedicated efforts in identifying PAIs and establishing the Pathogenicity Island Database for public use. Their commitment to advancing pathogen genomics greatly facilitated this research. The authors also thank Dr. Jingjing Li (Zhejiang University) and Dr. Mingyu Zhou (Sun Yat-Sen University) for their insightful suggestions and constructive comments on exploring G4 structures in genomes, which significantly enriched our understanding of the potential roles and implications of G4 structures in the context of PAIs.

Funding Statement

No external funding was received for this work.

Contributor Information

Bo Lyu, Email: bl3pt@missouri.edu.

Qisheng Song, Email: SongQ@missouri.edu.

Bavesh D Kana, University of the Witwatersrand, South Africa.

Additional information

Competing interests

No competing interests declared.

Author contributions

Bo Lyu: Conceptualization, Data curation, Software, Formal analysis, Writing – original draft.

Qisheng Song: Conceptualization, Supervision, Writing – review and editing.

Additional files

Supplementary file 1. The genomic features of pathogens and pathogenic islands, as well as the putative functions of G4s across two E. coli strains.

(a) List of 89 bacterial species or strains within the same species and the number of G4s within their genomic features. (b) List of pathogenicity islands (PAIs) reported to be present in genome sequences and the number of G4s within PAIs. (c) Functional annotations for E. coli strain 1. (d) Functional annotations for E. coli strain 2.

elife-91985-supp1.xlsx (89.6 KB, xlsx)

MDAR checklist

elife-91985-mdarchecklist1.docx (42.6 KB, docx)

Data availability

The original reported PAIs datasets analyzed in this study are available from the publication Yoon et al., 2015. Additionally, Supplementary file 1 provides further PAIs data analyzed in the study.

The following previously published datasets were used:

Saenz HL. 2007. Intracellular pathogen isolated from wild rats. NCBI BioProject. PRJNA28109

Gartemann KH. 2008. Phytopathogen that causes bacterial wilt and canker of tomato. NCBI BioProject. PRJNA19643

University of Helsinki 2021. Clostridium perfringens isolates and their heat resistance. NCBI BioProject. PRJNA707150

Bielefeld University 2012. Corynebacterium diphtheriae 241 genome sequencing. NCBI BioProject. PRJNA42407

Bielefeld University 2012. Corynebacterium diphtheriae C7 (beta) genome sequencing. NCBI BioProject. PRJNA42401

Bielefeld University 2012. Corynebacterium diphtheriae CDCE 8392 genome sequencing. NCBI BioProject. PRJNA42405

Bielefeld University 2012. Corynebacterium diphtheriae HC03 genome sequencing. NCBI BioProject. PRJNA42415

Bielefeld University 2012. Corynebacterium diphtheriae INCA 402 genome sequencing. NCBI BioProject. PRJNA42419

Sanger Institute 2003. Causative agent of diphtheria. NCBI BioProject. PRJNA87

Bielefeld University 2012. Corynebacterium diphtheriae PW8 genome sequencing. NCBI BioProject. PRJNA42403

Bielefeld University 2012. Corynebacterium pseudotuberculosis 1002 genome sequencing. NCBI BioProject. PRJNA40687

Federal University of Minas Gerais 2020. Corynebacterium pseudotuberculosis strain C231, whole genome sequencing. NCBI BioProject. PRJNA40875

Bielefeld University 2019. Corynebacterium pseudotuberculosis FRC41 genome sequencing project. NCBI BioProject. PRJNA48979

Rede Paraense de Genômica e Proteômica 2019. Corynebacterium pseudotuberculosis I19 genome sequencing. NCBI BioProject. PRJNA52845

The Enterobacter sakazakii Genome Sequencing Project 2007. Isolated from dried infant formula and causes infant septicemia. NCBI BioProject. PRJNA12720

TIGR 2003. Opportunistic pathogen that transfers vancomycin resistance to other bacteria. NCBI BioProject. PRJNA70

Baylor College of Medicine 2012. Reference genome for the Human Microbiome Project. NCBI BioProject. PRJNA30627

Iowa State University 2006. Avian pathogenic strain. NCBI BioProject. PRJNA16718

Genentech 2020. Escherichia coli CFT073 isolate:199310 Genome sequencing. NCBI BioProject. PRJNA624646

University of Tokyo 2009. Enterohemorrhagic strain. NCBI BioProject. PRJDA32509

University of Tokyo 2009. This strain will be used for comparative genome analysis. NCBI BioProject. PRJDA32511

University of Tokyo 2009. This strain will be used for comparative genome analysis. NCBI BioProject. PRJDA32513

University of California San Diego 2014. Escherichia coli O157:H7 str. EDL933 Genome sequencing. NCBI BioProject. PRJNA253471

GIRC 2018. Enterohemorrhagic Escherichia coli. NCBI BioProject. PRJNA226

Genoscope 2008. Urinary tract infection isolate. NCBI BioProject. PRJNA33415

DOE Joint Genome Institute 2008. Causative agent of tularemia. NCBI BioProject. PRJNA19571

Los Alamos National Laboratory 2015. Francisella tularensis tularensis Schu_S4 Genome sequencing. NCBI BioProject. PRJNA239340

BioHealthBase 2007. Causative agent of tularemia. NCBI BioProject. PRJNA18459

Wuerzburg Univ 2003. Causes hepatitis, typhlitis, hepatocellular tumors, and gastric bowel disease. NCBI BioProject. PRJNA185

RIPCM 2012. Helicobacter pylori 26695 Genome sequencing. NCBI BioProject. PRJNA175543

Bielefeld University 2010. Helicobacter pylori B8 genome sequencing project. NCBI BioProject. PRJEA41831

The University of Tokyo 2011. Helicobacter pylori F16 genome sequencing project. NCBI BioProject. PRJDA50589

The University of Tokyo 2011. Helicobacter pylori F30 genome sequencing project. NCBI BioProject. PRJDA50591

The University of Tokyo 2011. Helicobacter pylori F32 genome sequencing project. NCBI BioProject. PRJDA50593

The University of Tokyo 2011. Helicobacter pylori F57 genome sequencing project. NCBI BioProject. PRJDA50595

Icahn School of Medicine at Mount Sinai 2015. Multi-strain, long-read bacterial genome sequencing. NCBI BioProject. PRJNA281410

Max von Pettenkofer-Institut für Hygiene und Medizinische Mikrobiologie, Ludwig-Maximilians-Universität München 2008. Clinical isolate. NCBI BioProject. PRJNA32291

Wuhan Institute of Virology, Chinese Academy of Sciences 2008. Mosquito larvae pathogen. NCBI BioProject. PRJNA19619

Microbial Genome Center of ChMPH 2007. Unknown strain. NCBI BioProject. PRJNA16393

TIGR 2005. Causes meningitis and septicemia. NCBI BioProject. PRJNA251

IREC 2010. Rhodococcus equi strain 103S whole genome sequencing project. NCBI BioProject. PRJEA41335

Sanger Institute 2008. An opportunistic pathogen in normal gut flora. NCBI BioProject. PRJNA12624

TIGR 2003. Causes plant rot. NCBI BioProject. PRJNA359

Chang Gung Genomic Medical Center, Chang Gung Memorial Hospital 2005. Extremely invasive Salmonella that causes severe disease in pigs and humans. NCBI BioProject. PRJNA9618

Sanger Institute 2003. Human-specific Salmonella that causes Typhoid fever. NCBI BioProject. PRJNA236

Wisconsin Univ 2003. Human-specific Salmonella that causes Typhoid fever. NCBI BioProject. PRJNA371

Washington University Genome Sequencing Center 2016. Major laboratory strain of Salmonella typhimurium. NCBI BioProject. PRJNA241

Microbial Genome Center of ChMPH 2011. Human-specific pathogen that causes endemic dysentery. NCBI BioProject. PRJNA310

Wisconsin Univ 2003. Human-specific pathogen that causes endemic dysentery. NCBI BioProject. PRJNA408

Minnesota Univ 2005. Associated with mastitis in cattle. NCBI BioProject. PRJNA63

IntegratedGenomics 2013. Staphylococcus aureus subsp. aureus CN1 Genome sequencing. NCBI BioProject. PRJNA162343

TIGR 2005. Methicillin resistant strain. NCBI BioProject. PRJNA238

University of Edinburgh 2009. Staphylococcus aureus ED98 genome sequencing. NCBI BioProject. PRJNA39547

University of Edinburgh 2010. Staphylococcus aureus subsp. aureus ED133 genome sequencing project. NCBI BioProject. PRJNA41277

Sanger Institute 2004. Methicillin resistant strain from the UK. NCBI BioProject. PRJNA265

Sanger Institute 2004. Methicillin sensitive strain from the UK. NCBI BioProject. PRJNA266

Univ Juntendo 2004. Methicillin and vancomycin resistant strain. NCBI BioProject. PRJNA263

NITE 2004. Methicillin resistant strain. NCBI BioProject. PRJNA306

Univ Juntendo 2004. Methicillin resistant strain. NCBI BioProject. PRJNA264

University Medical Centre Utrecht 2010. Staphylococcus aureus subsp. aureus ST398. NCBI BioProject. PRJEA29427

Juntendo University School of Medicine, Department of Bacteriology 2007. An opportunistic pathogen in humans and animals. NCBI BioProject. PRJDA18801

University of California, San Francisco 2006. A methicillin resistant strain of Staphylococcus aureus. NCBI BioProject. PRJNA16313

Chinese National HGC Shanghai 2002. Used for detection of residual antibiotics in food products. NCBI BioProject. PRJNA279

TIGR 2005. Pathogenic clinical isolate that causes toxic-shock syndrome and staphylococcal scarlet fever. NCBI BioProject. PRJNA64

The Wellcome Trust Sanger Institute 2009. Causes strangles disease. NCBI BioProject. PRJEA30765

Herz- und Diabeteszentrum Nordrhein-Westfalen Universitätsklinik der Ruhr-Universität Bochum 2011. Streptococcus gallolyticus subsp. gallolyticus ATCC BAA-2069 genome sequencing. NCBI BioProject. PRJEA63179

Department of Microbiology, University of Kaiserslautern 2010. Clinical isolate. NCBI BioProject. PRJNA16302

Chang Gung University 2012. Streptococcus parasanguinis FW213 Genome sequencing. NCBI BioProject. PRJNA76769

Wellcome Trust Sanger Institute 2009. Multidrug-resistant strain. NCBI BioProject. PRJEA31233

Broad Institute 2012. Genome sequencing with short reads. NCBI BioProject. PRJNA76613

Lab of Human Bacterial Pathogenesis 2005. Causative agent of a wide range of human and animal infections. NCBI BioProject. PRJNA13888

Lab of Human Bacterial Pathogenesis 2005. Causative agent of a wide range of human and animal infections. NCBI BioProject. PRJNA13887

Beijing Institute of Genomics, Chinese Academy of Sciences 2007. Causes disease in pigs and occasionally humans. NCBI BioProject. PRJNA17153

Beijing Institute of Genomics, Chinese Academy of Sciences 2007. Causes disease in pigs and occasionally humans. NCBI BioProject. PRJNA17155

Wellcome Sanger Institute 2019. Updated VC N16961 reference genome. NCBI BioProject. PRJEB22249

Centers for Disease Control and Prevention 2011. Vibrio cholerae O1 str. 2010EL-1786 genome sequencing project. NCBI BioProject. PRJNA59943

University of Oslo 2019. Vibrio cholerae O395 isolate:TCP2 Genome sequencing. NCBI BioProject. PRJNA586749

São Paulo State (Brazil) Consortium 2003. Plant-specific pathogen that causes citrus canker. NCBI BioProject. PRJNA297

Chinese National HGC Shanghai 2005. Causes black rot and citrus canker. NCBI BioProject. PRJNA15

São Paulo State (Brazil) Consortium 2002. Plant-specific pathogen that causes black rot. NCBI BioProject. PRJNA296

Bacterial Genomics and Evolution Laboratory, CSIR-Institute of Microbial Technology, Sector 39-A, Chandigarh, India 2017. Xanthomonas campestris pv. vitistrifoliae strain:LMG940 Genome sequencing and assembly. NCBI BioProject. PRJNA298596

NIAB, Rural Development Administration 2005. Causes rice bacterial blight disease. NCBI BioProject. PRJNA12931

Sanger Institute 2007. Food- and waterborne pathogen that causes gastroenteritis. NCBI BioProject. PRJNA190

Academy of Military Medical Sciences, The Institute of Microbiology and Epidemiology, China 2004. Extremely virulent organism that causes plague. NCBI BioProject. PRJNA10638

Sanger Institute 2003. Extremely virulent organism that causes plague. NCBI BioProject. PRJNA34

J. Craig Venter Institute 2009. Yersinia pestis KIM D27 genome sequencing project. NCBI BioProject. PRJNA41469

TIGR 2007. Serotype 1b strain isolated from a patient in Russia. NCBI BioProject. PRJNA16070

Los Alamos National Laboratory 2015. Yersinia pseudotuberculosis IP 32953 Genome sequencing. NCBI BioProject. PRJNA239344

The Wellcome Trust Sanger Institute 2011. Staphylococcus aureus subsp. aureus MSHR1132 genome sequencing project. NCBI BioProject. PRJEA62885

References

Arkhipova IR, Rice PA. Mobile genetic elements: in silico, in vitro, in vivo. Molecular Ecology. 2016;25:1027–1031. doi: 10.1111/mec.13543.
Bartas M, Čutová M, Brázda V, Kaura P, Šťastný J, Kolomazník J, Coufal J, Goswami P, Červeň J, Pečinka P. The presence and localization of G-Quadruplex forming sequences in the domain of bacteria. Molecules. 2019;24:1711. doi: 10.3390/molecules24091711.
Brázda V, Kolomazník J, Lýsek J, Bartas M, Fojta M, Šťastný J, Mergny JL. G4Hunter web application: a web server for G-quadruplex prediction. Bioinformatics. 2019;35:3493–3495. doi: 10.1093/bioinformatics/btz087.
Brázda V, Luo Y, Bartas M, Kaura P, Porubiaková O, Šťastný J, Pečinka P, Verga D, Da Cunha V, Takahashi TS, Forterre P, Myllykallio H, Fojta M, Mergny JL. G-Quadruplexes in the archaea domain. Biomolecules. 2020;10:1349. doi: 10.3390/biom10091349.
Cahoon LA, Seifert HS. An alternative DNA structure is necessary for pilin antigenic variation in Neisseria gonorrhoeae. Science. 2009;325:764–767. doi: 10.1126/science.1175653.
Cahoon LA, Seifert HS. Transcription of a cis-acting, noncoding, small RNA is required for pilin antigenic variation in Neisseria gonorrhoeae. PLOS Pathogens. 2013;9:e1003074. doi: 10.1371/journal.ppat.1003074.
Carniel E. The Yersinia high-pathogenicity island. International Microbiology. 1999;2:161–167.
Chen J, Carpena N, Quiles-Puchalt N, Ram G, Novick RP, Penadés JR. Intra- and inter-generic transfer of pathogenicity island-encoded virulence genes by cos phages. The ISME Journal. 2015;9:1260–1263. doi: 10.1038/ismej.2014.187.
Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, He Y, Xia R. TBtools: an integrative Toolkit developed for interactive analyses of big biological data. Molecular Plant. 2020;13:1194–1202. doi: 10.1016/j.molp.2020.06.009.
Chen C, Wu Y, Li J, Wang X, Zeng Z, Xu J, Liu Y, Feng J, Chen H, He Y, Xia R. TBtools-II: A “one for all, all for one” bioinformatics platform for biological big-data mining. Molecular Plant. 2023;16:1733–1742. doi: 10.1016/j.molp.2023.09.010.
Deaton AM, Bird A. CpG islands and the regulation of transcription. Genes & Development. 2011;25:1010–1022. doi: 10.1101/gad.2037511.
Du Z, Zhao Y, Li N. Genome-wide colonization of gene regulatory elements by G4 DNA motifs. Nucleic Acids Research. 2009;37:6784–6798. doi: 10.1093/nar/gkp710.
Franzin FM, Sircili MP. Locus of enterocyte effacement: a pathogenicity island involved in the virulence of enteropathogenic and enterohemorragic Escherichia coli subjected to a complex network of gene regulation. BioMed Research International. 2015;2015:534738. doi: 10.1155/2015/534738.
Frees S, Menendez C, Crum M, Bagga PS. QGRS-Conserve: a computational method for discovering evolutionarily conserved G-quadruplex motifs. Human Genomics. 2014;8:8. doi: 10.1186/1479-7364-8-8.
Groisman EA, Ochman H. Pathogenicity islands: bacterial evolution in quantum leaps. Cell. 1996;87:791–794. doi: 10.1016/s0092-8674(00)81985-6.
Hacker J, Kaper JB. Pathogenicity islands and the evolution of microbes. Annual Review of Microbiology. 2000;54:641–679. doi: 10.1146/annurev.micro.54.1.641.
Harris LM, Merrick CJ. G-quadruplexes in pathogens: a common route to virulence control? PLOS Pathogens. 2015;11:e1004562. doi: 10.1371/journal.ppat.1004562.
Huppert JL. Structure, location and interactions of G-quadruplexes. The FEBS Journal. 2010;277:3452–3458. doi: 10.1111/j.1742-4658.2010.07758.x.
Ivanov P, O’Day E, Emara MM, Wagner G, Lieberman J, Anderson P. G-quadruplex structures contribute to the neuroprotective effects of angiogenin-induced tRNA fragments. PNAS. 2014;111:18201–18206. doi: 10.1073/pnas.1407361111.
Jara-Espejo M, Line SR. DNA G-quadruplex stability, position and chromatin accessibility are associated with CpG island methylation. The FEBS Journal. 2020;287:483–495. doi: 10.1111/febs.15065.
Jores J, Rumer L, Wieler LH. Impact of the locus of enterocyte effacement pathogenicity island on the evolution of pathogenic Escherichia coli. International Journal of Medical Microbiology. 2004;294:103–113. doi: 10.1016/j.ijmm.2004.06.024.
Kers JA, Cameron KD, Joshi MV, Bukhalid RA, Morello JE, Wach MJ, Gibson DM, Loria R. A large, mobile pathogenicity island confers plant pathogenicity on Streptomyces species. Molecular Microbiology. 2005;55:1025–1033. doi: 10.1111/j.1365-2958.2004.04461.x.
Kombade S, Kaur N. Pathogenicity Island in Salmonella Spp.-A Global Challenge. IntechOpen; 2021.
König SLB, Evans AC, Huppert JL. Seven essential questions on G-quadruplexes. Biomolecular Concepts. 2010;1:197–213. doi: 10.1515/bmc.2010.011.
Mestre-Fos S, Penev PI, Suttapitugsakul S, Hu M, Ito C, Petrov AS, Wartell RM, Wu R, Williams LD. G-Quadruplexes in human ribosomal RNA. Journal of Molecular Biology. 2019;431:1940–1955. doi: 10.1016/j.jmb.2019.03.010.
Niu K, Zhang X, Deng H, Wu F, Ren Y, Xiang H, Zheng S, Liu L, Huang L, Zeng B, Li S, Xia Q, Song Q, Palli SR, Feng Q. BmILF and i-motif structure are involved in transcriptional regulation of BmPOUM2 in Bombyx mori. Nucleic Acids Research. 2018;46:1710–1723. doi: 10.1093/nar/gkx1207.
Puig Lombardi EP, Londoño-Vallejo A, Nicolas A. Relationship between G-Quadruplex sequence composition in viruses and their hosts. Molecules. 2019;24:1942. doi: 10.3390/molecules24101942.
Rhodes D, Lipps HJ. G-quadruplexes and their regulatory roles in biology. Nucleic Acids Research. 2015;43:8627–8637. doi: 10.1093/nar/gkv862.
Schmidt H, Hensel M. Pathogenicity islands in bacterial pathogenesis. Clinical Microbiology Reviews. 2004;17:14–56. doi: 10.1128/CMR.17.1.14-56.2004.
Spiegel J, Adhikari S, Balasubramanian S. The structure and function of DNA G-Quadruplexes. Trends in Chemistry. 2020;2:123–136. doi: 10.1016/j.trechm.2019.07.002.
Stegle O, Payet L, Mergny JL, MacKay DJC, Leon JH. Predicting and understanding the stability of G-quadruplexes. Bioinformatics. 2009;25:i374–i382. doi: 10.1093/bioinformatics/btp210.
Sushmita N. I-motif DNA: significance and future prospective. Exploratory Animal and Medical Research. 2020;10:18–23. doi: 10.1093/af/vfaa021.
Syvanen M. Evolutionary implications of horizontal gene transfer. Annual Review of Genetics. 2012;46:341–358. doi: 10.1146/annurev-genet-110711-155529.
Varshney D, Spiegel J, Zyner K, Tannahill D, Balasubramanian S. The regulation and functions of DNA and RNA G-quadruplexes. Nature Reviews. Molecular Cell Biology. 2020;21:459–474. doi: 10.1038/s41580-020-0236-x.
Watson JD, Crick F. A structure for deoxyribose nucleic acid. Nature. 1953;171:737–738. doi: 10.1038/171737a0.
Wozniak RAF, Waldor MK. Integrative and conjugative elements: mosaic mobile genetic elements enabling dynamic lateral gene flow. Nature Reviews. Microbiology. 2010;8:552–563. doi: 10.1038/nrmicro2382.
Wu F, Niu K, Cui Y, Li C, Lyu M, Ren Y, Chen Y, Deng H, Huang L, Zheng S, Liu L, Wang J, Song Q, Xiang H, Feng Q. Genome-wide analysis of DNA G-quadruplex motifs across 37 species provides insights into G4 evolution. Communications Biology. 2021;4:98. doi: 10.1038/s42003-020-01643-4.
Xiang L, Niu K, Peng Y, Zhang X, Li X, Ye R, Yu G, Ye G, Xiang H, Song Q, Feng Q. DNA G-quadruplex structure participates in regulation of lipid metabolism through acyl-CoA binding protein. Nucleic Acids Research. 2022;50:6953–6967. doi: 10.1093/nar/gkac527.
Yadav P, Kim N, Kumari M, Verma S, Sharma TK, Yadav V, Kumar A. G-Quadruplex structures in bacteria: biological relevance and potential as an antimicrobial target. Journal of Bacteriology. 2021;203:e0057720. doi: 10.1128/JB.00577-20.
Yoon SH, Park YK, Kim JF. PAIDB v2.0: exploration and analysis of pathogenicity and resistance islands. Nucleic Acids Research. 2015;43:624–630. doi: 10.1093/nar/gku985.
Zybailov BL, Sherpa MD, Glazko GV, Raney KD, Glazko VI. G4-quadruplexes and genome instability. Molecular Biology. 2013;47:197–204. doi: 10.1134/S0026893313020180.

eLife. doi: 10.7554/eLife.91985.3.sa0

eLife assessment

Bavesh D Kana ¹

This fundamental study explores the relationship between guanine-quadruplex structures and pathogenicity islands in 89 bacterial strains representing a range of pathogens. Guanine-quadruplex structures were found to be non-randomly distributed within pathogenicity islands and conserved within the same strains. These compelling findings shed light on the molecular mechanisms of interactions between guanine-quadruplex structures and pathogenicity islands and will be of interest to all microbiologists.

eLife. doi: 10.7554/eLife.91985.3.sa1

Reviewer #1 (Public Review):

Anonymous

Summary:

This study explores the relationship between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) in 89 pathogenic strains.

Strengths:

The findings of this study hold significant implications for our understanding of bacterial pathogenicity and the role of guanine-quadruplex (G4) structures:

Molecular Mechanisms of Pathogenicity: The study highlights that G4 structures are not randomly distributed within pathogenicity islands (PAIs), suggesting a potential role in regulating pathogenicity. This insight into the uneven distribution of G4s within PAIs provides a basis for further research into the molecular mechanisms underlying bacterial pathogenicity.

Conservation of G4 Structures: The consistent conservation of G4 structures within the same pathogenic strains suggests that these structures might play a vital and possibly conserved role in the pathogenicity of these bacteria. This finding opens doors for exploring how G4s influence virulence across different pathogens.

Unique Nature of PAIs: The differences in GC content between PAIs and the core genome underscore the unique nature of PAIs. This distinction suggests that factors such as DNA topology and G4 structures might contribute to the specialized functions and characteristics of PAIs, which are often associated with virulence genes.

Regulatory Role of G4s: The identification of high-confidence G4 structures within regulatory regions of Escherichia coli implies that these structures could influence the efficiency or specificity of DNA integration events within PAIs. This finding provides a potential mechanism by which G4s can impact the pathogenicity of bacteria.

Weaknesses:

None

Overall, the study provides fundamental insights into the pathogenicity island and conservation of G4 motifs.

eLife. doi: 10.7554/eLife.91985.3.sa2

Reviewer #2 (Public Review):

Anonymous

Summary: In the manuscript entitled "The Intricate Relationship of G-Quadruplexes and Pathogenicity Islands: A Window into Bacterial Pathogenicity" Bo Lyu explored the interactions between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) in 89 bacterial genomes through a rigorous computational approach. This paper handles an intriguing and complex topic in the field of pathogenomics. It has the potential to contribute significantly to the understanding of G4-PAI interactions and bacterial pathogenicity.

Strengths: The chosen research area and the summarizing of the results through neat illustrations.

Weaknesses: I did not find any significant ones.

eLife. 2024 Feb 23;12:RP91985. doi: 10.7554/eLife.91985.3.sa3

Author Response

Bo Lyu ¹, Qisheng Song ²

The following is the authors’ response to the original reviews.

Reviewer #1 (Public Review):

Summary:

This study explores the relationship between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) in 89 pathogenic strains. G4 structures were found to be non-randomly distributed within PAIs and conserved within the same strains. Positive correlations were observed between G4s and GC content across various genomic features, suggesting a link between G4 structures and GC-rich regions. Differences in GC content between PAIs and the core genome underscored the unique nature of PAIs. High-confidence G4 structures in Escherichia coli's regulatory regions were identified, influencing DNA integration within PAIs. These findings shed light on the molecular mechanisms of G4-PAI interactions, enhancing our understanding of bacterial pathogenicity and G4 structures in infectious diseases.

Strengths:

The findings of this study hold significant implications for our understanding of bacterial pathogenicity and the role of guanine-quadruplex (G4) structures:

Molecular Mechanisms of Pathogenicity: The study highlights that G4 structures are not randomly distributed within pathogenicity islands (PAIs), suggesting a potential role in regulating pathogenicity. This insight into the uneven distribution of G4s within PAIs provides a basis for further research into the molecular mechanisms underlying bacterial pathogenicity.

Conservation of G4 Structures: The consistent conservation of G4 structures within the same pathogenic strains suggests that these structures might play a vital and possibly conserved role in the pathogenicity of these bacteria. This finding opens doors for exploring how G4s influence virulence across different pathogens.

Unique Nature of PAIs: The differences in GC content between PAIs and the core genome underscore the unique nature of PAIs. This distinction suggests that factors such as DNA topology and G4 structures might contribute to the specialized functions and characteristics of PAIs, which are often associated with virulence genes.

Regulatory Role of G4s: The identification of high-confidence G4 structures within regulatory regions of Escherichia coli implies that these structures could influence the efficiency or specificity of DNA integration events within PAIs. This finding provides a potential mechanism by which G4s can impact the pathogenicity of bacteria.

Weaknesses:

No weaknesses were identified by this reviewer.

Overall, the study provides fundamental insights into the pathogenicity island and conservation of G4 motifs.

Thank you for your thorough review of our manuscript exploring the relationship between G4 structures and PAIs in 89 pathogenic strains. We appreciate your recognition of the strengths of our study and its potential implications for understanding bacterial pathogenicity. We are pleased that you highlighted the significance of our findings in revealing the non-random distribution and conservation of G4 structures within PAIs across various pathogenic strains.

Your insightful comments about the molecular mechanisms of pathogenicity, the conservation of G4 structures, the unique nature of PAIs, and the regulatory role of G4s within Escherichia coli are invaluable. We are encouraged by your positive evaluation of these aspects, which underscores the potential impact of our work on advancing the understanding of bacterial pathogenicity.

Reviewer #2 (Public Review):

Summary:

In the manuscript entitled "The Intricate Relationship of G-Quadruplexes and Pathogenicity Islands: A Window into Bacterial Pathogenicity" Bo Lyu explored the interactions between guanine-quadruplex (G4) structures and pathogenicity islands (PAIs) in 89 bacterial genomes through a rigorous computational approach. This paper handles an intriguing and complex topic in the field of pathogenomics. It has the potential to contribute significantly to the understanding of G4-PAI interactions and bacterial pathogenicity.

Strengths:

The chosen research area.

The summarizing of the results through neat illustrations.

Weaknesses:

This reviewer did not find any significant weaknesses.

Thank you for your positive and encouraging feedback on our manuscript. We appreciate your specific mention of the strengths, particularly highlighting the chosen research area and the effectiveness of our illustrations in summarizing the results. Your acknowledgment of these aspects is motivating, and we are pleased that the content and presentation resonated well with you.

Reviewer #3 (Public Review):

The main problem with the work is that the results are only descriptive and do not allow any inferences or conclusions about the importance of the function of G4 structures. The discussion and conclusions are poor. The results are preliminary; to make the analysis more interesting, it should be further extended and the data explored in much greater depth.

Thank you for your constructive feedback on our manuscript; we appreciate the time and effort you dedicated to evaluating our work. We acknowledge your concern regarding the descriptive nature of the results and the limitations in making inferences about the importance of G4 structures. To address this, we plan to enhance the depth of our analysis and provide more insightful interpretations in the discussion and conclusion sections. It's important to note that this study is intentionally a short report, emphasizing data mining findings rather than laboratory results. We understand the value of in-depth investigations and concur that our work lays the groundwork for more extensive studies in this area, aiming to provide a real-world scenario. We are committed to addressing your comments and refining our manuscript to contribute meaningfully to this field. Your insights are invaluable, and we look forward to presenting an improved version of our study.

Reviewer #2 (Recommendations For The Authors):

The authors could try a higher G-quadruplex score threshold of 1.4 or above to substantiate their findings, or focus on the bacterial genomes that rely on G4s for their pathogenicity.

We acknowledge your recommendation to explore a higher G-quadruplex score, and we would like to assure you that we have already conducted analyses using thresholds of 1.4 and 1.6. The findings consistently support the observations presented in the manuscript. We have updated the text to reflect this additional analysis, and the results are included in the revised version of the manuscript (Figure S1).
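To illustrate what raising the score threshold means, here is a minimal Python sketch of the G4Hunter scoring idea. It is a simplified re-implementation for illustration, not the authors' pipeline or the G4Hunter web application's code. Each G is scored by the length of the G-run it belongs to (capped at 4), and each C by the negative of its C-run length. A/T and other bases score 0, and a sliding window's mean score is compared against the threshold. Several choices are assumptions: the 25-nt window, the 1.2 baseline alongside the 1.4 and 1.6 thresholds mentioned above, and the function names `base_scores` and `windows_above`. The merging and refinement of overlapping hits performed by the real tool are omitted.

```python
"""Simplified G4Hunter-style window scan (illustrative sketch only)."""


def base_scores(seq: str) -> list[int]:
    """Score each base: +n for a G in a G-run of length n (capped at 4),
    -n for a C in a C-run of length n, 0 for any other base."""
    seq = seq.upper()
    scores = [0] * len(seq)
    i = 0
    while i < len(seq):
        base = seq[i]
        j = i
        while j < len(seq) and seq[j] == base:
            j += 1  # extend the homopolymer run
        run = min(j - i, 4)
        value = run if base == "G" else -run if base == "C" else 0
        for k in range(i, j):
            scores[k] = value
        i = j
    return scores


def windows_above(seq: str, threshold: float, window: int = 25) -> list[tuple[int, float]]:
    """Return (0-based start, mean score) for every window with |mean| >= threshold."""
    scores = base_scores(seq)
    hits: list[tuple[int, float]] = []
    if len(scores) < window:
        return hits
    total = sum(scores[:window])
    for start in range(len(scores) - window + 1):
        if start > 0:
            # Rolling sum: add the base entering the window, drop the one leaving it.
            total += scores[start + window - 1] - scores[start - 1]
        mean = total / window
        if abs(mean) >= threshold:
            hits.append((start, round(mean, 3)))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence: a telomere-like G4 motif flanked by AT-rich DNA.
    demo = "ATATA" + "GGGTTAGGGTTAGGGTTAGGG" + "ATATA"
    for t in (1.2, 1.4, 1.6):
        print(t, len(windows_above(demo, t)))
```

Run as a script, the demo reports 7, 5 and 0 qualifying windows at thresholds 1.2, 1.4 and 1.6. Raising the cut-off therefore retains only the more strongly G-biased windows of the same sequence.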

Reviewer #3 (Recommendations For The Authors):

Minor points

Introduction

Q1. The introduction is shallow. The concept and the importance of PAIs is vague. Why should these genes be different from other genes?

A1: Thank you for your valuable feedback. We have incorporated additional content in the Introduction section to provide a more comprehensive understanding of PAIs and their distinctiveness from other genes.

Changes: Lines 44-49 “G4 structures are ...innovative technologies.” were added.

Lines 51-55 “PAIs are distinct...such as plasmids.” were added.

Lines 60-66 “PAIs typically contain...recipient genome” were added.

Lines 77-80 “Growing evidence has...CpG islands, and PAIs” were added.

Material and Methods

Q2. It is not clear whether the author used TBtools or the G4Hunter software to identify G4 structures. It would be interesting to include references to published articles that used this software.

A2: Thank you! We have corrected this and added references to studies that used TBtools to extract sequences and G4Hunter to identify G4 structures.

Q3. The statistical significance must not be based only on p-values. P-values are influenced by sample sizes. I strongly recommend the use of other parameters such as confidence interval and ROC analysis.

A3: Thank you! We have incorporated confidence intervals and ROC analysis to complement p-values, enhancing the robustness of our statistical analysis.

Changes: Lines 265-267 “The correlation's significance... sensitivity and specificity.” were added.
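Response A3 pairs p-values with confidence intervals and ROC analysis. The sketch below shows one common way to compute both with the Python standard library. It builds a Fisher z-transform interval for a Pearson correlation, for example GC content against G4 count. It also computes ROC AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative. The manuscript's exact statistical procedure is not described in this section. The function names, the 95% level and all toy numbers are therefore hypothetical and are not results from the study.

```python
import math
import statistics


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient for paired observations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for r via the Fisher z-transform (needs n > 3)."""
    if n <= 3:
        raise ValueError("Fisher z interval needs more than 3 observations")
    if abs(r) >= 1.0:
        return (r, r)  # degenerate case: atanh is undefined at |r| = 1
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))


def roc_auc(positives: list[float], negatives: list[float]) -> float:
    """ROC AUC as P(random positive > random negative); ties count as 0.5."""
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    gc = [0.32, 0.38, 0.45, 0.51, 0.57, 0.66]  # GC fraction per genome
    g4 = [3, 5, 9, 12, 18, 25]                 # G4 count per genome
    r = pearson_r(gc, g4)
    lo, hi = pearson_ci(r, len(gc))
    print(f"r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
    print(roc_auc([0.55, 0.60, 0.48, 0.62], [0.40, 0.45, 0.50, 0.38]))
```

The pairwise AUC loop is O(n·m). For genome-scale numbers of windows, a rank-based (Mann-Whitney U) formulation gives the same value more efficiently.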

Results and discussion

Q4. The stability of G4 structures seems to be important for its function (doi:10.1111/febs.15065). Therefore it would be interesting if the analysis were carried out separating the G4 according to stability.

A4: Thank you for highlighting the importance of G4 structure stability for its function and suggesting an analysis based on stability. We have carefully reviewed the referenced paper (doi:10.1111/febs.15065) and note that their study focused on the stability analysis of individual G4s. In our current study, we identified a large number of G4s, and while stability analysis for each G4 is indeed an interesting avenue, it goes beyond the scope of this particular investigation. However, we agree that exploring the relationship between G4 stability and function is a valuable topic. We plan to delve deeper into this aspect in future work, as discussed in our response to your previous comment.

Changes: Lines 217-221 “Lastly, the stability of G4...molecular engineering.” were added.

Q5. The quality of the figures is poor. It is not possible to read the correlation and p-values in Figure 2.

A5: The revised figure is now submitted with enhanced clarity to ensure that correlation and p-values can be easily discerned.

Q6. The analysis of promoter regions should be performed taking into account the distance between the G4 and the beginning of the gene.

A6: Thank you; we have elaborated on this point in the revision.

Changes: Lines 198-106 “Additionally, considering the distance...of G4 structures in promoters.” were added.
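Measuring the distance between a promoter G4 and the gene start has to be strand-aware. "Upstream" points to lower coordinates for plus-strand genes and to higher coordinates for minus-strand genes. The following is a minimal sketch under two assumptions: 1-based inclusive coordinates, and the annotated gene start as the reference point. The study's actual reference point and bin sizes are not stated here. The `Gene` record, the function names and the 50-nt bin width are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical minimal gene record (1-based, inclusive coordinates)."""
    name: str
    left: int
    right: int
    strand: str  # "+" or "-"


def distance_to_gene_start(g4_left: int, g4_right: int, gene: Gene) -> int:
    """Signed distance (nt) from a G4 to the gene start, on the gene's strand.

    Negative = G4 lies upstream of the start, positive = downstream,
    0 = the G4 overlaps the start position. The distance is measured
    from the G4 edge closest to the start.
    """
    start = gene.left if gene.strand == "+" else gene.right
    if g4_left <= start <= g4_right:
        return 0
    if gene.strand == "+":
        # On the plus strand, upstream means smaller coordinates.
        return g4_right - start if g4_right < start else g4_left - start
    # On the minus strand, upstream means larger coordinates.
    return start - g4_left if g4_left > start else start - g4_right


def bin_upstream(distances: list[int], width: int = 50) -> dict[int, int]:
    """Count upstream G4s per bin; key k covers k+1 to k+width nt upstream."""
    counts: dict[int, int] = {}
    for d in distances:
        if d < 0:
            key = ((-d - 1) // width) * width
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical genes and G4 coordinates for illustration only.
    plus_gene = Gene("geneA", 1000, 1900, "+")
    minus_gene = Gene("geneB", 1200, 2000, "-")
    d1 = distance_to_gene_start(900, 925, plus_gene)     # upstream of geneA
    d2 = distance_to_gene_start(2050, 2075, minus_gene)  # upstream of geneB
    print(d1, d2, bin_upstream([d1, d2]))
```

Reporting a signed distance, rather than an absolute one, keeps promoter-proximal (upstream) G4s separable from those inside the gene body.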

Q7. The topic "Putative origin, transfer mechanisms, and functions of G4s in PAIs". The comments made on this topic are purely speculative and not backed up by data or any type of experimental analysis.

A7: We appreciate the feedback and have revised the title to emphasize the focus on the functions of G4s in PAIs. We acknowledge that the content related to the putative origin and transfer mechanisms of G4s in PAIs is descriptive and speculative; we have therefore relocated this information to the discussion section, where it can be treated more appropriately.

Q8. The supplemental material is hard to follow. The meaning of each column should be better explained. Why was the data divided into 10 parts?

A8: Following your suggestion, we have revised the tables for better clarity. To address concerns about the division into 10 parts, we have removed these data from the tables, as they were not necessary for the presentation.

Q9. Why were the data of E. coli strains 1 and 2 shown in Tables S3 and S4, but not those of the other bacterial strains?

A9: We appreciate your inquiry. The data of E. coli strains 1 and 2 were specifically highlighted in Tables S3 and S4 as illustrative examples to demonstrate the putative functions of G4s in PAIs within the scope of our study. Given the extensive nature of function annotation analyses across various pathogenic strains, presenting additional tables for each strain would have resulted in an impractical volume of supplementary material.

Q10. The Results and Discussion should be separated.

A10: Thank you! Corrected as suggested.

Associated Data


Data Citations

Saenz HL. 2007. Intracellular pathogen isolated from wild rats. NCBI BioProject. PRJNA28109
Gartemann KH. 2008. Phytopathogen that causes bacterial wilt and canker of tomato. NCBI BioProject. PRJNA19643
University of Helsinki 2021. Clostridium perfringens isolates and their heat resistance. NCBI BioProject. PRJNA707150
Bielefeld University 2012. Corynebacterium diphtheriae 241 genome sequencing. NCBI BioProject. PRJNA42407
Bielefeld University 2012. Corynebacterium diphtheriae C7 (beta) genome sequencing. NCBI BioProject. PRJNA42401
Bielefeld University 2012. Corynebacterium diphtheriae CDCE 8392 genome sequencing. NCBI BioProject. PRJNA42405
Bielefeld University 2012. Corynebacterium diphtheriae HC03 genome sequencing. NCBI BioProject. PRJNA42415
Bielefeld University 2012. Corynebacterium diphtheriae INCA 402 genome sequencing. NCBI BioProject. PRJNA42419
Sanger Institute 2003. Causative agent of diphtheria. NCBI BioProject. PRJNA87
Bielefeld University 2012. Corynebacterium diphtheriae PW8 genome sequencing. NCBI BioProject. PRJNA42403
Bielefeld University 2012. Corynebacterium pseudotuberculosis 1002 genome sequencing. NCBI BioProject. PRJNA40687
Federal University of Minas Gerais 2020. Corynebacterium pseudotuberculosis strain C231, whole genome sequencing. NCBI BioProject. PRJNA40875
Bielefeld University 2019. Corynebacterium pseudotuberculosis FRC41 genome sequencing project. NCBI BioProject. PRJNA48979
Rede Paraense de Genômica e Proteômica 2019. Corynebacterium pseudotuberculosis I19 genome sequencing. NCBI BioProject. PRJNA52845
The Enterobacter sakazakii Genome Sequencing Project 2007. Isolated from dried infant formula and causes infant septicemia. NCBI BioProject. PRJNA12720
TIGR 2003. Opportunistic pathogen that transfers vancomycin resistance to other bacteria. NCBI BioProject. PRJNA70
Baylor College of Medicine 2012. reference genome for the Human Microbiome Project. NCBI BioProject. PRJNA30627
Iowa State University 2006. Avian pathogenic strain. NCBI BioProject. PRJNA16718
Genentech 2020. Escherichia coli CFT073 isolate:199310 Genome sequencing. NCBI BioProject. PRJNA624646
University of Tokyo 2009. Enterohemorrhagic strain. NCBI BioProject. PRJDA32509
University of Tokyo 2009. This strain will be used for comparative genome analysis. NCBI BioProject. PRJDA32511
University of Tokyo 2009. This strain will be used for comparative genome analysis. NCBI BioProject. PRJDA32513
University of California San Diego 2014. Escherichia coli O157:H7 str. EDL933 Genome sequencing. NCBI BioProject. PRJNA253471
GIRC 2018. Enterohemorrhagic Escherichia coli. NCBI BioProject. PRJNA226
Genoscope 2008. Urinary tract infection isolate. NCBI BioProject. PRJNA33415
DOE Joint Genome Institute 2008. Causative agent of tularemia. NCBI BioProject. PRJNA19571
Los Alamos National Laboratory 2015. Francisella tularensis tularensis Schu_S4 Genome sequencing. NCBI BioProject. PRJNA239340
BioHealthBase 2007. Causative agent of tularemia. NCBI BioProject. PRJNA18459
Wuerzburg Univ 2003. Causes hepatitis, typhlitis, hepatocellular tumors, and gastric bowel disease. NCBI BioProject. PRJNA185
RIPCM 2012. Helicobacter pylori 26695 Genome sequencing. NCBI BioProject. PRJNA175543
Bielefeld University 2010. Helicobacter pylori B8 genome sequencing project. NCBI BioProject. PRJEA41831
The University of Tokyo 2011. Helicobacter pylori F16 genome sequencing project. NCBI BioProject. PRJDA50589
The University of Tokyo 2011. Helicobacter pylori F30 genome sequencing project. NCBI BioProject. PRJDA50591
The University of Tokyo 2011. Helicobacter pylori F32 genome sequencing project. NCBI BioProject. PRJDA50593
The University of Tokyo 2011. Helicobacter pylori F57 genome sequencing project. NCBI BioProject. PRJDA50595
Icahn School of Medicine at Mount Sinai 2015. Multi-strain, long-read bacterial genome sequencing. NCBI BioProject. PRJNA281410
Max von Pettenkofer-Institut für Hygiene und Medizinische Mikrobiologie, Ludwig-Maximilians-Universität München 2008. Clinical isolate. NCBI BioProject. PRJNA32291
Wuhan Institute of Virology, Chinese Academy of Sciences 2008. Mosquito larvae pathogen. NCBI BioProject. PRJNA19619
Microbial Genome Center of ChMPH 2007. Unknown strain. NCBI BioProject. PRJNA16393
TIGR 2005. Causes meningitis and septicemia. NCBI BioProject. PRJNA251
IREC 2010. Rhodococcus equi strain 103S whole genome sequencing project. NCBI BioProject. PRJEA41335
Sanger Institute 2008. An opportunistic pathogen in normal gut flora. NCBI BioProject. PRJNA12624
TIGR 2003. Causes plant rot. NCBI BioProject. PRJNA359
Chang Gung Genomic Medical Center, Chang Gung Memorial Hospital 2005. Extremely invasive Salmonella that causes severe disease in pigs and humans. NCBI BioProject. PRJNA9618
Sanger Institute 2003. Human-specific Salmonella that causes Typhoid fever. NCBI BioProject. PRJNA236
Wisconsin Univ 2003. Human-specific Salmonella that causes Typhoid fever. NCBI BioProject. PRJNA371
Washington University Genome Sequencing Center 2016. Major laboratory strain of Salmonella typhimurium. NCBI BioProject. PRJNA241
Microbial Genome Center of ChMPH 2011. Human-specific pathogen that causes endemic dysentery. NCBI BioProject. PRJNA310
Wisconsin Univ 2003. Human-specific pathogen that causes endemic dysentery. NCBI BioProject. PRJNA408
Minnesota Univ 2005. Associated with mastitis in cattle. NCBI BioProject. PRJNA63
Integrated Genomics 2013. Staphylococcus aureus subsp. aureus CN1 Genome sequencing. NCBI BioProject. PRJNA162343
TIGR 2005. Methicillin resistant strain. NCBI BioProject. PRJNA238
University of Edinburgh 2009. Staphylococcus aureus ED98 genome sequencing. NCBI BioProject. PRJNA39547
University of Edinburgh 2010. Staphylococcus aureus subsp. aureus ED133 genome sequencing project. NCBI BioProject. PRJNA41277
Sanger Institute 2004. Methicillin resistant strain from the UK. NCBI BioProject. PRJNA265
Sanger Institute 2004. Methicillin sensitive strain from the UK. NCBI BioProject. PRJNA266
Univ Juntendo 2004. Methicillin and vancomycin resistant strain. NCBI BioProject. PRJNA263
NITE 2004. Methicillin resistant strain. NCBI BioProject. PRJNA306
Univ Juntendo 2004. Methicillin resistant strain. NCBI BioProject. PRJNA264
University Medical Centre Utrecht 2010. Staphylococcus aureus subsp. aureus ST398. NCBI BioProject. PRJEA29427
Juntendo University School of Medicine, Department of Bacteriology 2007. An opportunistic pathogen in humans and animals. NCBI BioProject. PRJDA18801
University of California, San Francisco 2006. A methicillin resistant strain of Staphylococcus aureus. NCBI BioProject. PRJNA16313
Chinese National HGC Shanghai 2002. Used for detection of residual antibiotics in food products. NCBI BioProject. PRJNA279
TIGR 2005. Pathogenic clinical isolate that causes toxic-shock syndrome and staphylococcal scarlet fever. NCBI BioProject. PRJNA64
The Wellcome Trust Sanger Institute 2009. Causes strangles disease. NCBI BioProject. PRJEA30765
Herz- und Diabeteszentrum Nordrhein-Westfalen Universitätsklinik der Ruhr-Universität Bochum 2011. Streptococcus gallolyticus subsp. gallolyticus ATCC BAA-2069 genome sequencing. NCBI BioProject. PRJEA63179
Department of Microbiology, University of Kaiserslautern 2010. Clinical isolate. NCBI BioProject. PRJNA16302
Chang Gung University 2012. Streptococcus parasanguinis FW213 Genome sequencing. NCBI BioProject. PRJNA76769
Wellcome Trust Sanger Institute 2009. multidrug resistant strain. NCBI BioProject. PRJEA31233
Broad Institute 2012. Genome sequencing with short reads. NCBI BioProject. PRJNA76613
Lab of Human Bacterial Pathogenesis 2005. Causative agent of a wide range of human and animal infections. NCBI BioProject. PRJNA13888
Lab of Human Bacterial Pathogenesis 2005. Causative agent of a wide range of human and animal infections. NCBI BioProject. PRJNA13887
Beijing Institute of Genomics, Chinese Academy of Sciences 2007. Causes disease in pigs and occasionally humans. NCBI BioProject. PRJNA17153
Beijing Institute of Genomics, Chinese Academy of Sciences 2007. Causes disease in pigs and occasionally humans. NCBI BioProject. PRJNA17155
Wellcome Sanger Institute 2019. Updated VC N16961 reference genome. NCBI BioProject. PRJEB22249
Centers for Disease Control and Prevention 2011. Vibrio cholerae O1 str. 2010EL-1786 genome sequencing project. NCBI BioProject. PRJNA59943
University of Oslo 2019. Vibrio cholerae O395 isolate:TCP2 Genome sequencing. NCBI BioProject. PRJNA586749
Sao Paulo State (Brazil) Consortium 2003. Plant-specific pathogen that causes citrus canker. NCBI BioProject. PRJNA297
Chinese National HGC Shanghai 2005. Causes black rot and citrus canker. NCBI BioProject. PRJNA15
Sao Paulo State (Brazil) Consortium 2002. Plant-specific pathogen that causes black rot. NCBI BioProject. PRJNA296
Bacterial Genomics and Evolution Laboratory, CSIR-Institute of Microbial Technology, Sector 39-A, Chandigarh, India 2017. Xanthomonas campestris pv. vitistrifoliae strain:LMG940 Genome sequencing and assembly. NCBI BioProject. PRJNA298596
NIAB, Rural Development Administration 2005. Causes rice bacterial blight disease. NCBI BioProject. PRJNA12931
Sanger Institute 2007. Food and waterborne pathogen that causes gastroenteritis. NCBI BioProject. PRJNA190
Academy of Military Medical Sciences, The Institute of Microbiology and Epidemiology, China 2004. Extremely virulent organism that causes plague. NCBI BioProject. PRJNA10638
Sanger Institute 2003. Extremely virulent organism that causes plague. NCBI BioProject. PRJNA34
J. Craig Venter Institute 2009. Yersinia pestis KIM D27 genome sequencing project. NCBI BioProject. PRJNA41469
TIGR 2007. Serotype 1b strain isolated from a patient in Russia. NCBI BioProject. PRJNA16070
Los Alamos National Laboratory 2015. Yersinia pseudotuberculosis IP 32953 Genome sequencing. NCBI BioProject. PRJNA239344
The Wellcome Trust Sanger Institute 2011. Staphylococcus aureus subsp. aureus MSHR1132 genome sequencing project. NCBI BioProject. PRJEA62885

Supplementary Materials

Supplementary file 1. The genomic features of pathogens and pathogenic islands, as well as the putative functions of G4s across two E. coli strains.

elife-91985-supp1.xlsx (89.6KB, xlsx)

MDAR checklist

elife-91985-mdarchecklist1.docx (42.6KB, docx)

Data Availability Statement