Abstract
G-quadruplex (G4), a four-stranded DNA or RNA structure containing stacks of guanine tetrads, plays regulatory roles in many cellular functions. So far, conventional G4s containing loops of 1–7 nucleotides have been widely studied. Increasing experimental evidence suggests that unconventional G4s, such as G4s containing long loops (long-loop G4s), play a regulatory role in the genome by forming a stable structure. Other secondary structures such as hairpins in the loop might thus contribute to the stability of long-loop G4s. Therefore, investigation of the effect of the hairpin-loops on the structure and function of G4s is required. In this study, we performed a systematic biochemical investigation of model G4s containing long loops with various sizes and structures. We found that the long-loop G4s are less stable than conventional G4s, but their stability increased when the loop forms a hairpin (hairpin-G4). We also verified the biological significance of hairpin-G4s by showing that hairpin-G4s present in the genome also form stable G4s and regulate gene expression as confirmed by in cellulo reporter assays. This study contributes to expanding the scope and diversity of G4s, thus facilitating future studies on the role of G4s in the human genome.
INTRODUCTION
G-quadruplexes (G4s) are non-B DNA structures characterised by the presence of stacked G-quartets, with each G-quartet containing four guanines bonded by Hoogsteen base pairs in a planar structure (1). G4s are formed from a single DNA strand in which consecutive G-repeats (G-runs) are separated by short oligonucleotides (≥1 nucleotide). Upon folding into G4s, the G-runs form G-quartets and the intervening short oligonucleotides form loops that connect the G-quartets. Based on these loops, G4s can adopt a variety of folding topologies (2). In the case of a parallel intramolecular G4, the top and bottom G-quartets are connected by a propeller or chain-reversal loop. Previous reports showed that the stability and conformational polymorphism of G4 DNA are governed by the length and nucleotide composition of the loop (3). G4s with loops of 1–10 nucleotides (nt) in length adopt a parallel conformation (4–6). Based on these rules, several prediction tools have been developed to identify potential G4-forming sequences (7–9). Most of these tools use the default schema G3N(1–10)G3N(1–10)G3N(1–10)G3 to search for G4s in any given sequence.
Many recent studies have indicated that sequences with loops longer than 10 nt can also form parallel G4s (10–13). High-throughput sequencing of G4s (G4-Seq) was performed based on the principle that G4s can act as physical barriers for polymerase progression (14). By performing the sequencing under G4-stabilizing conditions, the polymerase pause sites were found to overlap with ∼700,000 G4-forming sequences, with over 70% consisting of long loops (>10 nt) and bulges (14). The study of G4s with longer loops is further complicated by the abundance of nested secondary structures such as hairpins (15). The ability of hairpins to form in short lengths of single-stranded DNA and the prevalence of inverted repeats in the human genome (16) make hairpins the most likely type of secondary structure to form within the loop regions of G4s; such loops can also be called hairpin-forming loops. Considering the evidence that G4s containing a hairpin in the loop (hairpin-G4s) are present in the genome (15), intensive and systematic investigation of hairpin-G4s is necessary for a comprehensive understanding of G4 function in the genome.
A previous report proposed that hairpin-G4s are more stable than G4s containing unstructured loops based on a thermal stability study (13). However, this study only considered a 6 nt stem region for the long loop of the hairpin-G4 and investigated the effect of the hairpin-forming loop only in the central position (loop 2). A subsequent genome-wide extraction of putative hairpin-G4s from the human genome used the schema G(3–6)N(1–20)G(3–6)N(1–20)G(3–6)N(1–20)G(3–6) because the 20 nt loop length was observed to represent 45% of all hairpin-G4s in chromosome 1 (15). This genome-wide search found 48,508 hairpin-G4 sequences in the promoter regions of 12,315 genes. However, considering the correlation between the stability and the number of base pairs in the hairpin, it is plausible that a loop longer than 20 nt might form more stable hairpins, which contribute to the structure and function of the long-loop G4s. Therefore, it is necessary to investigate G4s with long loops more systematically, including G4s with loops longer than 20 nt, for a comprehensive understanding of the role of non-canonical DNAs in the genome.
In the current study, we extend the existing knowledge on hairpin-G4s (Figure 1) by evaluating long-loop G4s with various loop sizes (13, 23 and 33 nt) and examining the effect of hairpin formation on G4 stability and function. For this purpose, we designed model hairpin-G4s containing one hairpin with various sizes and structures. We used circular dichroism (CD) and nuclear magnetic resonance (NMR) studies to analyse the effect of hairpin-forming propensity, size of the stem-flanking region, and location of the hairpin on hairpin-G4 formation and thermal stability. For identification of hairpin formation in the G4 structure, we developed a SYBR Green binding assay that differentiates the hairpin-loop from the unstructured-loop. Finally, we searched the human genome for hairpin-G4–forming sequences in promoter regions and evaluated their functional effects on gene regulation through in cellulo reporter assays and in vitro polymerase stop assays.
Figure 1.
Schematic representation of a conventional G4 structure (A) and a hairpin-G4 (B). The hairpin-G4 is characterized by the coexistence of a hairpin and G4 in the same structure, unlike the conventional G4.
MATERIALS AND METHODS
Oligonucleotides used for this study
All oligonucleotides were purchased from Cosmo Genetech and Macrogen. The oligonucleotides used in this study are listed in Table 1.
Table 1.
List of oligonucleotides used in this study
| Sample name | Sequence | Length |
|---|---|---|
| SH-G4 | TGGGTGGGTGGGTTGTCGGCGACATGGGT | 29 |
| SI-1-G4 | TGGGTGGGTGGGTTGTCGGCGACTTGGGT | 29 |
| SI-2-G4 | TGGGTGGGTGGGTTCTCGGCGACTTGGGT | 29 |
| MH-G4 | TGGGTGGGTGGGTTGTCAGTATGGCATACTGACATGGGT | 39 |
| MI-1-G4 | TGGGTGGGTGGGTTGTCAGAATGGCATACTGACATGGGT | 39 |
| MI-2-G4 | TGGGTGGGTGGGTTATCAGTATGCCACACTGACATGGGT | 39 |
| LH-G4 | TGGGTGGGTGGGTTGTCAGTATAGTCTGGCAGACTATACTGACATGGGT | 49 |
| LI-1-G4 | TGGGTGGGTGGGTTGTGAGTATAGACTGGCAGACTATACTGACATGGGT | 49 |
| LI-2-G4 | TGGGTGGGTGGGTGGTCTGTGTAGACTGGCGGAATATACTGACATGGGT | 49 |
| SU-G4 | TGGGTGGGTGGGTCTTCTTACATATGGGT | 29 |
| MU-G4 | TGGGTGGGTGGGTCTTCTTATATATTCTTCTTACAGGGT | 39 |
| LU-G4 | TGGGTGGGTGGGTCTTCTTACTTATTCTTCTTACTTATTCTTCTTGGGT | 49 |
| SU-G4 LOOP ONLY | TCTTCTTACATAT | 13 |
| MU-G4 LOOP ONLY | TCTTCTTATATATTCTTCTTACA | 23 |
| LU-G4 LOOP ONLY | TCTTCTTACTTATTCTTCTTACTTATTCTTCTT | 33 |
| SH-G4 LOOP ONLY | TTGTCGGCGACAT | 13 |
| MH-G4 LOOP ONLY | TTGTCAGTATGGCATACTGACAT | 23 |
| LH-G4 LOOP ONLY | TTGTCAGTATAGTCTGGCAGACTATACTGACAT | 33 |
| Pu22myc | TGAGGGTGGGTAGGGTGGGTAA | 22 |
| CHST1-G4 | CGGGTGGGTGGGGGCGGGCTCCGGAGCCTGGCTGCGGAGTGGGT | 44 |
| CHST1-MutG4 | CGGGTGGGTGGGGGCAAAAAAAAAAAAAAAAAAAAAAAGTGGGT | 44 |
| MCM4-G4 | TGGGTGGGTACCGGCCCGAGCTGGGCCGCGGGTGGGT | 37 |
| MCM4-MutG4 | TGGGTGGGTAAAAAAAAAAAAAAAAAAGCGGGTGGGT | 37 |
| MCM4-G4NGT | TGGGTGGGTACCGCGCCGAGCTGGCGCGCGGGTGGGT | 37 |
| MSI1-G4 | AGGGCGTTCCCGCGGCCGGGCCCCCGCGCCGGGGTGGGTGGGG | 43 |
| MSI1-MutG4 | AGGGCGAAAAAAAAAAAAAAAAAAAAAAAAGGGGTGGGTGGGG | 43 |
| NRBP1-G4 | TGGGTGGGCGGGGCCCGGCCCTCGGGCGTTCGCTGGGGTGGGC | 43 |
| NRBP1-MutG4 | TGGGTGGGCGGGGAAAAAAAAAAAAAAAAAAAAAGGGGTGGGC | 43 |
| TMCC3-G4 | GGGGGTGGGTGGGGGGTCCAGGCGGCTGCGGGGCGCGGGA | 40 |
| TMCC3-MutG4 | GGGGGTGGGTGGGGGGTCCAAAAAAAAAAAAAAAACGGGA | 40 |
| L5′TH-G4 | TGGGTGGGTGGGTTTGTCAGTATAGTCTGGCAGACTATACTGACATGGGT | 50 |
| L3′TH-G4 | TGGGTGGGTGGGTTGTCAGTATAGTCTGGCAGACTATACTGACATTGGGT | 50 |
| L5′TTH-G4 | TGGGTGGGTGGGTTTTGTCAGTATAGTCTGGCAGACTATACTGACATGGGT | 51 |
| L3′TTH-G4 | TGGGTGGGTGGGTTGTCAGTATAGTCTGGCAGACTATACTGACATTTGGGT | 51 |
| LTTTH-G4 | TGGGTGGGTGGGTTTTTGTCAGTATAGTCTGGCAGACTATACTGACATTTTGGGT | 55 |
| LTH-G4 | TGGGTGGGTGGGTTTGTCAGTATAGTCTGGCAGACTATACTGACATTGGGT | 51 |
| LTTH-G4 | TGGGTGGGTGGGTTTTGTCAGTATAGTCTGGCAGACTATACTGACATTTGGGT | 53 |
| L5′TTTH-G4 | TGGGTGGGTGGGTTTTTGTCAGTATAGTCTGGCAGACTATACTGACATGGGT | 52 |
| L3′TTTH-G4 | TGGGTGGGTGGGTTGTCAGTATAGTCTGGCAGACTATACTGACATTTTGGGT | 52 |
| LU1-G4 | TGGGTCTTCTTACTTATTCTTCTTACTTATTCTTCTTGGGTGGGTGGGT | 49 |
| LU2-G4 | TGGGTGGGTCTTCTTACTTATTCTTCTTACTTATTCTTCTTGGGTGGGT | 49 |
| LU-G4 | TGGGTGGGTGGGTCTTCTTACTTATTCTTCTTACTTATTCTTCTTGGGT | 49 |
| LH1-G4 | TGGGTTGTCAGTATAGTCTGGCAGACTATACTGACATGGGTGGGTGGGT | 49 |
| LH2-G4 | TGGGTGGGTTGTCAGTATAGTCTGGCAGACTATACTGACATGGGTGGGT | 49 |
| LH-G4 | TGGGTGGGTGGGTTGTCAGTATAGTCTGGCAGACTATACTGACATGGGT | 49 |
Prediction of hairpin structure and free energy (ΔGh)
The predicted structure and thermodynamic parameters (ΔGh) for the loops with hairpin-forming regions were obtained using the UNAFold server (http://unafold.rna.albany.edu/?q=DINAMelt/Quickfold) under 100 mM Na+ and 1 mM Mg2+ at 37°C.
Circular dichroism (CD) spectroscopy
The CD experiments were performed on a Jasco J-810 CD spectropolarimeter fitted with a Jasco CDS-426F Peltier temperature controller and a 1 mm quartz cuvette (Hellma) with a 220 μl reaction volume. Oligonucleotides (15 μM) were suspended in a buffer containing 10 mM HEPES (pH 7.5) with 100 mM KCl and varying concentrations of MgCl2 (1 and 10 mM), followed by denaturation at 95°C for 5 min. After cooling to room temperature over 2 h, all spectra were measured between 220 and 320 nm at a scan speed of 100 nm/min, with a 1 nm data pitch and a bandwidth of 1 nm, at a constant temperature of 298 K. The readings of three accumulations were averaged and recorded. For CD thermal melting analysis, the samples were heated from 15 to 90°C at a rate of 0.2°C/min and the ellipticity was recorded every 0.2°C at the wavelength showing maximum ellipticity. After subtracting the spectrum of the buffer, the data were converted to the fraction folded, and the melting temperature (Tm) was taken as the temperature at which 50% of the population was unfolded. Data plotting and curve fitting were performed in SigmaPlot 12.5.
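The conversion of a melting trace to fraction folded and the read-off of Tm can be illustrated with a minimal sketch. It assumes flat folded and unfolded baselines taken from the highest and lowest ellipticity values and a linear interpolation at the 0.5 crossing; the baseline treatment and fitting model actually used in SigmaPlot are not stated here, and the function names and the trace below are hypothetical.

```python
def fraction_folded(ellipticity):
    """Scale a melting trace so that 1 is fully folded and 0 is fully unfolded.

    Assumption: the highest and lowest ellipticity values serve as flat
    folded and unfolded baselines.
    """
    hi, lo = max(ellipticity), min(ellipticity)
    if hi == lo:
        raise ValueError("trace is flat; no transition to analyse")
    return [(e - lo) / (hi - lo) for e in ellipticity]


def melting_temperature(temps, ellipticity):
    """Return the temperature at which the fraction folded crosses 0.5.

    The crossing is located by linear interpolation between the two
    neighbouring data points that bracket 0.5.
    """
    folded = fraction_folded(ellipticity)
    points = list(zip(temps, folded))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 >= f1 and f0 != f1:
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    raise ValueError("fraction folded never crosses 0.5")


# Synthetic trace for illustration only (not measured data).
temps = [15.0, 30.0, 45.0, 60.0, 75.0, 90.0]
signal = [10.0, 10.0, 9.0, 6.0, 1.0, 0.0]
tm = melting_temperature(temps, signal)
```

With data recorded every 0.2°C, as in the melting protocol, the interpolation step contributes very little error; a sigmoidal fit would be the more usual choice for noisy traces.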
SYBR Green analysis of hairpin-G4s
For fluorescence spectrum measurement, 5 μl of 100 μM pre-formed G4 DNA in buffer containing 10 mM HEPES (pH 7.5) with 100 mM KCl and 1 mM MgCl2 was mixed with 15 μl of 1× SYBR Green I dye for a final DNA concentration of 25 μM. Fluorescence spectra were recorded in 96-well black bottom plates between 505–600 nm after excitation at 475 nm on a Biotek Synergy Neo multiplate reader.
Electrophoretic mobility analysis
Electrophoretic mobility of hairpin-G4s and unstructured-loop controls was analyzed on 15% polyacrylamide gels run under native conditions. The G4s were pre-formed for native PAGE at a concentration of 1 μM in buffer containing 10 mM HEPES (pH 7.5), 100 mM KCl and 1 mM MgCl2 by heating at 95°C for 5 min and allowing to cool to room temperature over 2 h. Then, 5 μl of pre-formed G4 was loaded with 1 μl of 6× OrangeRuler Loading Dye on a 15% native PAGE gel supplemented with 100 mM KCl and run at 80 V for 1 h at 4°C in 1× TBE. GeneRuler Ultra-Low Range 10 bp ladder (Thermo Scientific) was used for checking the migration. After running, the gels were stained using SYBR Gold and imaged on a Bio-Rad Gel Doc imaging system. The PAGE gel and 1× TBE buffer were supplemented with 100 mM KCl to maintain the secondary structure of DNA. The gels were pre-equilibrated for 30 min and samples were electrophoresed at 8 V/cm at 4°C for 1.5 h. Post-electrophoresis staining was carried out using SYBR Gold for 15 min and the gels were imaged on a Bio-Rad XRF Gel Doc system.
1H nuclear magnetic resonance (NMR) analysis
1H NMR experiments were performed at 25°C on a 400-MHz Bruker Avance 400 spectrometer fitted with a general-purpose probe in buffer containing 100 mM KCl and 1 mM MgCl2 in 10 mM HEPES (pH 7.5) and 10% D2O. The oligonucleotide concentration was set to 0.5 mM and 50 μM DSS was added for referencing the spectra. For temperature-dependent NMR experiments, spectra were recorded at 10°C increments from 20 to 80°C. All spectra were processed using Topspin 3.5 (Bruker).
Prediction of hairpin-G4 formation in the human genome
Step 1
We searched for long-loop G4s with one long loop (10 nt < loop length < 40 nt) and two short loops of one nt each (A/T/C/G) using quadparser (17) in the promoter regions, defined as the regions −500 to +100 bp from the transcription start sites (TSS) in the hg19 human genome. To observe the distribution of the G4s depending on the loop size, we subdivided the long-loop G4s by loop size in 10 nt steps. G4s with a loop length of 1–10 nt were also included for comparison. Accordingly, we applied the following schemas to search for G4s in the promoter regions of the human genome: G(3)N(1–10/11–20/21–30/31–40)G(3)N(1)G(3)N(1)G(3), G(3)N(1)G(3)N(1–10/11–20/21–30/31–40)G(3)N(1)G(3), and G(3)N(1)G(3)N(1)G(3)N(1–10/11–20/21–30/31–40)G(3).
Step 2
The long-loop G4s from Step 1 were shortlisted by overlapping them with experimentally identified G4s (14).
Step 3
Since we tested long-loop G4s with thymine in the short loop positions and loop lengths between 13 and 33 nt in our model study, we further restricted the search to G4s with ‘T’ in the short loop sites and with a loop length of 33 nt or less.
Step 4
The hairpin-G4s (H-G4s) were identified from the long-loop G4s in Step 3 based on hairpin folding prediction. Hairpin formation was predicted from the secondary structure and ΔGh value calculated with the QuikFold tool in the UNAFold server (http://unafold.rna.albany.edu/?q=DINAMelt/Quickfold) (18) under 100 mM Na+ and 1 mM Mg2+, considering that a lower ΔGh value indicates a more stable hairpin in the loop.
Step 5
Five H-G4s (MCM4, MSI1, TMCC3, CHST1 and NRBP1) were chosen based on the biological relevance and secondary structure among the H-G4s chosen in Step 3.
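The Step 1 schemas and the Step 3 restriction to ‘T’ in the short loops can be sketched as regular expressions. This is only an illustration and not quadparser itself: it scans a single strand of a given sequence, it does not handle promoter coordinates or the overlap with G4-Seq data, and the names `LOOP_BINS`, `schema_patterns` and `find_long_loop_g4s` are hypothetical.

```python
import re

# Loop-size bins from Step 1 (nt).
LOOP_BINS = [(1, 10), (11, 20), (21, 30), (31, 40)]


def schema_patterns(lo, hi, short_bases="ACGT"):
    """Build three regexes with one lo..hi nt loop at loop position 1, 2 or 3.

    The other two loops are a single nucleotide drawn from short_bases.
    """
    long_loop = "[ACGT]{%d,%d}" % (lo, hi)
    short_loop = "[%s]" % short_bases
    patterns = []
    for position in range(3):
        loops = [short_loop] * 3
        loops[position] = long_loop
        body = "GGG" + "".join(loop + "GGG" for loop in loops)
        # A capturing lookahead reports overlapping candidates as well.
        patterns.append(re.compile("(?=(" + body + "))"))
    return patterns


def find_long_loop_g4s(sequence, short_bases="ACGT"):
    """Return (start, loop_position, (lo, hi), matched_sequence) tuples."""
    sequence = sequence.upper()
    hits = []
    for lo, hi in LOOP_BINS:
        patterns = schema_patterns(lo, hi, short_bases)
        for position, pattern in enumerate(patterns, start=1):
            for match in pattern.finditer(sequence):
                hits.append((match.start(), position, (lo, hi), match.group(1)))
    return hits


# LH-G4 from Table 1: loops of 1, 1 and 33 nt.
LH_G4 = "TGGGTGGGTGGGTTGTCAGTATAGTCTGGCAGACTATACTGACATGGGT"
hits = find_long_loop_g4s(LH_G4, short_bases="T")
```

For the model sequence LH-G4, the search reports a single candidate in the 31–40 nt bin with the long loop at the third position. Note that in the 1–10 nt bin a sequence with three 1-nt loops matches all three schemas, so duplicates would need to be merged in a real scan.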
Construction of reporter vectors for in cellulo expression analysis
Candidate sequences containing hairpin-forming loops and their respective non-hairpin–forming mutants were cloned in the pGL3-Promoter vector. Two types of vectors were constructed: one to analyse the cis-acting effect of the sequences and the other to analyse the in cellulo polymerase-stalling ability. For the cis-acting effect, the hairpin-G4s and control sequences were cloned in the XhoI/NheI sites upstream of the SV40 promoter. To analyse the in cellulo stability-dependent effect, the sequences were cloned into the antisense strand at the HindIII/NcoI sites between the SV40 promoter and the luciferase gene, based on the rationale that G4 formation at this position can only affect transcription by polymerase stalling (19). A single-stranded oligonucleotide (ODN) corresponding to the hairpin-G4 forming sequence and its complementary strand were synthesized and annealed to form double-stranded DNA, which was then cloned into the pGL3-Promoter vector to construct the reporter plasmids. The pRL-TK vector expressing Renilla luciferase was used as an internal control. We transfected 900 ng of reporter plasmid along with 100 ng of internal control plasmid using Turbofect (Thermo Scientific) into 3 × 10^5 HEK293T cells seeded 24 h earlier in six-well plates. After 24 h of transfection, the cells were harvested, and luciferase activity was measured using the Dual-Luciferase Reporter Assay Kit (Promega) according to the manufacturer's instructions. Firefly luciferase reporter expression (FLuc) was first normalised to Renilla expression (RLuc) and the FLuc/RLuc ratio was further normalised to the activity from the mutated (G4 without hairpin) vector.
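The two-step normalisation of the reporter signal can be written out as a short sketch. It assumes that each replicate's FLuc/RLuc ratio is divided by the mean FLuc/RLuc ratio of the mutant vector; whether the mean or a paired replicate was used is not stated, and the function name and the readings below are hypothetical.

```python
from statistics import mean


def relative_activity(fluc, rluc, mutant_fluc, mutant_rluc):
    """Normalise FLuc to RLuc, then to the mutant (G4 without hairpin) vector.

    Assumption: the mean FLuc/RLuc ratio of the mutant replicates is used
    as the reference, so the mutant itself averages to 1.
    """
    if len(fluc) != len(rluc) or len(mutant_fluc) != len(mutant_rluc):
        raise ValueError("FLuc and RLuc readings must be paired")
    mutant_ratio = mean(f / r for f, r in zip(mutant_fluc, mutant_rluc))
    return [(f / r) / mutant_ratio for f, r in zip(fluc, rluc)]


# Made-up luminescence readings for illustration only.
hairpin_g4 = relative_activity(
    fluc=[500.0, 600.0],
    rluc=[1000.0, 1000.0],
    mutant_fluc=[1000.0, 1000.0],
    mutant_rluc=[1000.0, 1000.0],
)
```

Values below 1 then indicate lower reporter expression from the hairpin-G4 construct than from its unstructured-loop mutant.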
In vitro T7 RNA polymerase stop assay
The T7 RNA polymerase stop assay was performed using a protocol based on the study by Tateishi-Karimata et al. (20). Oligonucleotides containing the T7 polymerase binding site and a 35-nt spacer followed by the G4-forming sequence were synthesized and annealed with the complementary strand to form a dsDNA T7 polymerase binding site while leaving the spacer and G4-forming site single-stranded. The oligonucleotides used for this assay are listed in Supplementary Table S2. The reaction was performed by incubating T7 polymerase and the annealed oligonucleotides for 10 min in 1× T7 Polymerase Reaction Buffer (NEB), followed by addition of rNTPs and further incubation at 37°C for 90 min. The reaction was quenched using Stop Buffer (80% formamide, 10 mM MgCl2 and 0.01% Dextran Blue). The samples were heated for 5 min at 95°C before being loaded on a 10% denaturing PAGE gel with 7 M urea, electrophoresed for 45 min at 60°C and stained with SYBR Gold stain (Thermo Scientific). The gels were imaged on the Bio-Rad XRF Gel Doc system.
Statistical analysis
Significance was calculated using one-sample t-tests in GraphPad Prism 6, with the theoretical mean set to 1 for the unstructured-loop mutants. P-values are indicated as follows: *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001.
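As an illustration of the test described above, the one-sample t statistic against a theoretical mean of 1 and the star notation can be sketched as follows. The analysis itself was run in GraphPad Prism; this sketch computes only the t statistic (the P-value would come from a t distribution with n − 1 degrees of freedom), and the function names, the "ns" label and the example values are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu=1.0):
    """Return the one-sample t statistic of values against the mean mu."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are required")
    return (mean(values) - mu) / (stdev(values) / sqrt(n))


def significance_stars(p):
    """Map a P-value to the star notation used in the figures."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"  # hypothetical label for non-significant results


# Made-up normalised reporter activities for illustration only.
t_stat = one_sample_t([0.5, 0.6, 0.7], mu=1.0)
```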
RESULTS
Design of model hairpin-G4s: S(U/I/H)-G4s, M(U/I/H)-G4s, and L(U/I/H)-G4s
In our quest to establish whether the formation of parallel hairpin-G4s with long loops was feasible, we designed model hairpin-G4s with the sequence ‘TG3TG3TG3X(13/23/33)G3T’ in which the third loop (X) forms a hairpin. Three loop lengths were chosen for this purpose and the hairpin-G4s were categorised accordingly as short (S, 13 nt), medium (M, 23 nt), and long (L, 33 nt) loops (Supplementary Figure S1). To evaluate the effect of the hairpin stability on hairpin-G4s, we additionally varied the propensity of hairpin formation in three categories based on the predicted free energy of hairpin formation (ΔGh) (Table 2) from the UNAFold server (http://unafold.rna.albany.edu/?q=mfold/DNA-Folding-Form) (18,21). The UNAFold server has been shown to predict the secondary structure of nucleic acids and their free energy very accurately (18,21,22) and is especially useful considering our aim to screen H-G4s from the human genome on a large scale. Accordingly, hairpin-G4s were further characterised as low or unstructured (U-G4s), intermediate (I-G4s), and high (H-G4s). H-G4s contain perfect base pairs in the stem region of the hairpin and have the lowest ΔGh values compared with U-G4s and I-G4s. U-G4s have the highest ΔGh values among the model hairpin-G4s since no hairpin is formed in the loop. I-G4s show higher ΔGh values than H-G4s due to the presence of the mismatches in the stem region. For the purpose of evaluating the effect of hairpin stability in the wider range, we designed two I-G4s with different ΔGh: I-1 has a lower ΔGh than I-2. In total, 12 model hairpin-G4s with loops with different lengths and hairpin-forming propensities were designed: 13 nt short-loop G4s (SU-G4, SI-1-G4, SI-2-G4 and SH-G4), 23 nt medium-loop G4s (MU-G4, MI-1-G4, MI-2-G4, and MH-G4), and 33 nt long-loop G4s (LU-G4, LI-1-G4, LI-2-G4, and LH-G4) (Table 1).
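The design schema ‘TG3TG3TG3X(13/23/33)G3T’ can be checked against Table 1 with a short sketch that assembles a model sequence from its loop. The function name `model_g4` is hypothetical; the loop and full-length sequences are taken from Tables 1 and 2.

```python
def model_g4(loop, short_loop="T"):
    """Assemble T-G3-T-G3-T-G3-X-G3-T, with X as the third (long) loop."""
    return (
        "T" + "GGG" + short_loop + "GGG" + short_loop + "GGG"
        + loop + "GGG" + "T"
    )


# Hairpin-forming loops of SH-G4, MH-G4 and LH-G4 (Table 2).
LOOPS = {
    "SH-G4": "TTGTCGGCGACAT",
    "MH-G4": "TTGTCAGTATGGCATACTGACAT",
    "LH-G4": "TTGTCAGTATAGTCTGGCAGACTATACTGACAT",
}

models = {name: model_g4(loop) for name, loop in LOOPS.items()}
lengths = {name: len(seq) for name, seq in models.items()}
```

Each model sequence is 16 nt longer than its loop, which gives the 29, 39 and 49 nt lengths of the S, M and L series.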
Table 2.
Stem–loop details and loop free energies in model hairpin-G4s and unstructured loop mutants
| Sample name | Loop sequence | Loop length | Predicted* ΔGh | Predicted* Tm(37°C) | Stem length | Hairpin loop length |
|---|---|---|---|---|---|---|
| SH-G4 | TTGTCGGCGACAT | 13 | –0.21 | 39.6 | 2 | 5 |
| SI-1-G4 | TTGTCGGCGACTT | 13 | 0.09 | 35.9 | 2 | 5 |
| SI-2-G4 | TTCTCGGCGACTT | 13 | 1.38 | –3.4 | 1 | 5 |
| MH-G4 | TTGTCAGTATGGCATACTGACAT | 23 | –5.14 | 63.9 | 7 | 5 |
| MI-1-G4 | TTGTCAGAATGGCATACTGACAT | 23 | –2.7 | 54.8 | 5 | 9 |
| MI-2-G4 | TTATCAGTATGCCACACTGACAT | 23 | –1.21 | 47.7 | 5 | 7 |
| LH-G4 | TTGTCAGTATAGTCTGGCAGACTATACTGACAT | 33 | –9.94 | 69.7 | 12 | 5 |
| LI-1-G4 | TTGTGAGTATAGACTGGCAGACTATACTGACAT | 33 | –3.23 | 51.2 | 10 | 5 |
| LI-2-G4 | TGGTCTGTGTAGACTGGCGGAATATACTGACAT | 33 | –1.38 | 49.4 | 4 | 4 |
| SU-G4 | TCTTCTTACATAT | 13 | No folding | Not Determined | 0 | 0 |
| MU-G4 | TCTTCTTATATATTCTTCTTACA | 23 | 2.94 | –47.9 | 1 | 9 |
| LU-G4 | TCTTCTTACTTATTCTTCTTACTTATTCTTCTT | 33 | 2.63 | –21.5 | 2 | 11 |
*Predictions were carried out using UNAFold DNA-folding form server with parameters 100 mM Na+ and 1 mM Mg2+ at 37°C.
Correlation between the stabilities of the hairpin in the loop and the hairpin-G4
CD spectroscopy is a technique widely used to characterise G4 structures. G4s exhibit characteristic peaks in the 220–320 nm region of the CD spectrum (23). Parallel G4s exhibit a peak at ∼260 nm and a trough at ∼240 nm, as evidenced by the CD spectra of the parallel G4 pu22myc from the C-MYC promoter, which was used as a control (23,24). The CD spectra of model G4s revealed that model G4s with high or intermediate propensity hairpin-forming loops (H-G4s or I-G4s) form parallel-stranded G4s with the 260 nm peak and 240 nm trough (Figure 2A). However, in the case of U-G4s, 1–4 nm spectral shifts were observed, suggesting that the G4 structure is perturbed, possibly due to the increased flexibility of the loop. Consistently, the spectral shifts were more pronounced in MU-G4 and LU-G4, with longer unstructured loops (23 and 33 nt), than in SU-G4 (13 nt). We also investigated whether the long loops contribute to the CD spectra of the long-loop G4s by recording the CD spectra of the loop region only (Supplementary Figure S2). While a spectral signature was found near 250 and 280 nm, there was no CD peak at 262 nm, where the ellipticity was almost zero, consistent with unstructured or hairpin DNA (23,25). Therefore, we propose that the CD spectra of the current model G4s are characteristic of G4 formation.
Figure 2.
Hairpin-G4s can form stable G4s with higher stability than the mutants with unstructured loops. (A) CD spectra of short (S), medium (M) and long (L) loop-G4s with perfect hairpin loop (H-G4s), intermediate hairpin loop (I1 and I2-G4s) and unstructured loop (U-G4s) showed characteristic G4 peaks at 240 and 260nm. (B) CD thermal melting analysis showed that G4s with hairpin loops (H/I-G4s) are more stable than the G4 with unstructured loop (U-G4s) in varying Mg2+ concentrations. LI-2-G4 was extremely stable at 10mM MgCl2, and hence the melting temperature could not be determined. All measurements were taken in 10 mM HEPES buffer (pH 7.5) containing 100 mM KCl and 1 mM MgCl2. The ramp rate for CD melting experiments was set to 0.2°C/min. Error bars represent ± SD.
Next, we performed CD thermal melting analysis, which is widely used for evaluating the stability of the G4 structure (26). We measured the Tm by monitoring the ellipticity change at 262 nm between 15 and 90°C (Supplementary Figure S3). Regardless of the length of the loop, the Tm of H-G4s was higher than that of U-G4s and I-G4s (Figure 2B and Table 3). The difference in Tm was more pronounced in the hairpin-G4s with long (LH-G4 versus LU-G4 or LI-G4) and medium (MH-G4 versus MU-G4 or MI-G4) sized loops than in hairpin-G4s with short loops (SH-G4 versus SU-G4). We compared the predicted ΔGh values of the hairpin with the Tm values of hairpin-G4s (Table 3) to examine the correlation between hairpin-G4 stability and hairpin stability. Among the model G4s, H-G4s with the lowest ΔGh values showed the highest Tm (SH-G4, MH-G4, and LH-G4), which is expected since the predicted hairpin structures of H-G4s contain perfect base pairing in the stem region (Supplementary Figure S1). However, very little correlation between ΔGh and Tm was found in I-G4s and U-G4s, and their Tm values were lower than those of H-G4s. These results suggest that hairpin formation significantly contributes to the stability of hairpin-G4s only when the hairpin contains perfect base pairing, and that hairpins with mismatches have a lower propensity to stabilize hairpin-G4s. Comparison of the melting and cooling curves of the model G4s showed that the curves were mostly superimposable except in the case of LU-G4 and LH-G4 (Supplementary Figure S4), which show slight hysteresis of 4–5°C.
Table 3.
Comparison between unstructured loop and hairpin-forming loop ΔGh and hairpin-G4 and unstructured-loop mutant CD Tm
| Sample name | Predicted loop ΔGh | CD Tm (°C) |
|---|---|---|
| SU-G4 | No folding | 69.48 ± 1.17 |
| SH-G4 | –0.21 | 71.39 ± 1.37 |
| SI-1-G4 | 0.09 | 69.42 ± 0.14 |
| SI-2-G4 | 1.38 | 68.67 ± 0.09 |
| MU-G4 | 2.94 | 65.34 ± 0.03 |
| MH-G4 | –5.14 | 69.71 ± 0.04 |
| MI-1-G4 | –2.7 | 66.11 ± 0.28 |
| MI-2-G4 | –1.21 | 63.27 ± 0.08 |
| LU-G4 | 2.63 | 62.97 ± 0.29 |
| LH-G4 | –9.94 | 68.80 ± 0.28 |
| LI-1-G4 | –3.23 | 63.55 ± 0.50 |
| LI-2-G4 | –1.38 | 66.26 ± 0.01 |
CD experiments were performed in buffer containing 100 mM KCl and 1 mM MgCl2 in 10 mM HEPES (pH 7.5). ΔGh was predicted using UNAFold DNA-folding form server with parameters 100 mM Na+ and 1 mM Mg2+ at 37°C. The ramp rate of 0.2°C/min was used for CD melting.
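The Tm gain associated with a perfect hairpin can be read directly from Table 3 by subtracting the Tm of each unstructured-loop G4 from that of the H-G4 with the same loop length. The short sketch below does this arithmetic on the mean Tm values only (the ± SD values are ignored); the names `TM` and `hairpin_gain` are hypothetical.

```python
# Mean CD Tm values (°C) from Table 3, 1 mM MgCl2.
TM = {
    "SU": 69.48, "SH": 71.39,
    "MU": 65.34, "MH": 69.71,
    "LU": 62.97, "LH": 68.80,
}


def hairpin_gain(size):
    """Return Tm(H-G4) minus Tm(U-G4) for loop size 'S', 'M' or 'L'."""
    return round(TM[size + "H"] - TM[size + "U"], 2)


gains = {size: hairpin_gain(size) for size in "SML"}
# gains -> S: 1.91, M: 4.37, L: 5.83 (°C)
```

The gain grows from about 1.9°C for the 13 nt loop to about 5.8°C for the 33 nt loop, in line with the more pronounced Tm difference reported for medium and long loops.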
We also performed temperature-dependent CD measurements, similar to the NMR melting studies, to evaluate the impact of hairpin formation on the stability of the hairpin-G4 structure (Supplementary Figure S5). Our studies showed that the characteristic G4 spectrum of the hairpin-G4 (LH-G4) is retained until 60°C, followed by an abrupt decrease in ellipticity. In comparison, its counterpart with an unstructured loop (LU-G4) showed a gradual decrease in ellipticity and a spectral shift at lower temperatures than LH-G4. A similar trend was observed in the G4s containing medium (MH-G4 versus MU-G4) and short (SH-G4 versus SU-G4) loops. This indicates that the formation of the hairpin could contribute to the higher thermal stability of hairpin-G4s relative to their counterparts with unstructured loops.
Effect of magnesium ions on hairpin-G4 stability
Hairpin-G4s have both a G4 and a hairpin that contains a dsDNA stem region (Figure 1B). Since Mg2+ can stabilize dsDNA (27) as well as G4 (22,23), we tested the effect of Mg2+ on the stability of the hairpin-G4s by measuring the thermal stability of hairpin-G4s in the presence of 1 and 10 mM MgCl2. As expected, we found that the hairpin-G4 stability represented by the Tm values increased with the concentration of Mg2+ in the buffer (Figure 2B and Supplementary Figure S3). The relationship between Tm and ΔGh was consistently observed at both Mg2+ concentrations. For example, H-G4s showed the highest stability, while the stabilities of I-G4s and U-G4s were similar regardless of Mg2+ (Figure 2B). Interestingly, in the case of U-G4s, thermal stability decreased with increasing loop length (SU-G4 ≥ MU-G4 > LU-G4), which could be explained by the longer unstructured loop enhancing G4 instability. This stability change became more obvious at the higher Mg2+ concentration (SU-G4 > MU-G4 > LU-G4) (Figure 2B), which is consistent with previous reports in which Mg2+ increases the flexibility of a single-stranded DNA loop (28,29) and becomes detrimental to G4 stability (30). In the case of the C-MYC G-quadruplex (pu22myc), we did not observe any drastic change in thermal stability upon addition of 1 mM MgCl2 (Supplementary Figure S6). However, under 10 mM MgCl2, there was significant stabilization of pu22myc (Supplementary Figure S6), which corroborates previous reports that showed stabilization of promoter quadruplexes under high concentrations of Mg2+ (31). This demonstrates that although Mg2+ stabilizes the G4 structure itself, the contribution of Mg2+ to the stability of the hairpin region of the hairpin-G4 is more significant, especially in the case of MH-G4 and LH-G4 (Figure 2B).
Effect of length of the stem-flanking region on hairpin-G4 stability
The stem-flanking region of the hairpin-forming loop connects the hairpin structure with the G4. Therefore, an intriguing question is whether the length of the stem-flanking region influences the stability of hairpin-G4s. Answering this question is necessary to identify the stable hairpin-G4s in the genome. In a previous report, the effect of the stem-flanking region was evaluated using a model hairpin-G4 with a hairpin-forming loop with a 6-bp stem (13). In that study, there was no significant effect of the flanking region on the thermal stability of the hairpin-G4. We extended this work to hairpins with a longer stem (13-bp) by systematically evaluating the effect of the flanking region on the stability of a hairpin-G4 with a longer loop. For this purpose, we introduced new model hairpin-G4s that contain one (LTH-G4), two (LTTH-G4), or three (LTTTH-G4) thymines on both ends of the hairpin stem, only on the 5′ end (L5′TH-G4, L5′TTH-G4, L5′TTTH-G4), or only on the 3′ end (L3′TH-G4, L3′TTH-G4, L3′TTTH-G4) (Table 1). The CD spectra showed no significant deviation from the parallel G4 structure in all cases (Figure 3A). We evaluated the Tm from CD melting curves and found that there was no significant impact of the stem-flanking region on the thermal stability of the H-G4 structure (Supplementary Figure S7A). The maximum ΔTm between LH-G4 and LH-G4 with extended flanking regions (LTH-G4/LTTH-G4/LTTTH-G4 and their 5′ and 3′ counterparts) was only ∼2.5°C in the presence of 1 mM Mg2+ (Figure 3B). The maximum ΔTm between the equally distributed (LTH-G4/LTTH-G4/LTTTH-G4) and unequally distributed (L5′TH-G4/L3′TH-G4, L5′TTH-G4/L3′TTH-G4 and L5′TTTH-G4/L3′TTTH-G4) flanking regions was only ∼2.8°C in the presence of 1 mM Mg2+ (Figure 3B), indicating that the distribution of the flanking nucleotides in the hairpin does not significantly affect the stability of the hairpin-G4.
Figure 3.
Position of hairpin-forming long loop and hairpin-flanking region do not affect the stability of hairpin-G4s. CD spectra (A) and Tm (B) of model hairpin-G4s containing hairpins at different loop positions. CD spectra (C) and Tm (D) of model hairpin-G4s with different hairpin-flanking regions. G4s with unstructured loops (LU-G4s) have been tested for comparison. All measurements were taken in 10 mM HEPES buffer (pH 7.5) containing 100 mM KCl and 1 mM MgCl2. The ramp rate for CD melting experiments was set to 0.2°C/min. Error bars represent ± SD (standard deviation) of two trials.
Effect of hairpin-forming loop position on hairpin-G4 stability
In previous studies, the effect of the hairpin-forming loop was only evaluated by placing it in the centre of the G4-forming sequence (13,15). For a more comprehensive understanding of the stability and function of hairpin-G4s, we checked whether the position of the hairpin-forming loop has any effect on G4 formation and stability by evaluating the CD spectra and Tm of model G4s with a hairpin at alternative positions in comparison with LH-G4, in which the hairpin is present in the third loop. For this purpose, we designed two long-loop G4s, LH1-G4 and LH2-G4, which have a hairpin at the first and second loop positions, respectively (Figure 3C and D). Additionally, we made two more long-loop G4s that have unstructured loops at the first and second positions (LU1-G4 and LU2-G4, respectively). The CD spectra showed that they all formed parallel G4s regardless of the position of the loop (Figure 3C). The thermal melting curves (Supplementary Figure S7B) demonstrated that the difference between the Tm values of G4s with loops at different positions (LH-G4 versus LH1-G4 or LH2-G4; LU-G4 versus LU1-G4 or LU2-G4) was ∼2°C regardless of the formation of a hairpin in the loop (Figure 3D). These results indicate that, irrespective of the position of the unstructured or hairpin loop, the G4 structure was well maintained with similar stability. Importantly, this widens the scope for genome-wide searches for G4s with a hairpin-forming loop.
Detection of hairpin-G4s using SYBR Green I fluorescence
Several fluorescent molecules have been developed as probes for the evaluation of G4 formation and for differentiating G4 topologies (32,33). However, for the study of the hairpin in hairpin-G4s, it is necessary to find a probe for the detection of hairpin formation. SYBR Green I (SG) is a DNA-binding dye that shows increased fluorescence upon binding to double-stranded DNA (dsDNA) (34,35). Since SG preferentially binds to the dsDNA in the stem region of the hairpin but has a weak binding affinity to G4s (36,37), we speculated that SG can be used as a probe to detect hairpin formation in the long-loop hairpin-G4s (Figure 4A). To test this possibility, we examined the fluorescence enhancement upon binding of SG to the hairpin-G4. The SG fluorescence analysis revealed that the fluorescence from H-G4s was higher than that from U-G4s (Figure 4B). To evaluate the influence of SG binding to G4 on the fluorescence, we analysed the fluorescence enhancement upon SG binding to the G4 present in the C-MYC promoter region (pu22myc), which does not contain any long loop or hairpin (38,39). Since pu22myc showed negligible fluorescence enhancement upon SG treatment (Figure 4B), we concluded that the fluorescence enhancement observed in LH-G4 mostly originated from the binding of SG to the hairpin region. Additionally, we observed that the fluorescence enhancement was proportional to the number of base pairs in the hairpin stem (Supplementary Figure S1). Accordingly, the difference in fluorescence between G4s containing hairpins and unstructured loops was most prominent in LH-G4 (Figure 4B), since LH-G4 with a 12 bp stem could accommodate more SG dye. The same rule applied when the fluorescence from G4s with the same loop length but different hairpin-forming tendencies was compared (Figure 4B). The fluorescence order was as follows: LH-G4 > LI-1-G4 > LI-2-G4 > LU-G4 and MH-G4 > MI-1-G4 > MI-2-G4 > MU-G4 (Figure 4B).
However, among G4s with shorter loops (S-G4), a significant difference in fluorescence was not observed among SH-G4, SU-G4, SI-2-G4, and SI-1-G4 (Figure 4B), which can be explained by the low binding affinity of SG to the short loop, since at least 3–4 bp is necessary for SG binding to dsDNA (35). Overall, our data indicates that SG is as an effective probe for hairpin-G4s, although this cannot contribute to identifying the nature of G4s except confirm the presence of hairpin.
Figure 4.
SYBR Green is an effective probe to identify the presence of a hairpin in putative hairpin-G4s. (A) Schematic representation of the SYBR Green fluorescence assay. SYBR Green binds to the dsDNA stem region of the hairpin in G4s, which results in fluorescence enhancement. However, in G4s without hairpins, fluorescence enhancement is not observed due to the absence of SYBR Green binding to the DNA. (B) SYBR Green fluorescence spectra for model hairpin-G4s (H/I-G4s), unstructured-loop mutants (U-G4) and a conventional G4 (pu22myc) show the highest fluorescence enhancement for hairpin-G4s, confirming hairpin formation.
Evaluation of hairpin-G4 structures by native gel electrophoresis
To obtain structural information on hairpin-G4s, we evaluated the migration of the model hairpin-G4s using electrophoretic mobility shift assays (EMSA), assuming that compact DNA structures migrate faster than unstructured or linear DNA (40). We expected that the mobility of H-G4s in EMSA gels would be faster than that of U-G4s and I-G4s because the loops in U-G4s and I-G4s would be more relaxed and less compact than the loops in H-G4s. As expected, the migration order was H-G4 > I-G4 > U-G4, regardless of the loop length (Figure 5). In addition, most model hairpin-G4s formed the intramolecular structure (Figure 5A), although a small population of intermolecular G4s with lower mobility than intramolecular G4s was observed. The difference in mobility indicated that the dynamic nature of the unstructured loop significantly affects the overall structure of hairpin-G4s.
Figure 5.

Both model hairpin-G4s and genomic hairpin-G4 candidates form predominantly intramolecular G4s. Model hairpin-G4s (H/I-G4s) and unstructured-loop mutant (U-G4) (A) and hairpin G4s from the genome (B) and their unstructured-loop mutants were electrophoresed on a 15% Native PAGE gel stained with SYBR Gold. Samples were prepared in 10 mM HEPES (pH 7.5) and 100 mM KCl with 1 mM MgCl2 and electrophoresed in 1× TBE.
Evaluation of hairpin-G4 formation and stability by 1H NMR
From the CD analysis, we found that H-G4s have higher stability than U-G4s, and thus we further examined the effect of the hairpin on the stabilization of the G4 structure by investigating H-G4s and U-G4s using 1D 1H NMR. In the NMR analysis of H-G4, peaks were observed in both the 10–12.5 ppm region, representing imino protons of guanines in Hoogsteen base pairing (41–43), and the 12.5–15 ppm region, corresponding to imino protons arising from the Watson-Crick base pairing of duplex DNA in the hairpin stem region (44) (Figure 6), confirming the formation of hairpins in the loop region of the long-loop G4s. Interestingly, there were 2, 5 and 12 peaks in the 12.5–14.0 ppm region of the spectra of SH-G4, MH-G4 and LH-G4, respectively, corresponding to the number of base pairs in the stem region of the hairpin (Figure 6). This also showed that the number of base pairs in the stem region predicted by the UNAFold program is well supported by the experimental evidence. Consistently, analysis of U-G4 NMR spectra revealed no NMR peaks in the low field region (12.5–14.0 ppm), suggesting the absence of hairpin formation in U-G4s (Figure 6). However, regardless of the length and structure of the loop region of the model G4s, all long-loop G4s showed 12 peaks between 10 and 12.5 ppm, confirming the formation of the G4 structure with Hoogsteen base pairs (Figure 6).
Figure 6.
Hairpin-G4s show characteristics of hairpin and G4 formation and are more stable than their unstructured loop mutants. In 1H NMR spectra analysis, hairpin-G4s (SH-G4, MH-G4, and LH-G4) show peaks for hairpin formation between 12.5–15 ppm, whereas G4s with unstructured loops (SU-G4, MU-G4 and LU-G4) do not show peaks for hairpin formation. Both H-G4s and U-G4s display peaks for G4 formation between 10–12.5 ppm. Temperature-dependent NMR spectra of H-G4s and U-G4s show that H-G4s are more stable than U-G4s because the peaks for G4 start to disappear at higher temperatures. All spectra were recorded in 10 mM HEPES pH 7.5, 100 mM KCl, 1 mM MgCl2 at 25°C.
Once we confirmed the formation of the hairpin structure in the long-loop G4 using NMR, we further evaluated the impact of hairpin formation on the stability of the G4 structure by examining the temperature-dependent conformational change of long-loop G4s. For this purpose, we monitored the NMR peaks of SH-G4, SU-G4, MH-G4, MU-G4, LH-G4 and LU-G4 over the temperature range 20–80°C (Figure 6), which enabled us to compare the unfolding mechanism of H-G4s with that of U-G4s. In the case of LH-G4, the peaks for Watson-Crick base pairing between 12.5 and 14.0 ppm decreased first as the temperature increased (Figure 6), followed by a decrease in the imino proton peaks of G4 (10–12.5 ppm). Eventually, all LH-G4 peaks disappeared completely at 80°C. However, in the case of LU-G4, a gradual disappearance of the imino proton peaks of G4 was observed even at temperatures lower than 70°C, and all peaks disappeared completely at 80°C. The same pattern, in which G4-associated imino proton peaks disappeared at higher temperatures in H-G4s than in U-G4s, was also seen in the middle (MH-G4 versus MU-G4) and short loop (SH-G4 versus SU-G4) G4s (Figure 6). Consequently, the initial disappearance of the 12.5–14 ppm Watson-Crick base pair peaks followed by disappearance of the 10–12.5 ppm Hoogsteen base pair peaks in H-G4s indicates that the hairpin structure contributes to stabilizing the G4 structure.
Mining of hairpin-G4 motifs from human gene promoters
Although previous studies proposed that a loop length of 10 nt is most stable for G4 formation, our studies on the long-loop G4s, especially LH-G4s, showed that (i) G4s with a loop longer than 10 nt can still be stabilized when the loop forms a hairpin with perfect base pairing in the stem region, (ii) the position of the hairpin-loop does not affect the stability of the hairpin-G4 and (iii) the stem-flanking region has no effect on hairpin-G4 stability. These observations provide the biophysical information required to predict long-loop G4s with high stability in the genome and led us to question whether hairpin-G4s present in the promoter can, like conventional G4s, affect gene expression (14). We mined promoter regions within –500 to +100 bp from the transcription start site (TSS) in the human genome and identified 25 707 G4s with loop lengths of 1–40 nt by applying the schema provided in the methods (Step 1, Supplementary Table S3). After overlapping these with previously experimentally verified G4s from G4Seq (14), we selected 17 357 G4s (Step 2, Supplementary File 1). We further shortlisted candidates with a long loop of 13–33 nt while restricting the other two loops to a single T to match the criteria used for designing model G4s (Step 3), and obtained 393 experimentally verified candidates from this prediction (Supplementary File 2). To estimate the hairpin-forming propensity, we calculated ΔGh values of the long-loop G4s from Step 3 (Step 4) using the UNAFold server, which has been shown to robustly predict structural parameters such as GC%, base pairing, length of stem and loop region, and thermodynamic parameters such as Tm and ΔGh (18,21,22).
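The loop criteria of Step 3 can be illustrated with a regular-expression scan. The following is a minimal sketch only and is not the schema given in the Methods: the function names, the fixed G-run length of exactly three guanines and the example sequence are assumptions made for illustration, only the given strand is scanned, and the overlap with G4Seq data (Step 2) and the ΔGh calculation (Step 4) are not reproduced.

```python
import re

# Assumed loop definitions for illustration: one long loop of 13-33 nt,
# the two remaining loops restricted to a single T, G-runs fixed at GGG.
LONG_LOOP = r"[ACGT]{13,33}"
SHORT_LOOP = "T"


def build_patterns():
    """Return one compiled pattern per position (1, 2 or 3) of the long loop."""
    patterns = {}
    for pos in (1, 2, 3):
        loops = [SHORT_LOOP, SHORT_LOOP, SHORT_LOOP]
        loops[pos - 1] = LONG_LOOP
        body = "GGG" + "".join(loop + "GGG" for loop in loops)
        # A lookahead lets matches that start at different positions overlap.
        patterns[pos] = re.compile("(?=(" + body + "))")
    return patterns


def find_long_loop_g4s(seq):
    """Return sorted (start, long_loop_position, motif) tuples found in seq.

    The long loop is matched greedily, so only the longest motif is
    reported for each start position.
    """
    seq = seq.upper()
    hits = []
    for pos, pattern in build_patterns().items():
        for match in pattern.finditer(seq):
            hits.append((match.start(), pos, match.group(1)))
    return sorted(hits)


# Hypothetical example: a 15 nt hairpin-forming loop in the third position.
example = "AA" + "GGGTGGGTGGG" + "ACTCATTTTATGAGT" + "GGG" + "CC"
print(find_long_loop_g4s(example))
```

Each reported tuple gives the start index, the position of the long loop and the matched motif, so the loop sequence can be passed on to a separate folding tool for hairpin prediction.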
Study of the functional effects of hairpin-G4s in cellulo
Among the predicted hairpin-G4s, we selected the representative candidate G4s present in the promoter regions of MCM4, MSI1, TMCC3, NRBP1 and CHST1 with hairpin-forming loop lengths of 21, 27, 23, 27 and 28 nt, respectively, as candidates for further functional study, considering the biological importance of these genes and their hairpin-forming propensity (Step 5, Materials and Methods). For instance, MCM4 is implicated in oesophageal carcinoma (45), MSI1 and NRBP1 are implicated in colorectal cancer (46), and TMCC3 is implicated in breast cancer (47), whereas CHST1 is reported to be associated with spondyloepiphyseal dysplasia (48). Additionally, the hairpin-G4s from these promoters are expected to form stable hairpins based on the predicted structure and ΔGh values (–7.29, –4.96, –3.98, –5.33 and –6.72 kcal/mol, respectively) (Supplementary Figure S8A).
To investigate the biophysical properties, we analysed their CD spectra and thermal melting and cooling characteristics. The CD spectra of the candidate G4s revealed that they form parallel G4s as evidenced by the peak at 260 nm and trough at 240 nm (Supplementary Figure S8B). We mutated the hairpin-forming loop region in the candidate hairpin-G4s to disrupt the hairpin and generated unstructured-loop-G4s with the same loop length as the hairpin-G4s (Table 1). The CD spectra of the unstructured-loop mutants revealed that G4 formation was still observed (Supplementary Figure S8B). Thermal melting analysis revealed that the Tm of the hairpin-G4 candidates was higher than their corresponding unstructured-loop-G4s except in the case of TMCC3 (Supplementary Figure S8C and Table S1). The ΔTm between the hairpin-G4 candidates and their unstructured-loop mutants at 1 mM Mg2+ were 13.88, 14.20, 7.37 and 5.99°C for CHST1, MCM4, MSI1 and NRBP1, respectively, indicating that the hairpin-G4 candidates were much more stable than their mutants (Supplementary Figure S8C and Table S1). In the case of TMCC3-G4 and its mutant G4, both showed exceptionally high thermal stability and therefore the Tm could not be accurately determined (Supplementary Figures S8 and S9). Analysis of the melting and cooling curves of the candidate G4s (Supplementary Figure S10) indicated that the curves were almost reversible, signifying stable structure formation.
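The ΔTm comparison above can be sketched in a few lines. This is an illustrative calculation under stated assumptions, not the procedure used to obtain the reported values: it assumes a two-state transition sampled from the fully folded to the fully unfolded state, the function names are hypothetical, and the curves are synthetic sigmoids rather than experimental data.

```python
import math


def estimate_tm(temps, signal):
    """Temperature at which the normalised folded fraction crosses 0.5.

    Assumes the first point is fully folded and the last fully unfolded.
    Returns None if the curve is flat or no crossing is found.
    """
    low, high = signal[-1], signal[0]
    if high == low:
        return None
    frac = [(s - low) / (high - low) for s in signal]
    for i in range(len(frac) - 1):
        f0, f1 = frac[i], frac[i + 1]
        if f0 >= 0.5 > f1:
            # Linear interpolation between the two bracketing points.
            step = temps[i + 1] - temps[i]
            return temps[i] + (f0 - 0.5) * step / (f0 - f1)
    return None


def synthetic_curve(tm, temps, width=3.0):
    """Illustrative sigmoidal melting curve (not experimental data)."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


temps = list(range(20, 96))
tm_hairpin = estimate_tm(temps, synthetic_curve(70.0, temps))
tm_mutant = estimate_tm(temps, synthetic_curve(60.0, temps))
delta_tm = tm_hairpin - tm_mutant
print(round(tm_hairpin, 1), round(tm_mutant, 1), round(delta_tm, 1))
```

A half-transition estimate of this kind fails when the upper baseline is not reached within the measured range, which is consistent with the difficulty of determining the Tm of exceptionally stable G4s.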
Native PAGE analysis of the candidate hairpin-G4s (Figure 5B) revealed that the G4s formed by the sequences were predominantly intramolecular, as evidenced by their faster mobility. However, the presence of intermolecular G4s must also be considered, especially in the case of CHST1- and TMCC3-G4s, judging by their slower mobility. Consistently, intermolecular G4 populations have previously been observed on native PAGE in many studies on functionally important G4s such as CMYC-G4 (39), VEGF-G4 (49) and HOX11-G4 (50). Interestingly, the levels of intra- and intermolecular G4s vary depending on hairpin formation (Figure 5). This confirmed the advantage of using native gels to identify molecularity, which cannot be detected by CD spectra analysis. However, it is noteworthy that only the intramolecular structure is biologically relevant, since intermolecular G4s cannot be formed within promoter regions in cellulo and thus do not affect reporter activity.
We also performed the SYBR Green fluorescence assay to probe for hairpin formation and found that the hairpin-G4s displayed higher fluorescence than the mutants (Supplementary Figure S11), confirming hairpin formation in the candidate hairpin-G4s; this was also confirmed by 1H NMR spectra. While both candidate hairpin-G4s and their mutants displayed the characteristic 10–12.5 ppm peaks representing G4 structures, only the hairpin-G4s showed peaks in the 12.5–15 ppm region, indicating hairpin formation only in the hairpin-G4s (Supplementary Figure S12).
The functional impact of hairpin-G4s present in the genome was evaluated using luciferase reporter assays in HEK293T cells. To measure the cis-regulatory effect of the hairpin-G4, the candidate hairpin-G4s and their mutants were inserted in the region upstream of the SV40 promoter of a luciferase reporter vector, and luciferase activity was measured (Figure 7A–C). We found that MCM4 (P ≤ 0.05), TMCC3 (P ≤ 0.001) and CHST1 (P ≤ 0.05) G4s reduced reporter expression, whereas MSI1- and NRBP1-G4s did not significantly increase reporter expression compared with their respective mutants (Figure 7C). The only difference between the candidates and their mutants is the mutation of the hairpin-forming loop that disrupts hairpin formation. Therefore, the difference between the hairpin-G4s and the mutant-G4s in reporter activity can be attributed to the presence of a hairpin structure.
Figure 7.

The effect of hairpin formation in the candidate G4s present in human gene promoters on promoter activity. (A) Schema showing reporter construction for analyzing the cis-regulatory effect of hairpin-G4s on luciferase reporter activity. (B) Cis-regulatory effect of the model hairpin-G4 and its mutants with the unstructured loop on luciferase reporter activity. (C) Cis-regulatory effect of the candidate G4s containing a hairpin and their mutants with the unstructured loop on luciferase reporter activity. G4 sequences are inserted in the promoter region of the luciferase reporter. (D) The reporter was constructed as in Figure 7A, except that the G4 sequences were inserted into the 5′UTR region. (E) The model G4s (LU-G4 and LH-G4) were tested as controls in both the sense and antisense strands. (F) Reporter activity was also analyzed for candidate G4s containing a hairpin and their mutants in the 5′UTR. All data were normalized to the respective control G4 containing an unstructured loop. In all experiments, luciferase expression was quantified 24 hours after transfection. Error bars represent standard deviations of three individual trials (N = 3). Significance was calculated using a one-sample t-test (*P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001).
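For readers who wish to see how a one-sample t-test applies to three normalized replicates, a minimal sketch follows. The replicate values are hypothetical and are not data from this study, the function names are illustrative, and the null value of 1.0 assumes that activities are expressed relative to the unstructured-loop control. For N = 3 the t distribution has two degrees of freedom, for which the two-sided P value has the closed form 1 − |t|/√(t² + 2).

```python
import math
import statistics


def one_sample_t_n3(values, mu0=1.0):
    """Two-sided one-sample t-test for exactly three replicates (df = 2).

    Raises ZeroDivisionError if all three replicates are identical.
    """
    if len(values) != 3:
        raise ValueError("the closed-form P value below holds only for N = 3")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(3)
    t = (mean - mu0) / sem
    # Exact two-sided P value for Student's t with 2 degrees of freedom.
    p = 1.0 - abs(t) / math.sqrt(t * t + 2.0)
    return t, p


def stars(p):
    """Map a P value to the significance marks used in the figure legend."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"


# Hypothetical normalized luciferase activities (control = 1.0).
t_value, p_value = one_sample_t_n3([0.60, 0.70, 0.65])
print(round(t_value, 2), round(p_value, 4), stars(p_value))
```

With only three replicates the test has little power, so a non-significant result should not be read as evidence that the hairpin has no effect.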
Previous studies showed that G4 formation in the 5′UTR region on the antisense strand can affect transcription as a result of RNA polymerase II stalling and following transcription attenuation (19). To investigate whether a similar effect exists in the case of hairpin-G4s, we inserted the candidate hairpin-G4 and mutant-G4 sequences in the 5′UTR on the antisense strand of the reporter plasmid (Figure 7D). Reporter activity was significantly increased in CHST1(P ≤ 0.01), MCM4(P ≤ 0.05), and NRBP1(P ≤ 0.01)-G4s, compared to the activities in their respective mutants, while TMCC3-G4 and MSI1-G4 did not affect reporter expression significantly (Figure 7F). This suggests that hairpin-formation in the loop can have different effects on gene expression depending on its propensity to stabilize the hairpin-G4.
To explain this different effect on reporter expression, we performed a reporter assay using the representative model G4s (LH-G4 and LU-G4) in the sense and antisense strands of the 5′UTR and promoter regions (Figure 7). Our reporter assay was based on a previous study in which the G4 was cloned at different positions (5′UTR/promoter) in the sense or antisense strand, and the reporter expression was analyzed (19). We cloned the G4s in the antisense strand because the candidate G4s we chose from the genome were present on the antisense strand. We also checked the impact of hairpin-G4 formation on the sense strand by cloning the model G4s LH-G4 and LU-G4 in the sense strand in both the promoter and 5′UTR positions (Figure 7B and 7E). The results suggest that the hairpin-G4 (LH-G4) showed higher expression activity than the G4 with an unstructured loop (LU-G4) when present in the promoter (Figure 7B), while LH-G4 showed reduced activity when present in the 5′UTR, regardless of whether it was in the sense or antisense strand (Figure 7E). Our data are similar to those of previous studies, which showed an increase in reporter expression when the G4 was present in the promoter (25). In the case of the 5′UTR, our data match the results of a previous report that used a similar reporter plasmid and experiment (19). According to that study, it is plausible that the reporter expression decreases due to the formation of a G4 in the 5′UTR of the transcript (19). Overall, the results obtained with the model G4s are opposite to the effects of the candidate G4s. Therefore, an additional variable is expected to affect the reporter activity.
We observed that the effect of the model hairpin-G4s on reporter expression is different from that of the candidate G4s present in the genome, especially when they were inserted in the 5′UTR. Therefore, we considered the possibility that the guanine tracts form multiple or alternative quadruplex structures, as reported in previous studies (39,51–53). To test this hypothesis, we designed MCM4-G4NGT (where NGT represents No Guanine Tract) such that the GC% of the long loop is retained while the sequence in the stem region of the hairpin in MCM4-G4 is changed to remove the GGG tract (Supplementary Figure S13A). We confirmed that MCM4-G4NGT could also form a stable parallel G4 structure similar to MCM4-G4 (Supplementary Figure S13B and S13C) and contained the hairpin structure (Supplementary Figure S13D). We then compared the effect of MCM4-G4NGT on the reporter activity with that of MCM4-MutG4 (Supplementary Figure S13E). While gene expression increased when MCM4-G4 was inserted in the 5′UTR in comparison with MCM4-MutG4 (Figure 7C), the reporter with MCM4-G4NGT showed reduced activity compared with MCM4-MutG4 (Supplementary Figure S13E), similar to the trend found for the model sequence LH-G4. Since MCM4-G4NGT and LH-G4 do not contain a GGG tract in the long loop, they are less likely to form mutually exclusive G4s. This indicates that the presence of a GGG tract in the long loop leads to the formation of mutually exclusive G4s, which results in a different impact on expression, as proposed previously (39,52).
In addition, we observed a decrease in expression when MCM4-G4NGT was cloned in the promoter region compared with MCM4-MutG4 (Supplementary Figure S13E), which is similar to the decrease in expression observed when MCM4-G4 was cloned in the promoter region. However, the magnitude of repression was higher for MCM4-G4NGT, suggesting that the potential formation of mutually exclusive G4s or the presence of multiple conformations could influence the impact of H-G4 formation on expression. Further studies are needed to explore their impact on structure, stability and function.
Impact of hairpin-G4 formation on transcription fidelity in vitro
Considering the different impact of candidate hairpin-G4s from the genome observed in reporter assays, we also wanted to confirm their ability to suppress transcription in vitro. In this regard, the RNA polymerase stop assay has previously been employed to ascertain the impact of G4 formation and stability on transcription fidelity (20). Briefly, if a stable G4 is formed, the transcript can either be slipped (longer transcripts) or arrested (shorter transcripts). Additionally, transcriptional pausing can occur, leading to lower amounts of the full-length transcript. Highly stable G4s produce higher amounts of arrested transcripts at 35 nt, which is the distance from the polymerase binding site to the G4-forming region. However, if the G4 is not stable, polymerase stalling does not happen and higher amounts of the full-length transcript are obtained. We performed this assay using the candidate hairpin-G4s and the model hairpin-G4s (LH/LU/LI-G4s). Our results on the model hairpin-G4s (Figure 8A) indicated a distinct reduction of the full-length transcript and an increase in the level of the 35 nt arrested transcript in LH-G4 compared with LU-G4 and LI-G4 (LH-G4 > LI-1-G4 > LI-2-G4 > LU-G4), correlating well with the expected stabilities of the designed hairpin-G4s. We also observed a similar trend in the case of the genomic candidates (Figure 8B), indicating the ability of the hairpin-G4s to perturb transcription by influencing the progression of RNA polymerase.
Figure 8.
Hairpin-G4s can act as transcriptional barriers. T7 polymerase stop assay using model G4s (A) and candidate G4s (wild type and mutant containing an unstructured loop) (B). Larger amounts of arrested (A) and slipped (S) transcripts and smaller amounts of full-length transcript (FL) are observed in G4s with a stable hairpin compared with their counterparts containing the unstructured loop. ‘M’ represents the size marker.
DISCUSSION
The potential roles of promoter G4s in gene regulation in cancer have been well established by previous studies such as those on the C-MYC, BCL2, KRAS and C-KIT promoters (54). Furthermore, much experimental and informatics evidence has implicated G4 formation in many cellular processes and in disease pathogenesis (55,56). Therefore, identification of G4s in the genome and functional annotation of their roles are necessary for advanced studies of the genome. Although the optimal loop length was initially predicted to be 3–7 nt, later studies determined that longer loops can also form G4s (57). However, the stability of the G4 is inversely proportional to the length of the loop (58). Further structural and functional studies on G4s revealed that G4 formation is not simply determined by the presence of G-tracts and the loop length but is also affected by parameters such as loop composition and loop structure (3,4,58,59). We hypothesized that long-loop G4s with a loop longer than 20 nt can still form stable G4s depending on the stability of the loop. We further speculated that the presence of a hairpin structure in the long loop can make the loop less flexible, thus increasing the stability of the G4 structure. Indeed, analysis of Tm from CD experiments on G4s with a hairpin loop (H-G4s) and G4s containing unstructured long loops (U-G4s) indicated that H-G4s were more stable than U-G4s in all instances. However, we also noticed that the H-G4 structure is not as stable as the canonical G4 with short loops, which may be due to the effect of the loop length as well as the effect of sudden hairpin unwinding that disrupts the G4 structure, as evidenced by NMR melting.
It is well known that the stability of the G4 decreases when the loop length increases. This is primarily due to the increased flexibility of longer loops. Corroborating this, when we compared the CD Tm of the model hairpin-G4s (Table 3), we found that the Tm of U-G4s decreased as the loop length increased (SU-G4 > MU-G4 > LU-G4). However, when we compared the CD Tm of hairpin-G4s (Table 3), we found that there was no significant impact as the loop length increased, indicating that the formation of the hairpin structure in the loop reduced the impact of loop flexibility on the G4 structure (SH-G4∼MH-G4∼LH-G4). A similar result was also seen in the case of intermediate G4s (I-G4s), indicating that less stable stem-loops could also reduce the impact of loop length on G4 structure.
There are a number of factors that govern hairpin stability, such as salt concentration, loop length, loop composition, closing base pair (CBP), and presence of mismatches in the stem region. Higher salt concentration (60), short pyrimidine-rich loops (61–64), C:G/G:C closing base pairs (65), and absence of mismatches in the stem region are some conditions that favour hairpin formation. However, most studies have been performed on hairpins with a homogeneous loop sequence such as TTTT, GGGG, CCCC and so on (66–69). Heterogeneous loop regions have not been systematically studied, so it is only possible to speculate on the influence of these parameters on the stability of hairpins with heterogeneous base composition in their loop regions.
The importance of loop distribution in G4s has not been well studied; the role of loop position was only recently analysed with loops of 3 nt length composed of thymines (59). Although the position of the loop was found to have a significant impact on the topology of the G4, it was difficult to establish direct correlations between the loop position and topology of the G4. In the condition in which two loops are of equal length and the third loop is longer, the study observed a maximum difference of ∼4°C in the average Tm. Our study also did not find any significant difference in the Tm of H-G4s when the position of the hairpin-forming loop was changed (Figure 3A), which expands the scope for prediction of hairpin-G4s in the human genome by abolishing any bias with respect to loop position and facilitates the discovery of new gene targets for G4-mediated regulation.
Both the CD and NMR experiments showed that the sequences can form a parallel G4 structure. The imino proton peaks of the unstructured-loop G4 (U-G4) and hairpin-G4 (H-G4) were shifted relative to each other, indicating that the fine details of the G4 structure differ, but the overall structure of the G4s in both cases appears to remain parallel, as confirmed by CD spectra over the temperature range 20–80°C (Supplementary Figure S5). In addition, since the NMR spectra of H-G4s did not show significant chemical shift changes after the disappearance of the hairpin peaks, the hairpin does not seem to induce a large conformational change in the G4. It is therefore possible that binding of the hairpin to the side of the G4 contributes to G4 stability; in that case, however, the binding must not be strong enough to induce a conformational change in the G4. Alternatively, the destabilizing effect of the long loop may be smaller in the hairpin-G4 than in the unstructured-loop G4 because of the limited conformational flexibility of the hairpin. Further study with a high-resolution structure will be needed to answer this question.
Non-canonical long-looped G4s are of immense importance considering their prevalence in the genome, which was previously demonstrated in G4-seq (14). The G4-forming sequences were shown to be enriched in regulatory regions such as promoters. From current hairpin-G4 mining studies in the human genome, we found that the gene promoters including important oncogenes and tumour suppressor genes such as NOTCH1, RET and ELL contain many potential hairpin-G4 forming sequences. We applied the parameters for hairpin-G4 stability prediction to the putative hairpin-G4 sequences obtained from the genome and chose to work on CHST1, MCM4, MSI1, NRBP1, and TMCC3 genes for our study because of their higher hairpin-forming propensity and biological significance in cancer and other diseases.
While reporter assays with the candidates TMCC3, CHST1 and MCM4 showed statistically significant changes (P < 0.05) in activities compared with their unstructured loop controls (Figure 7A), we could not find a direct correlation between the thermal stability of the candidate hairpin-G4s with the magnitude of expression change in reporter assays because the cis-regulatory effect might be mediated by the binding of various transcription factors, as evidenced by previous reports (70,71). The generalization of the effect of promoter G4 formation on gene expression and the impact on downstream processes is further complicated by the association of expression levels with diseases. For example, high levels of CHST1 and NRBP1 are associated with inflammation and cancer progression (72,73). Interestingly, NRBP1 was implicated as an oncogene in prostate cancer and a tumour suppressor in lung, breast, and colorectal cancers. In our analysis, the presence of the hairpin-G4 in the promoter of CHST1 reduced promoter activity in the reporter system, whereas the hairpin-G4 in the NRBP1 promoter increased reporter expression (Figure 7A). Concomitantly, MCM4, MSI1 and TMCC3 are oncogenes implicated in adenocarcinoma, cervical cancer, glioblastomas, leukaemia, and breast cancers (45,47,74). Therefore, low expression of these genes is greatly preferred. Our studies show that the promoter hairpin-G4s from MCM4 and TMCC3 genes downregulated reporter activity, but the hairpin-G4 in the MSI1 promoter upregulated reporter activity (Figure 7A). Extrapolation of our reporter assay results implicates hairpin-G4s in both gene upregulation as well as downregulation. This indicates that in the genomic context, the activity of hairpin-G4s may be influenced by factors that can stabilize or destabilize the hairpin-G4s, such as supercoiling and transcription factor binding. 
Consistent with these results, genome-wide transcription studies of the human cytomegalovirus also revealed context-dependent roles of G4 in gene regulation (70). Individual promoter-level studies may facilitate further elucidation of the exact roles of hairpin-G4s in gene regulation.
We also evaluated the impact of hairpin-G4 formation on polymerase stalling (Figure 8) and transcription/translation (Figure 7). Although the model hairpin-G4s and candidate G4s always showed higher polymerase stalling activity than the corresponding G4s with the unstructured long loop (Figure 8), in our reporter assay the candidate G4s showed upregulation when cloned in the 5′UTR. Expression would be expected to decrease when the G4 is formed in the antisense strand because it can block polymerase progression (19). Closer examination of the sequence and secondary structure of the long loops in the candidate G4s revealed that they contained a GGG tract. It has been previously reported that the presence of multiple GGG tracts can lead to the formation of mutually exclusive G4s, possibly showing multiple conformations (39,51–53).
We hypothesized that the increase in expression in candidate G4s could be a result of mutually exclusive G4 formation, and we tested this hypothesis by performing the reporter assay with the model LH-G4 and a mutated candidate G4, MCM4-G4NGT, both without any GGG tracts in the long loop, cloned in the 5′UTR. Interestingly, both LH-G4 and MCM4-G4NGT showed a significant decrease in reporter expression, unlike the candidate G4s with the GGG tract, which showed an increase in expression. The decrease in expression is similar to that in the previous study showing that stable G4s in the 5′UTR can stall polymerase progression (19). We also tested the strand-specific activity by cloning LH-G4 in the sense strand, and found that it can also decrease transcription, possibly due to the formation of a hairpin-G4 in the 5′UTR region of the RNA, as observed in a previous study (19).
Our strategy for selection of candidates considered the biological significance of the gene as well as the propensity for hairpin formation in the loop region. When the propensity is predicted using the ΔGh value, it is evident that candidates with higher G–C bonding in the stem region will show the highest propensity for hairpin formation. Therefore, the possibility of forming mutually exclusive G4s must be evaluated as a factor influencing the impact of hairpin-G4 formation on gene expression when working on hairpin-G4s from the human genome in future studies.
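A simple sequence check for this situation can be sketched as follows. This is a heuristic illustration, not a structure predictor and not a method used in this study: the function names and example sequences are hypothetical, and the only assumption encoded is that a sequence with more than four runs of at least three guanines offers more than one choice of four tracts for quadruplex formation.

```python
import re


def g_tracts(seq, min_len=3):
    """Return (start, run) for every run of at least min_len guanines."""
    pattern = re.compile("G{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


def may_form_alternative_g4s(seq):
    """Heuristic flag: more than four G-tracts allow alternative tract sets."""
    return len(g_tracts(seq)) > 4


# Hypothetical sequences: the first long loop has no GGG tract,
# the second carries one GGG tract inside the hairpin-forming loop.
no_tract = "GGGTGGGTGGG" + "ACTCATTTTATGAGT" + "GGG"
with_tract = "GGGTGGGTGGG" + "ACCCATTTTATGGGT" + "GGG"
print(may_form_alternative_g4s(no_tract), may_form_alternative_g4s(with_tract))
```

A flag raised by this check only marks a sequence for closer inspection, since whether alternative quadruplexes actually form depends on conditions that a tract count cannot capture.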
The hairpin itself is also a critical DNA secondary structure that is shown to exist in DNA as cruciform DNA (16,75). Hairpins are shown to play critical roles in transcription regulation by modifying the chromatin and also by interacting with specific proteins that can bind to the hairpin structure (75). These proteins include key transcription factors such as p53 (76) and p73 (77), architectural proteins such as DEK (78), and tumour suppressors such as BRCA1 and PARP-1 (16). Considering the structure-specific recognition of these proteins, it is plausible that these factors can recognize the hairpin region of the hairpin-G4 and contribute to additional regulation, i.e., by preventing the binding of G4-specific transcription factors or proteins, by providing additional stability to the hairpin-G4 structure, or by directly impacting the local DNA topology. Therefore, there are multifarious ways in which both the hairpin as well as G4 regions of the hairpin-G4 can impact gene regulation.
CONCLUSION
We have demonstrated that stable hairpin-G4 structures can form even in sequences with long loops, depending on the structural properties of the loop. We also showed that hairpin formation increases the stability of the hairpin-G4 irrespective of the position of the loop or the length of the stem-flanking region. Based on these observations, we identified five G4s that contain hairpin-forming long loops. We further demonstrated that hairpin-G4s have a regulatory role in gene expression by comparing the in cellulo reporter activity of hairpin-G4s with that of their mutants containing unstructured loops. However, their effects must be considered in combination with other cis- and trans-acting factors. This study expands the current understanding of G4 structure and function and may prove valuable for future studies on the prediction and functional roles of G4s.
DATA AVAILABILITY
Derived data supporting the findings of this study are available from the corresponding author (K.K.) on request.
ACKNOWLEDGEMENTS
Author contributions: K.K.K. and S.R. designed the study; S.R. and M.R. performed the experiments; N.P., A.G. and S.R. performed the bioinformatics analysis and G4 prediction; K.K.K. and S.R. wrote the manuscript.
Contributor Information
Subramaniyam Ravichandran, Department of Precision Medicine, Graduate School of Basic Medical Science (GSBMS), Institute for Antimicrobial Resistance Research and Therapeutics, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea.
Maria Razzaq, Department of Precision Medicine, Graduate School of Basic Medical Science (GSBMS), Institute for Antimicrobial Resistance Research and Therapeutics, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea.
Nazia Parveen, Department of Precision Medicine, Graduate School of Basic Medical Science (GSBMS), Institute for Antimicrobial Resistance Research and Therapeutics, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea.
Ambarnil Ghosh, Department of Precision Medicine, Graduate School of Basic Medical Science (GSBMS), Institute for Antimicrobial Resistance Research and Therapeutics, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea.
Kyeong Kyu Kim, Department of Precision Medicine, Graduate School of Basic Medical Science (GSBMS), Institute for Antimicrobial Resistance Research and Therapeutics, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.
FUNDING
This work was supported by the Samsung Science & Technology Foundation (SSTF-BA1301-01) and the National Research Foundation of Korea funded by the Ministry of Science and ICT [2020R1A4A1018019, 2021R1A2C3011644] to K.K., and the Ministry of Education [2019R1I1A1A01060394] to S.R. Funding for open access charge: National Research Foundation of Korea.
Conflict of interest statement. None declared.
REFERENCES
- 1. Cang X.H., Sponer J., Cheatham T.E. Insight into G-DNA structural polymorphism and folding from sequence and loop connectivity through free energy analysis. J. Am. Chem. Soc. 2011; 133:14270–14279.
- 2. Karsisiotis A.I., Hessari N.M., Novellino E., Spada G.P., Randazzo A., Webba da Silva M. Topological characterization of nucleic acid G-quadruplexes by UV absorption and circular dichroism. Angew. Chem. Int. Ed. Engl. 2011; 50:10645–10648.
- 3. Tippana R., Xiao W., Myong S. G-quadruplex conformation and dynamics are determined by loop length and sequence. Nucleic Acids Res. 2014; 42:8106–8114.
- 4. Hazel P., Huppert J., Balasubramanian S., Neidle S. Loop-length-dependent folding of G-quadruplexes. J. Am. Chem. Soc. 2004; 126:16405–16415.
- 5. Webba da Silva M. Geometric formalism for DNA quadruplex folding. Chemistry. 2007; 13:9738–9745.
- 6. Guedin A., De Cian A., Gros J., Lacroix L., Mergny J.L. Sequence effects in single-base loops for quadruplexes. Biochimie. 2008; 90:686–696.
- 7. Bedrat A., Lacroix L., Mergny J.L. Re-evaluation of G-quadruplex propensity with G4Hunter. Nucleic Acids Res. 2016; 44:1746–1759.
- 8. Kikin O., D’Antonio L., Bagga P.S. QGRS mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Res. 2006; 34:W676–W682.
- 9. Huppert J.L., Balasubramanian S. Prevalence of quadruplexes in the human genome. Nucleic Acids Res. 2005; 33:2908–2916.
- 10. Agrawal P., Lin C., Mathad R.I., Carver M., Yang D. The major G-quadruplex formed in the human BCL-2 proximal promoter adopts a parallel structure with a 13-nt loop in K+ solution. J. Am. Chem. Soc. 2014; 136:1750–1753.
- 11. Jodoin R., Bauer L., Garant J.M., Mahdi Laaref A., Phaneuf F., Perreault J.P. The folding of 5′-UTR human G-quadruplexes possessing a long central loop. RNA. 2014; 20:1129–1141.
- 12. Amrane S., Adrian M., Heddi B., Serero A., Nicolas A., Mergny J.L., Phan A.T. Formation of pearl-necklace monomorphic G-quadruplexes in the human CEB25 minisatellite. J. Am. Chem. Soc. 2012; 134:5807–5816.
- 13. Lim K.W., Khong Z.J., Phan A.T. Thermal stability of DNA quadruplex-duplex hybrids. Biochemistry. 2014; 53:247–257.
- 14. Chambers V.S., Marsico G., Boutell J.M., Di Antonio M., Smith G.P., Balasubramanian S. High-throughput sequencing of DNA G-quadruplex structures in the human genome. Nat. Biotechnol. 2015; 33:877–881.
- 15. Lim K.W., Jenjaroenpun P., Low Z.J., Khong Z.J., Ng Y.S., Kuznetsov V.A., Phan A.T. Duplex stem-loop-containing quadruplex motifs in the human genome: a combined genomic and structural study. Nucleic Acids Res. 2015; 43:5630–5646.
- 16. Brazda V., Laister R.C., Jagelska E.B., Arrowsmith C. Cruciform structures are a common DNA feature important for regulating biological processes. BMC Mol. Biol. 2011; 12:33.
- 17. Wong H.M., Stegle O., Rodgers S., Huppert J.L. A toolbox for predicting g-quadruplex formation and stability. J. Nucleic Acids. 2010; 2010:564946.
- 18. Jeddi I., Saiz L. Three-dimensional modeling of single stranded DNA hairpins for aptamer-based biosensors. Sci. Rep. 2017; 7:1178.
- 19. Agarwal T., Roy S., Kumar S., Chakraborty T.K., Maiti S. In the sense of transcription regulation by G-quadruplexes: asymmetric effects in sense and antisense strands. Biochemistry. 2014; 53:3711–3718.
- 20. Tateishi-Karimata H., Isono N., Sugimoto N. New insights into transcription fidelity: thermal stability of non-canonical structures in template DNA regulates transcriptional arrest, pause, and slippage. PLoS One. 2014; 9:e90580.
- 21. Dong F., Allawi H.T., Anderson T., Neri B.P., Lyamichev V.I. Secondary structure prediction and structure-specific sequence analysis of single-stranded DNA. Nucleic Acids Res. 2001; 29:3248–3257.
- 22. Tulpan D., Andronescu M., Leger S. Free energy estimation of short DNA duplex hybridizations. BMC Bioinformatics. 2010; 11:105.
- 23. Kypr J., Kejnovska I., Renciuk D., Vorlickova M. Circular dichroism and conformational polymorphism of DNA. Nucleic Acids Res. 2009; 37:1713–1725.
- 24. Randazzo A., Spada G.P., da Silva M.W. Chaires J.B., Graves D. Quadruplex Nucleic Acids. 2013; Berlin, Heidelberg: Springer Berlin Heidelberg; 67–86.
- 25. Biswas B., Kandpal M., Vivekanandan P. A G-quadruplex motif in an envelope gene promoter regulates transcription and virion secretion in HBV genotype B. Nucleic Acids Res. 2017; 45:11268–11280.
- 26. Rachwal P.A., Fox K.R. Quadruplex melting. Methods. 2007; 43:291–301.
- 27. Owczarzy R., Moreira B.G., You Y., Behlke M.A., Walder J.A. Predicting stability of DNA duplexes in solutions containing magnesium and monovalent cations. Biochemistry. 2008; 47:5336–5353.
- 28. Bao L., Zhang X., Jin L., Tan Z.-J. Flexibility of nucleic acids: from DNA to RNA. Chin. Phys. B. 2016; 25:018703.
- 29. Zhu H., Xiao S., Liang H. Structural dynamics of human telomeric G-quadruplex loops studied by molecular dynamics simulations. PLoS One. 2013; 8:e71380.
- 30. Rachwal P.A., Brown T., Fox K.R. Sequence effects of single base loops in intramolecular quadruplex DNA. FEBS Lett. 2007; 581:1657–1660.
- 31. Yan Y.Y., Lin J., Ou T.M., Tan J.H., Li D., Gu L.Q., Huang Z.S. Selective recognition of oncogene promoter G-quadruplexes by Mg2+. Biochem. Biophys. Res. Commun. 2010; 402:614–618.
- 32. Kreig A., Calvert J., Sanoica J., Cullum E., Tipanna R., Myong S. G-quadruplex formation in double strand DNA probed by NMM and CV fluorescence. Nucleic Acids Res. 2015; 43:7961–7970.
- 33. Nicoludis J.M., Miller S.T., Jeffrey P.D., Barrett S.P., Rablen P.R., Lawton T.J., Yatsunyk L.A. Optimized end-stacking provides specificity of N-methyl mesoporphyrin IX for human telomeric G-quadruplex DNA. J. Am. Chem. Soc. 2012; 134:20446–20456.
- 34. Zipper H., Brunner H., Bernhagen J., Vitzthum F. Investigations on DNA intercalation and surface binding by SYBR Green I, its structure determination and methodological implications. Nucleic Acids Res. 2004; 32:e103.
- 35. Dragan A.I., Pavlovic R., McGivney J.B., Casas-Finet J.R., Bishop E.S., Strouse R.J., Schenerman M.A., Geddes C.D. SYBR Green I: fluorescence properties and interaction with DNA. J. Fluoresc. 2012; 22:1189–1199.
- 36. Xu H., Gao S., Yang Q., Pan D., Wang L., Fan C. Amplified fluorescent recognition of g-quadruplex folding with a cationic conjugated polymer and DNA intercalator. ACS Appl. Mater. Interfaces. 2010; 2:3211–3216.
- 37. Zhan S., Wu Y., Luo Y., Liu L., He L., Xing H., Zhou P. Label-free fluorescent sensor for lead ion detection based on lead(II)-stabilized G-quadruplex formation. Anal. Biochem. 2014; 462:19–25.
- 38. Ambrus A., Chen D., Dai J., Jones R.A., Yang D. Solution structure of the biologically relevant G-quadruplex element in the human c-MYC promoter. Implications for G-quadruplex stabilization. Biochemistry. 2005; 44:2048–2058.
- 39. Siddiqui-Jain A., Grand C.L., Bearss D.J., Hurley L.H. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc. Natl. Acad. Sci. U.S.A. 2002; 99:11593–11598.
- 40. Bryan T.M., Baumann P. G-quadruplexes: from guanine gels to chemotherapeutics. Mol. Biotechnol. 2011; 49:198–208.
- 41. Dai J., Chen D., Jones R.A., Hurley L.H., Yang D. NMR solution structure of the major G-quadruplex structure formed in the human BCL2 promoter region. Nucleic Acids Res. 2006; 34:5133–5144.
- 42. Adrian M., Heddi B., Phan A.T. NMR spectroscopy of G-quadruplexes. Methods. 2012; 57:11–24.
- 43. Aboul-ela F., Murchie A.I., Lilley D.M. NMR study of parallel-stranded tetraplex formation by the hexadeoxynucleotide d(TG4T). Nature. 1992; 360:280–282.
- 44. Rinkel L.J., Tinoco I. Jr. A proton NMR study of a DNA dumb-bell structure with hairpin loops of only two nucleotides: d(CACGTGTGTGCGTGCA). Nucleic Acids Res. 1991; 19:3695–3700.
- 45. Choy B., LaLonde A., Que J., Wu T., Zhou Z. MCM4 and MCM7, potential novel proliferation markers, significantly correlated with Ki-67, Bmi1, and cyclin E expression in esophageal adenocarcinoma, squamous cell carcinoma, and precancerous lesions. Hum. Pathol. 2016; 57:126–135.
- 46. Liao Y., Yang Z., Huang J., Chen H., Xiang J., Li S., Chen C., He X., Lin F., Yang Z. et al. Nuclear receptor binding protein 1 correlates with better prognosis and induces caspase-dependent intrinsic apoptosis through the JNK signalling pathway in colorectal cancer. Cell Death Dis. 2018; 9:436.
- 47. Kudinov A.E., Karanicolas J., Golemis E.A., Boumber Y. Musashi RNA-binding proteins as cancer drivers and novel therapeutic targets. Clin. Cancer Res. 2017; 23:2143–2153.
- 48. van Roij M.H., Mizumoto S., Yamada S., Morgan T., Tan-Sindhunata M.B., Meijers-Heijboer H., Verbeke J.I., Markie D., Sugahara K., Robertson S.P. Spondyloepiphyseal dysplasia, Omani type: further definition of the phenotype. Am. J. Med. Genet. A. 2008; 146A:2376–2384.
- 49. Sun D., Liu W.J., Guo K.X., Rusche J.J., Ebbinghaus S., Gokhale V., Hurley L.H. The proximal promoter region of the human vascular endothelial growth factor gene has a G-quadruplex structure that can be targeted by G-quadruplex-interactive agents. Mol. Cancer Ther. 2008; 7:880–889.
- 50. Nambiar M., Srivastava M., Gopalakrishnan V., Sankaran S.K., Raghavan S.C. G-quadruplex structures formed at the HOX11 breakpoint region contribute to its fragility during t(10;14) translocation in T-cell leukemia. Mol. Cell. Biol. 2013; 33:4266–4281.
- 51. Payet L., Huppert J.L. Stability and structure of long intramolecular G-quadruplexes. Biochemistry. 2012; 51:3154–3161.
- 52. Tassinari M., Zuffo M., Nadai M., Pirota V., Sevilla Montalvo A.C., Doria F., Freccero M., Richter S.N. Selective targeting of mutually exclusive DNA G-quadruplexes: HIV-1 LTR as paradigmatic model. Nucleic Acids Res. 2020; 48:4627–4642.
- 53. Dickerhoff J., Onel B., Chen L., Chen Y., Yang D. Solution structure of a MYC promoter G-Quadruplex with 1:6:1 loop length. ACS Omega. 2019; 4:2533–2539.
- 54. Balasubramanian S., Hurley L.H., Neidle S. Targeting G-quadruplexes in gene promoters: a novel anticancer strategy? Nat. Rev. Drug Discov. 2011; 10:261–275.
- 55. Wu Y., Brosh R.M. Jr. G-quadruplex nucleic acids and human disease. FEBS J. 2010; 277:3470–3488.
- 56. Huppert J.L. Structure, location and interactions of G-quadruplexes. FEBS J. 2010; 277:3452–3458.
- 57. Guedin A., Gros J., Alberti P., Mergny J.L. How long is too long? Effects of loop size on G-quadruplex stability. Nucleic Acids Res. 2010; 38:7858–7868.
- 58. Risitano A., Fox K.R. Influence of loop size on the stability of intramolecular DNA quadruplexes. Nucleic Acids Res. 2004; 32:2598–2606.
- 59. Cheng M., Cheng Y., Hao J., Jia G., Zhou J., Mergny J.L., Li C. Loop permutation affects the topology and stability of G-quadruplexes. Nucleic Acids Res. 2018; 46:9264–9275.
- 60. Tan Z.J., Chen S.J. Salt dependence of nucleic acid hairpin stability. Biophys. J. 2008; 95:738–752.
- 61. Haasnoot C.A.G., de Bruin S.H., Hilbers C.W., van der Marel G.A., van Boom J.H. Loopstructures in synthetic oligonucleotides. Hairpin stability and structure studied as a function of loop elongation. J. Biosci. 1985; 8:767–780.
- 62. Nguyen B., Wilson W.D. The effects of hairpin loops on ligand-DNA interactions. J. Phys. Chem. B. 2009; 113:14329–14335.
- 63. Senior M.M., Jones R.A., Breslauer K.J. Influence of loop residues on the relative stabilities of DNA hairpin structures. Proc. Natl. Acad. Sci. U.S.A. 1988; 85:6242–6246.
- 64. Blommers M.J., Walters J.A., Haasnoot C.A., Aelen J.M., van der Marel G.A., van Boom J.H., Hilbers C.W. Effects of base sequence on the loop folding in DNA hairpins. Biochemistry. 1989; 28:7491–7498.
- 65. Wang J., Dong P., Wu W., Pan X., Liang X. High-throughput thermal stability assessment of DNA hairpins based on high resolution melting. J. Biomol. Struct. Dyn. 2018; 36:1–13.
- 66. Xodo L.E., Manzini G., Quadrifoglio F., van der Marel G., van Boom J. DNA hairpin loops in solution. Correlation between primary structure, thermostability and reactivity with single-strand-specific nuclease from mung bean. Nucleic Acids Res. 1991; 19:1505–1511.
- 67. Kannan S., Zacharias M. Role of the closing base pair for d(GCA) hairpin stability: free energy analysis and folding simulations. Nucleic Acids Res. 2011; 39:8271–8280.
- 68. Kuznetsov S.V., Ren C.C., Woodson S.A., Ansari A. Loop dependence of the stability and dynamics of nucleic acid hairpins. Nucleic Acids Res. 2008; 36:1098–1112.
- 69. Shen Y.Q., Kuznetsov S.V., Ansari A. Loop dependence of the dynamics of DNA hairpins. J. Phys. Chem. B. 2001; 105:12202–12211.
- 70. Mishra S.K., Tawani A., Mishra A., Kumar A. G4IPDB: a database for G-quadruplex structure forming nucleic acid interacting proteins. Sci. Rep. 2016; 6:38144.
- 71. Zhang C., Liu H.H., Zheng K.W., Hao Y.H., Tan Z. DNA G-quadruplex formation in response to remote downstream transcription activity: long-range sensing and signal transducing in DNA double helix. Nucleic Acids Res. 2013; 41:7144–7152.
- 72. Liao Y., Yang Z.H., Huang J.T., Chen H., Xiang J., Li S.M., Chen C.Y., He X., Lin F., Yang Z.L. et al. Nuclear receptor binding protein 1 correlates with better prognosis and induces caspase-dependent intrinsic apoptosis through the JNK signalling pathway in colorectal cancer. Cell Death Dis. 2018; 9:436.
- 73. Li X., Tu L., Murphy P.G., Kadono T., Steeber D.A., Tedder T.F. CHST1 and CHST2 sulfotransferase expression by vascular endothelial cells regulates shear-resistant leukocyte rolling via L-selectin. J. Leukoc. Biol. 2001; 69:565–574.
- 74. Das M., Prasad S.B., Yadav S.S., Govardhan H.B., Pandey L.K., Singh S., Pradhan S., Narayan G. Over expression of minichromosome maintenance genes is clinically correlated to cervical carcinogenesis. PLoS One. 2013; 8:e69607.
- 75. Ward G.K., Shihab-el-Deen A., Zannis-Hadjopoulos M., Price G.B. DNA cruciforms and the nuclear supporting structure. Exp. Cell Res. 1991; 195:92–98.
- 76. Coufal J., Jagelska E.B., Liao J.C., Brazda V. Preferential binding of p53 tumor suppressor to p21 promoter sites that contain inverted repeats capable of forming cruciform structure. Biochem. Biophys. Res. Commun. 2013; 441:83–88.
- 77. Cechova J., Coufal J., Jagelska E.B., Fojta M., Brazda V. p73, like its p53 homolog, shows preference for inverted repeats forming cruciforms. PLoS One. 2018; 13:e0195835.
- 78. Waldmann T., Baack M., Richter N., Gruss C. Structure-specific binding of the proto-oncogene protein DEK to DNA. Nucleic Acids Res. 2003; 31:7003–7010.