ABSTRACT
In this issue, Hustmyer and colleagues (C. M. Hustmyer, C. A. Simpson, S. G. Olney, M. L. Bochman, and J. C. van Kessel, J Bacteriol 200:e00724-17, 2018) describe a new method for rapidly generating reporter libraries. This technique, rapid arbitrary PCR insertion libraries (RAIL), uses arbitrary PCR and isothermal DNA assembly to insert random fragments of promoter regions into reporter plasmids, resulting in libraries that can be screened to identify regions required for gene expression. This technique will likely be useful for a number of different genetic applications.
KEYWORDS: promoter identification, reporter fusion, transcriptional reporter, transcriptional regulation
TEXT
Transcription is a core process in bacterial physiology. It is the first step in gene expression and, because of its importance, is heavily regulated. Appropriate regulation of transcription in bacteria is critical, as their environment can change dramatically and quickly. A promoter is a region of the chromosome that determines when and how transcription of a gene is initiated (1). In bacteria, promoter regions require specific features that make the region recognizable by RNA polymerase (RNAP) and any required transcription factors (regulatory proteins). RNAP cannot bind promoters and initiate transcription on its own; it requires a sigma factor, which completes the functional holoenzyme and directs it to the promoters that the sigma factor recognizes (2). In most bacterial species, there are several sigma factors, each specific to a different set of promoters and used to express genes under different environmental conditions (3). Some general characteristics of promoter regions are conserved in most bacteria. In addition to any transcription factor binding sites, the most important sequences of the promoter are the two “boxes” that are recognized by RNAP and specific sigma factors, known as the −35 box and the −10 box. These elements are located upstream of the transcriptional start site and can dictate the strength of the promoter (1). However, the primary DNA sequence is not the only information used to direct the transcription of genes. Promoter regions also have specific physical properties, such as curvature and flexibility, that make promoters difficult to characterize from sequence alone (4–6).
Accurate prediction of promoter regions is a key issue for determining expression patterns of genes and for understanding genetic regulatory networks. While there has been an explosion of whole-genome sequencing in recent years, particularly in bacteria because of their relatively small genome sizes, promoter and transcriptional start site identification is less well developed. Even though bacteria generally have less complex promoter structures than those of higher organisms, their identification remains challenging. The majority of the recent effort has been applied to developing computational promoter prediction tools. Over the last 2 decades, numerous promoter prediction tools have been developed to identify bacterial promoter regions. Early studies used position weight matrices, relying on the conservation of the −35 and −10 boxes for the housekeeping sigma factor in Escherichia coli (7). Most recently, the use of convolutional neural networks has significantly increased the accuracy of promoter prediction programs (8). Despite the variety of computational tools employed and the increasing complexity of the algorithms used, all of these tools have significant limitations, such as producing a high number of false positives or showing limited sensitivity when applied to longer sequences or whole genomes. Therefore, they are often of questionable use in examining newly sequenced genomes for promoter or transcription start site identification.
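To make the position weight matrix idea concrete, the following is a minimal sketch of how such a matrix can be built from aligned −10-like hexamers and used to scan a sequence. The training sites, pseudocount, uniform background, and function names are illustrative assumptions; they are not taken from reference 7 or from any published tool.

```python
import math

BASES = "ACGT"

# Toy aligned -10-like hexamers (illustrative assumption, not real training data).
TRAINING_SITES = ["TATAAT", "TATGAT", "TACAAT", "TATACT", "GATAAT", "TAAAAT"]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    total = len(sites) + pseudocount * len(BASES)
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = score_window(pwm, sequence[start:start + width])
        if best is None or score > best[1]:
            best = (start, score)
    return best


pwm = build_pwm(TRAINING_SITES)
toy_sequence = "GGCGCC" + "TATAAT" + "CCGGCG"
hit_start, hit_score = best_hit(pwm, toy_sequence)
```

A scan of this kind reports the best match to the matrix in any sequence, whether or not a functional promoter is present, which is one way false positives arise when such tools are applied to whole genomes.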
The utility of computational promoter prediction tools is limited in part because some recently sequenced bacterial genomes contain promoter features that differ from those described in E. coli and other classically studied bacteria. For example, in some reduced-genome bacteria, promoters appear to have evolved to the point that the −35 box is unnecessary, as it is degraded or completely eliminated in some species (9). Prediction algorithms built around sequence characteristics such as a conserved −35 box could therefore be useless for promoter identification in these bacteria. Additionally, recent transcriptomic studies have revealed surprisingly complex transcriptional architecture in bacteria, including high numbers of small RNAs, promoters located within the gene coding sequence on the sense strand, and antisense transcripts (10). All of this complexity, as well as many other features not mentioned here, continues to hinder the development of high-quality promoter prediction tools. Importantly, using such in silico tools is only a first step in promoter identification. Even with the best software, predicted promoter sequences eventually need to be experimentally validated by genetic methods.
The accompanying report presents a new technique called rapid arbitrary PCR insertion libraries (RAIL) that should be very useful to many researchers who would like to further investigate difficult-to-identify promoter regions by using genetic tools (11). This technique could be used in addition to, or in some cases instead of, computational methods for promoter prediction. It should be applicable to any bacterial species for which genetic cloning techniques have been established and gene reporters can be used. An overall benefit of this method is that, while obtaining information on the regions of DNA required for gene expression, the user also obtains usable gene reporter constructs that are then available for other applications.
The RAIL method is a straightforward cloning technique that allows the user to create a library of DNA fragments cloned into a plasmid backbone by using a combination of arbitrary PCR (12, 13) and isothermal DNA assembly (IDA; Gibson assembly [14]). While Hustmyer et al. used this method to clone libraries of promoter fragments into vectors containing reporter genes for the purpose of creating reporter fusions that can be subsequently screened, this basic technique has the potential for numerous different uses (11). The RAIL method is performed in four steps, briefly outlined as follows. (i) Arbitrary PCR is performed by using one primer that specifically anneals to the DNA of interest (the upstream promoter region in reference 11) and a set of arbitrary primers that contain a 5′ linker that anneals to the plasmid backbone. (ii) Nested PCR is performed with the PCR products from step i as the template. In this step, a linker that anneals to the plasmid backbone is added to the primer corresponding to the other side of the amplified product, so that the resulting amplified DNA has homology to the plasmid on both ends. (iii) The plasmid backbone is linearized by either PCR or restriction digestion. (iv) The library of PCR products is inserted into the linearized vector by IDA. The resulting library of plasmids can then be transformed into the bacterial strain of interest and used for subsequent screening or other applications.
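The logic of the four steps can be sketched in silico. The toy model below is an illustration only: it fixes one end of each fragment at the specific primer, picks the other end at random to stand in for arbitrary primer annealing, adds homology arms to both ends, and joins each insert to a linearized vector by matching those arms. The sequences, arm length, random placement, and function names are all assumptions made for this sketch and do not reproduce the primers or vectors used in reference 11.

```python
import random


def simulate_rail_inserts(region, anchor_start, left_arm, right_arm,
                          n_fragments=10, min_len=20, seed=0):
    """Toy model of steps i and ii: one fixed end, one randomly placed end,
    with vector homology arms attached to both ends of each fragment."""
    rng = random.Random(seed)
    inserts = []
    for _ in range(n_fragments):
        # The variable end stands in for where an arbitrary primer anneals.
        end = rng.randint(anchor_start + min_len, len(region))
        inserts.append(left_arm + region[anchor_start:end] + right_arm)
    return inserts


def assemble(vector_left, insert, vector_right, arm_len):
    """Toy model of step iv: join an insert to the linearized vector ends,
    requiring that both homology arms match the vector exactly."""
    if insert[:arm_len] != vector_left[-arm_len:]:
        raise ValueError("left homology arm does not match the vector")
    if insert[-arm_len:] != vector_right[:arm_len]:
        raise ValueError("right homology arm does not match the vector")
    return vector_left + insert[arm_len:-arm_len] + vector_right


# Hypothetical inputs: a random 200-bp "promoter region" and made-up vector ends
# (step iii, linearization, is represented simply by having two vector ends).
_rng = random.Random(1)
REGION = "".join(_rng.choice("ACGT") for _ in range(200))
VECTOR_LEFT = "GGATCCGAATTCAAGCTTGG"
VECTOR_RIGHT = "CCTCTAGAGTCGACCTGCAG"
ARM = 20

inserts = simulate_rail_inserts(REGION, 0, VECTOR_LEFT[-ARM:], VECTOR_RIGHT[:ARM])
plasmids = [assemble(VECTOR_LEFT, ins, VECTOR_RIGHT, ARM) for ins in inserts]
```

Every simulated plasmid shares the same fixed end and differs only in where the fragment stops, which is the property that lets a screen of the library report on the boundary of the region required for expression.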
The utility of the RAIL method was elegantly demonstrated in the accompanying report, showing how the technique could be applied to delineate promoter boundaries experimentally (11). Hustmyer et al. used this technique to map the boundaries of promoters in Vibrio harveyi, a bioluminescent marine bacterium. They first examined the luxCDABE promoter region, which drives expression of the bioluminescence genes in V. harveyi. Previous studies indicated this to be a relatively large and undefined promoter, with multiple binding sites for transcription factors (15–17). Using the RAIL method, Hustmyer et al. generated libraries of the promoter region fused to gfp (green fluorescent protein), which were then sorted by flow cytometry to pool fractions expressing differing levels of fluorescence. Next-generation sequencing of these pools identified the 3′ boundaries of promoter DNA required for maximal expression. Subsequent studies showed the utility of this technique for identifying the 3′ boundaries of two additional promoters, each using different reporter genes. The RAIL method was again employed to provide promoter boundary information by using both the mCherry red fluorescent reporter and β-galactosidase (lacZ fusion). While Hustmyer et al. defined the 3′ boundaries of the promoter regions investigated in their study by using static 5′ primers, the same approach could be easily used to map the 5′ boundaries. Importantly, any constructs identified in these studies that are found to have optimal (native) expression levels of the reporter can subsequently be used for studies on genetic regulation of these pathways.
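As a rough illustration of how sequenced pools could be turned into a boundary estimate, the sketch below takes the variable-end coordinates of fragments recovered from each sorted pool and reports the shortest fragment end that is found mostly in the high-expression pool. The pool names, the coordinates, the 0.5 enrichment threshold, and the assumption that longer fragments contain more of the promoter are all invented for this example; this is not the analysis performed by Hustmyer et al.

```python
from collections import Counter


def estimate_boundary(pool_ends, high_pool="high", min_fraction=0.5):
    """Return the smallest fragment-end coordinate for which at least
    min_fraction of the reads come from the high-expression pool."""
    high = Counter(pool_ends[high_pool])
    total = Counter()
    for ends in pool_ends.values():
        total.update(ends)
    enriched = [pos for pos, n in total.items() if high[pos] / n >= min_fraction]
    return min(enriched) if enriched else None


# Made-up toy coordinates (fragment length from the fixed end), not real data.
TOY_POOLS = {
    "high": [80, 80, 90, 100, 120, 50],
    "low": [30, 40, 50, 50, 50, 60, 90],
}

boundary = estimate_boundary(TOY_POOLS)  # 80 for this toy input
```

In this toy input, fragments ending at position 50 appear mostly in the low pool and are not counted, whereas fragments ending at position 80 or beyond are enriched in the high pool.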
While Hustmyer et al. demonstrated the benefits of using RAIL to delineate promoter boundaries for specific genes, there should be numerous other potential applications for this technique. For example, this method could be used to investigate and characterize other less-well-characterized transcriptionally regulated sequences, such as small noncoding regulatory RNAs, which are responsible for considerable genetic regulation in bacteria. Other possible applications of this technique suggested by Hustmyer et al. include both cell and molecular biology studies. This cloning strategy could be used to generate functional fluorescent protein fusions useful for protein localization studies or perhaps to generate constructs for fluorescence resonance energy transfer experiments to study protein-protein interactions in living cells (18). Additional possible applications of RAIL include aspects of protein purification, either by inserting affinity tags into proteins of interest or by identifying soluble fusion constructs that are highly expressed. Many other variations on these themes could be developed, with the only limitations being that the technique requires one primer that is anchored to a specific sequence of interest and any other limitations specific to the model system, such as usable reporters.
Another unique benefit of the RAIL strategy is that the library can be used for several purposes in the same set of studies. One could easily imagine a situation where a RAIL library is generated and screened by some method, defining a promoter region of interest. Subsequently, a fusion construct showing high promoter activity could be selected from that library for use as a reporter to ask specific questions about that promoter. An obvious next step would be to design a mutagenesis screen where transcription factors required for promoter activity could be identified. Theoretically, the original RAIL library could then be rescreened to identify regions of the promoter required for activation (or repression) by that newly identified transcription factor. This approach would generate a large amount of information from a single initial cloning experiment.
In summary, Hustmyer and colleagues have provided the field with a novel technique that should be very useful for many genetic systems. The versatility and simplicity of the RAIL method will likely make it an attractive option for promoter studies and a variety of other applications.
ACKNOWLEDGMENTS
Research in my laboratory is supported by the deArce-Koch Memorial Endowment Fund in Support of Medical Research and Development Program and by startup funds from the University of Toledo.
The views expressed in this Commentary do not necessarily reflect the views of the journal or of ASM.
Footnotes
For the article discussed, see https://doi.org/10.1128/JB.00724-17.
REFERENCES
1. Browning DF, Busby SJ. 2004. The regulation of bacterial transcription initiation. Nat Rev Microbiol 2:57–65. doi: 10.1038/nrmicro787.
2. Borukhov S, Nudler E. 2003. RNA polymerase holoenzyme: structure, function and biological implications. Curr Opin Microbiol 6:93–100. doi: 10.1016/S1369-5274(03)00036-5.
3. Janga SC, Collado-Vides J. 2007. Structure and evolution of gene regulatory networks in microbial genomes. Res Microbiol 158:787–794. doi: 10.1016/j.resmic.2007.09.001.
4. Gabrielian A, Bolshoy A. 1999. Sequence complexity and DNA curvature. Comput Chem 23:263–274.
5. Kanhere A, Bansal M. 2005. Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes. Nucleic Acids Res 33:3165–3175. doi: 10.1093/nar/gki627.
6. Olivares-Zavaleta N, Jauregui R, Merino E. 2006. Genome analysis of Escherichia coli promoter sequences evidences that DNA static curvature plays a more important role in gene transcription than has previously been anticipated. Genomics 87:329–337. doi: 10.1016/j.ygeno.2005.11.023.
7. Hertz GZ, Stormo GD. 1996. Escherichia coli promoter sequences: analysis and prediction. Methods Enzymol 273:30–42. doi: 10.1016/S0076-6879(96)73004-5.
8. Umarov RK, Solovyev VV. 2017. Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks. PLoS One 12:e0171410. doi: 10.1371/journal.pone.0171410.
9. Mazin PV, Fisunov GY, Gorbachev AY, Kapitskaya KY, Altukhov IA, Semashko TA, Alexeev DG, Govorun VM. 2014. Transcriptome analysis reveals novel regulatory mechanisms in a genome-reduced bacterium. Nucleic Acids Res 42:13254–13268. doi: 10.1093/nar/gku976.
10. Sorek R, Cossart P. 2010. Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nat Rev Genet 11:9–16. doi: 10.1038/nrg2695.
11. Hustmyer CM, Simpson CA, Olney SG, Bochman ML, van Kessel JC. 2018. Promoter boundaries for the luxCDABE and betIBA-proXWV operons in Vibrio harveyi defined by the method rapid arbitrary PCR insertion libraries (RAIL). J Bacteriol 200:e00724-17. doi: 10.1128/JB.00724-17.
12. Welsh J, McClelland M. 1990. Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Res 18:7213–7218. doi: 10.1093/nar/18.24.7213.
13. Williams JG, Kubelik AR, Livak KJ, Rafalski JA, Tingey SV. 1990. DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res 18:6531–6535. doi: 10.1093/nar/18.22.6531.
14. Gibson DG, Young L, Chuang RY, Venter JC, Hutchison CA III, Smith HO. 2009. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat Methods 6:343–345. doi: 10.1038/nmeth.1318.
15. Meighen EA. 1991. Molecular biology of bacterial bioluminescence. Microbiol Rev 55:123–142.
16. Miyamoto CM, Smith EE, Swartzman E, Cao JG, Graham AF, Meighen EA. 1994. Proximal and distal sites bind LuxR independently and activate expression of the Vibrio harveyi lux operon. Mol Microbiol 14:255–262. doi: 10.1111/j.1365-2958.1994.tb01286.x.
17. van Kessel JC, Ulrich LE, Zhulin IB, Bassler BL. 2013. Analysis of activator and repressor functions reveals the requirements for transcriptional control by LuxR, the master regulator of quorum sensing in Vibrio harveyi. mBio 4:e00378-13. doi: 10.1128/mBio.00378-13.
18. Sekar RB, Periasamy A. 2003. Fluorescence resonance energy transfer (FRET) microscopy imaging of live cell protein localizations. J Cell Biol 160:629–633. doi: 10.1083/jcb.200210140.