Proceedings of the National Academy of Sciences of the United States of America. 2004 Jun 21;101(26):9780–9785. doi: 10.1073/pnas.0400745101

Microarray analysis of transposition targets in Escherichia coli: The impact of transcription

Dipankar Manna*, Adam M Breier, N Patrick Higgins*
PMCID: PMC470751  PMID: 15210965

Abstract

Transposable elements have influenced the genetic and physical composition of all modern organisms. Defining how different transposons select target sites is critical for understanding the biochemical mechanism of this type of recombination and the impact of mobile genes on chromosome structure and function. Phage Mu replicates in Gram-negative bacteria using an extremely efficient transposition reaction. Replicated copies are excised from the chromosome and packaged into virus particles. Each viral genome plus several hundred base pairs of host DNA covalently attached to the prophage right end is packed into a virion. To study Mu transposition preferences, we used DNA microarray technology to measure the abundance of >4,000 Escherichia coli genes in purified Mu phage DNA. Insertion hot- and cold-spot genes were found throughout the genome, reflecting >1,000-fold variation in utilization frequency. A moderate preference was observed for genes near the origin compared with genes near the terminus of replication. Large biases were found at hot and cold spots, which often include several consecutive genes. Efficient transcription of genes had a strong negative influence on transposition. Our results indicate that local chromosome structure is more important than DNA sequence in determining Mu target-site selection.

Keywords: bacteriophage Mu


Mobile genetic elements constitute a significant portion of many genomes. For example, they represent 1% of the Escherichia coli genome (1) and nearly one quarter of the genome of the Gram-positive bacterium Enterococcus faecalis (2). A variety of transposable elements make up about half of the genome in higher eukaryotes like humans (3). Many studies demonstrate the significant impact of mobile elements on genome organization and evolution. In bacteria, transposons facilitate formation of DNA inversions, deletions, and chromosome fusions (4). These agents also promote prokaryotic horizontal gene transfer, one clinically significant example being the dissemination of antibiotic-resistance modules among disparate groups of bacteria (for review, see ref. 5).

Transposition is tightly controlled, because unregulated amplification would have severe consequences for the bacterial host. Transposition activity is regulated by modulation of the expression and activity of the transposase and by factors influencing target-site selection (6). Experiments focusing on both aspects of this regulation are essential to understanding the dynamic interactions between transposons and the host genome.

Among bacterial transposons, phage Mu is the most efficient, on average producing 100 phage particles per cell at the time of cell lysis (for reviews, see refs. 7 and 8). Previous studies showed that Mu is quite promiscuous in its selection of integration targets. Based on hundreds of integration sites, a loose consensus for the duplicated 5-bp target site (N-Y-S-R-N) has been derived. However, anecdotal information on specific target regions suggests that Mu integration is far from random. For example, most Mu integration events in the lamB gene encoding the phage λ receptor were restricted to the early part of the gene (9). Based on unselected Mu insertions, transposition hot spots have been identified at the transcriptional control region of the bgl operon involved in β-glucoside utilization (10). Earlier studies from our laboratory showed that the transposition profile in the lac operon is affected by the binding of the LacI repressor, and that an AT-rich patch near the bgl operon exerts a strong influence on both location and orientation of Mu insertions (10, 11).

Materials and Methods

Bacterial Strains, Phages, and Media. The E. coli strain used in this study was N99 (F⁻ rpsL galK), a derivative of E. coli K12. Bacteriophage MupAp1 is a derivative of Mu carrying a temperature-sensitive repressor (cts62) and a substitution of 1.1 kb of Mu DNA with a Tn3 fragment conferring ampicillin resistance (12). Liquid cultures were grown in LB (13). Ampicillin was used at a final concentration of 50 μg/ml.

Preparation of Bacterial Genomic DNA and Mu DNA. Bacterial genomic DNA was prepared from cultures grown to mid-log phase in LB, as described (10). Mu phage was produced by thermoinduction of Mu lysogens. Mid-log cultures were shifted to 42°C with shaking until lysis occurred. Phage particles were purified as described (14). Mu phage was further purified by using a glycerol step gradient (from bottom; 1.5 ml of 40% glycerol/1.5 ml of 5% glycerol/2 ml of phage suspension) spun at 35,000 rpm at 4°C for 1 h in an SW50.1 rotor in a Beckman ultracentrifuge. The phage pellet was resuspended in SM buffer (11). Mu DNA was isolated from purified phage particles as described (11).

Fluorescence Labeling of DNA. To aid labeling by random priming, genomic DNA was sheared to ≈2-kbp fragments by using a Branson Sonifier 450 (Branson) with output control setting at 5, duty cycle setting of constant, and a sonication burst of 10 sec. Two micrograms of sheared DNA was mixed with 20 μl of 2.5× reaction buffer mix from the RadPrime DNA labeling system (Invitrogen) and boiled for 10 min before chilling on ice for 10 min. Each labeling reaction of 40 μl contained 5 μl of 10× dNTP mix (1.2 mM each of dATP, dTTP, and dGTP, and 0.6 mM dCTP), 3 μl of 1 mM Cy3-dCTP (Amersham Pharmacia Biosciences), and 2 μl (50 units total) of the Klenow fragment of DNA polymerase I (Invitrogen). Reactions were carried out at 37°C for 2 h and stopped by addition of 5 μl of EDTA (0.5 M).

Host DNA attached to the right end of Mu phage was fluorescently labeled by linear amplification by using a Mu-specific DNA primer (MuR) and Taq polymerase. The MuR oligonucleotide sequence is 5′-TTCGCATTTATCGTGAAACGCTTTC (GenBank accession no. AF083977). Reaction mixtures contained Cy5-dCTP (3 μl of 1 mM stock); 5 μl of 10× PCR buffer without Mg (Sigma); MgCl2 (2.5 mM); Mu phage DNA (2 μg); MuR primer (5 μl of 10 mM stock); 0.2 mM each of dATP, dTTP, and dGTP; 0.1 mM dCTP; and 5 units of Taq polymerase (Sigma) in a 50-μl reaction volume. Amplification was carried out in a thermocycler by using steps of 1 min at 55°C, 1 min at 94°C, and 2 min at 72°C repeated 30 times. The labeled DNA probes were purified by using a Microcon 30 filter (Amicon/Millipore).

Hybridization and Washing of Microarray. Hybridization mixtures contained 16 μl of purified labeled DNA, 3.4 μl of 20× SSC, and 0.6 μl of 10% SDS. Hybridization reactions were heated to 95°C for 2 min, cooled to room temperature, and applied to microarray slides under a lifter slip. Hybridization was carried out at 55°C for 16 h. Slides were washed once in 300 ml of buffer I (1× SSC, 0.1% SDS) at 45°C and twice in 300 ml of buffer II (0.1× SSC) at room temperature for 5 min each. Arrays were scanned with a GenePix 4000A scanner (Axon Instruments, Union City, CA) and processed with GenePix Pro 4.0 software (Axon Instruments).

Data Manipulation and Normalization. Of more than a dozen slides, five were chosen for detailed analysis in this study. The few features with saturated pixels were excluded. After local background subtraction, raw data from GenePix files were exported to an Excel (Microsoft) spreadsheet for manipulation. Cy3 readings corresponded to genomic DNA signals and Cy5 to Mu packaged host DNA signals. Approximately 200 genes had either Cy3 or Cy5 fluorescence intensity readings <100. Of these, 165 genes had significant Cy3 readings (400-13,000) but Cy5 readings <100. Features <100 in Cy5 intensity readings and <400 in the Cy3 channel were excluded. For analysis, the “ratio of medians” values were chosen. Replicate features were merged by using GEPAS (15). A Lowess fit was applied by using GProcessor 2.0a (http://bioinformatics.med.yale.edu/softwarelist.html) developed by Zhong Guan (Yale University, New Haven, CT). The parameters for the Lowess fit were f = 0.2 and n = 3. Data were normalized by setting the median value of “ratio of medians” for each individual array to one.
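The filtering and median-normalization steps can be sketched as follows. This is a minimal illustration in Python, not the authors' Excel/GEPAS/GProcessor workflow: replicate merging and the Lowess fit are omitted, and the function name, the input layout, and the example intensities are hypothetical. The sketch assumes that a feature is excluded only when it is weak in both channels (Cy5 <100 and Cy3 <400), which is one reading of the exclusion rule above.

```python
from statistics import median

def normalize_array(features, cy5_floor=100.0, cy3_floor=400.0):
    """Return {gene: Cy5/Cy3 ratio scaled so that the array median is 1}.

    features: iterable of (gene, cy3, cy5) background-subtracted intensities.
    Assumption: a feature is dropped only when both channels are weak.
    """
    ratios = {}
    for gene, cy3, cy5 in features:
        if cy3 <= 0:
            continue  # no usable genomic signal, ratio undefined
        if cy5 < cy5_floor and cy3 < cy3_floor:
            continue  # too dim in both channels for a reliable ratio
        ratios[gene] = cy5 / cy3
    mid = median(ratios.values())
    return {gene: r / mid for gene, r in ratios.items()}

# Hypothetical intensities, for illustration only.
example = [
    ("geneA", 1000.0, 2000.0),  # ratio 2.0
    ("geneB", 1000.0, 1000.0),  # ratio 1.0
    ("geneC", 1000.0, 500.0),   # ratio 0.5
    ("geneD", 300.0, 50.0),     # dropped: weak in both channels
    ("geneE", 5000.0, 80.0),    # kept: strong genomic signal, weak Mu signal
]
normalized = normalize_array(example)
```

Keeping features such as the hypothetical geneE matters for this kind of analysis, because a strong genomic signal paired with a weak Mu signal is exactly what a cold spot looks like.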

Results

Measuring in Vivo Mu Transposition Target Selection. Bacteriophage Mu transposes to multiple new sites within each cell during lytic growth. After replication, Mu genomes are put into virions by a head-full packaging mechanism. First, bacterial DNA is cut 50-150 bp outside the Mu left end, and linear DNA is reeled into a virus particle. Head space allows a viral genome plus 1-2 kb of adjoining host DNA to enter a particle followed by a second double-strand cleavage of bacterial DNA and virus maturation. One milliliter of phage lysate (2 × 10⁹ plaque-forming units/ml) contains a quantity of virus such that Mu has had an opportunity to integrate into every base pair in the E. coli genome 1,000 times over. Thus, the host DNA component of virion DNA (5% of the total) represents a statistically significant sampling of Mu target selection during lytic phage development.

To generate a genome-wide transposition profile, we used DNA microarrays to measure the relative abundance of all gene sequences attached to the Mu right end. Previous work from this laboratory showed that for different thermo-induced lysogens, transposition to specific loci was not influenced by the starting position of a Mu prophage, except for strong underutilization of a narrow zone (20 kb) adjacent to the original Mu prophage insertion site (16). To eliminate this bias, phage lysates were made by thermo-inducing cultures inoculated with a pool of 100 randomly selected Mu lysogens. We competitively hybridized host DNA attached to the Mu right end and chromosomal DNA from a nonlysogen to microarrays. DNA microarrays included PCR products corresponding to 4,290 of 4,405 annotated E. coli ORFs (NC_000913). Although this array lacks intergenic sequences, the average intergenic length is 118 bp with <50 being longer than 600 bp. With an average length of host DNA attached to Mu of 1.5 kb, this approach is expected to cover >99% of the genome.

The “ratio” value of a gene (fluorescence intensity of Cy5 probe over Cy3 probe) is a measure of abundance of that gene sequence in Mu DNA and hereafter is termed the relative Mu transposition target preference, or TTP. In all, TTP values for 4,245 genes were obtained representing 99% of the total ORFs present on the slide, and 96% of the annotated E. coli ORFs. A scatter plot of log-scaled TTP values from five independent experiments shows that the variation for most genes was low (Fig. 1). Standard error for 75% of the genes was <12% of their TTP value, and for 95% of the genes it was <25% of TTP. The average TTP value for each gene from five slides was used in subsequent analysis. Processed datasets from five arrays can be found in Table 2, which is published as supporting information on the PNAS web site.
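The per-gene reproducibility figures quoted above (standard error as a percentage of TTP) can be illustrated with a short sketch. The source does not state how the standard error was computed; the code below assumes the usual definition (sample standard deviation divided by the square root of the number of slides), and the function name and the five readings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def ttp_summary(values):
    """Return (mean TTP, standard error as a fraction of the mean) for one gene."""
    avg = mean(values)
    sem = stdev(values) / sqrt(len(values))  # sample SD / sqrt(n)
    return avg, sem / avg

# Hypothetical TTP readings for one gene on five slides.
avg, rel_sem = ttp_summary([0.9, 1.0, 1.1, 1.0, 1.0])
```

With these made-up readings the relative standard error is about 3%, which would place the gene well inside the <12% band reported for 75% of genes.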

Fig. 1.


Reproducibility of Mu TTP measurements. Relative Mu TTP for various target genes from each of five microarray slides were plotted against the average of five measurements. Each color represents data from one array slide. TTP values were determined and normalized as described in Materials and Methods.

Significance analysis of microarrays (17) was used to search for genes whose TTP value differed significantly from the median. Using the most stringent setting (q = 0.0001) in which the expected false positive rate is 1 per 10,000, TTP values >1.321 or <0.758 were significant. Of genes passing these criteria, 537 were excluded because of high variability between arrays (280 were higher and 257 were lower than the median). Of the remaining genes tested, 1,056 and 973 were significant as hot and cold genes, respectively.

Hot and Cold Target Genes Throughout the Genome and Gene Classes. Mu TTP values varied from 0.049 to 264, showing more than three orders of magnitude difference in transposition frequency between cold- and hot-spot genes. Log scaled transposition preference followed a normal distribution (Fig. 2). Most genes (3,246 of 4,245; 76%) had TTP values within 2-fold (above or below) of the median value of 1.0, whereas 95% of the genes (4,015 of 4,245) fell within 4-fold of the median value. Forty-nine genes (1%) had TTP values more than 4-fold below the median (<0.25), whereas 181 genes (4%) had TTP values >4. A cumulative distribution of TTP values indicated that ≈40% of Mu transpositions are targeted to only 4% of genes.
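The fold-based classes used above can be expressed on a log2 scale, where "within 2-fold" means |log2(TTP)| ≤ 1 and "within 4-fold" means |log2(TTP)| ≤ 2. The sketch below is illustrative only: the function name and labels are hypothetical, and the treatment of values lying exactly on a boundary is an assumption, because the source does not specify it.

```python
from math import log2

def fold_class(ttp, median_ttp=1.0):
    """Bin a TTP value by its fold distance from the median."""
    distance = abs(log2(ttp / median_ttp))
    if distance <= 1.0:
        return "within 2-fold"
    if distance <= 2.0:
        return "2- to 4-fold"
    return "hot (>4-fold above)" if ttp > median_ttp else "cold (>4-fold below)"

# The two extremes reported in the text, plus two hypothetical mid-range values.
labels = {v: fold_class(v) for v in (0.049, 0.6, 3.0, 264.0)}

# Consistency check on the gene counts quoted in the text:
# genes within 4-fold + cold genes + hot genes = all genes scored.
assert 4015 + 49 + 181 == 4245
```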

Fig. 2.


Distribution of relative Mu TTP. The number of target genes within a given range of relative Mu TTP value has been plotted as a histogram. Each interval of log2(TTP) value was 0.2. Median value of TTP was set to 1. (Inset) Distribution of genes that are transposition hot spots.

To develop a picture of genome-wide distribution of Mu transposition, TTP values for all genes were plotted against their location (beginning nucleotide number) (Fig. 3). The positions of replication origin and terminus were marked on this plot. Transposition products were found throughout the genome without a gross bias toward any part of the chromosome. Transposition hot-spot genes with TTP 4-fold above median and cold-spot genes with TTP 4-fold below median were scattered throughout the chromosome. A gradual reduction in overall transposition was found from the origin to the terminus in both replichores (Fig. 3; also see below).

Fig. 3.


Mu transposition preference for individual genes as a function of chromosomal position. Relative Mu TTP for individual genes was plotted as a function of starting nucleotide number of the gene in the E. coli genome (NC_000913). Genes that have been determined to be essential for viability are marked in red, whereas prophage and phage-related genes are yellow. Positions of replication ori and ter are marked on abscissa.

Transposition target bias could be the result of preference for certain DNA sites, differential accessibility of DNA to DNA-binding proteins in general, or a spatial separation of specific chromosomal regions from Mu transposition machinery. A significant but weak correlation (correlation coefficient 0.21, P < 0.0001) was observed between the frequency of the 5-bp Mu target consensus (NYSRN) in the gene and its TTP value (Fig. 7, which is published as supporting information on the PNAS web site). A cluster of orthologous groups-based distribution of 500 most and 500 least preferred transposition target genes exhibited no strong bias toward any functional gene category (Table 3, which is published as supporting information on the PNAS web site). The 25 most and 25 least preferred target genes are listed in Table 1.
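One way to count matches to the degenerate consensus is shown below. This is a sketch under stated assumptions rather than the authors' procedure: the source does not say how the consensus frequency was computed, so the function name, the normalization by the number of 5-bp windows, and the example sequence are all hypothetical. The IUPAC codes are N = any base, Y = C or T, S = G or C, R = A or G. Because the reverse complement of N-Y-S-R-N is again N-Y-S-R-N, scanning one strand is sufficient.

```python
import re

# IUPAC codes: N = any base, Y = C/T, S = G/C, R = A/G.
# A lookahead is used so that overlapping 5-bp windows are all counted.
NYSRN = re.compile(r"(?=[ACGT][CT][GC][AG][ACGT])")

def consensus_frequency(seq):
    """Fraction of 5-bp windows in seq that match N-Y-S-R-N."""
    seq = seq.upper()
    windows = len(seq) - 4
    if windows <= 0:
        return 0.0
    hits = sum(1 for _ in NYSRN.finditer(seq))
    return hits / windows

# Hypothetical 12-bp sequence, for illustration only.
freq = consensus_frequency("ATCGATTGGACC")
```

The consensus is loose: with two possible bases at each of three positions and any base at the other two, one window in eight would match in random sequence with equal base frequencies, which is consistent with the weak correlation reported above.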

Table 1. A list of 25 most preferred and 25 least preferred Mu target genes.

Hot spots                  Cold spots
ID     Gene  TTP           ID     Gene   TTP
b3684  yidP  264.327       b3629  rfaS   0.049
b0865  ybjP  215.658       b2035  wbbH   0.052
b2349  intC  179.247       b1505  b1505  0.077
b0864  artP  157.010       b1368  b1368  0.093
b4209  ytfE  129.489       b1363  trkG   0.096
b0207  yafB  116.234       b1139  lit    0.106
b3395  yrfD  112.752       b0557  ybcU   0.112
b4210  ytfF  93.842        b3624  rfaZ   0.114
b2827  thyA  81.334        b1721  b1721  0.122
b2593  yfiH  76.993        b0555  ybcS   0.123
b4042  dgkA  73.970        b1719  thrS   0.123
b3410  yhgG  72.047        b1458  b1458  0.129
b3399  yrfG  70.507        b2034  wbbI   0.140
b4322  uxuA  64.395        b1718  infC   0.143
b2920  ygfH  50.841        b4279  yjhB   0.156
b0863  artI  48.270        b1156  ycfA   0.160
b4161  yjeQ  34.036        b1145  b1145  0.161
b3409  feoB  31.542        b2369  evgA   0.171
b0208  yafC  29.963        b1245  oppC   0.178
b4323  uxuB  29.888        b1243  oppA   0.180
b4180  yjfH  28.120        b1142  ymfH   0.180
b2672  ygaM  27.778        b1503  b1503  0.184
b2592  clpB  27.549        b1048  mdoG   0.185
b4208  cycA  25.160        b3623  rfaK   0.187
b0365  tauA  24.537        b0544  ybcK   0.188

For hot spots, genes are in descending order of TTP value; for cold spots, genes are in ascending order of TTP value. ID, identification.

One gene category with consistently low TTP values was prophage and phage-related genes (18) (marked yellow in the plot of transposition preference; Fig. 3). Transposition data for 93 of 94 phage and phage-related genes were obtained. All had TTP values <2 except for rhsE (TTP value of 4.725) and intC (TTP value of 179). The median TTP value for all phage-related genes was 0.695, making them the only genetically distinct group showing a consistent pattern. Phage gene TTP values differed significantly from nonphage genes (P = 6.7 × 10⁻¹⁰, Wilcoxon rank sum test).

Of the 252 documented essential genes (see Profiling of E. coli Chromosome web site, www.shigen.nig.ac.jp/ecoli/pec/index.jsp), transposition preference data for 232 genes were obtained in this study (marked red in the plot of transposition preference; Fig. 3). These genes often appear as consecutive points in the plot, because they are part of an operon. Although a subtle influence is not excluded by the present analysis, most essential genes seem to have little or no effect on target bias. The median essential gene TTP value was 0.780, but this probably reflects a higher percentage of essential genes being efficiently transcribed compared with total genes (see below).

A Mu Bias from Replication Origin to Terminus. The E. coli chromosome replicates bidirectionally from the oriC to the terminus near the recombination sequence dif (19, 20). Transposon Tn7 preferentially targets the terminus of replication (21). In contrast, Mu appeared more frequently in genes near the origin. This trend is clearly discernible when the genes are arranged in order from origin to terminus and a sliding median of Mu TTP values for replichores I and II is superimposed (Fig. 4). The drop in TTP at the terminus was ≈2-fold as compared to genes near the origin.

Fig. 4.


Mu transposition frequency as a function of chromosomal replication. A moving median of TTP for individual genes with a 301-gene window was plotted as a function of target gene distance from oriC. Genes belonging to replichore I, replicated clockwise from ori to ter, are marked blue, whereas genes belonging to replichore II, replicated anticlockwise from ori to ter, are marked red. The dotted line is the linear regression fit to all of the data points.
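The moving median described in the Fig. 4 legend can be sketched as below. This is an illustrative implementation, not the authors' code: the function name and the example values are hypothetical, and truncating the window at the two ends of the list is an assumption, because the source does not say how the ends were handled.

```python
from statistics import median

def moving_median(values, window=301):
    """Centered moving median; windows are truncated at the ends of the list."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]

# Hypothetical TTP values ordered by distance from oriC, with one hot-spot outlier.
ttp_by_distance = [1.2, 1.1, 264.0, 1.0, 0.9, 0.8, 0.7]
smoothed = moving_median(ttp_by_distance, window=3)
```

A median is a sensible choice of smoother here, because a single extreme hot spot (264 in the made-up list) does not pull the local value upward the way a moving mean would.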

Mu TTP Values Are Inversely Related to Target Gene Expression. Previous studies by our lab and others suggested that some highly transcribed genes are poor targets for Mu transposition (10, 11, 22, 23). To test the generality of this correlation, microarray data of mRNA abundance (24, 25) were compared with the TTP scores for Mu transposition.

Most E. coli genes are poorly expressed. Transcript abundance values of log phase cells grown in LB, hereafter termed transcript copy number (TSC), were available for 4,282 genes (25). An induced lac operon has a TSC value of 10, and uninduced lacZ has a TSC number considerably less than 0.2. Only 14 protein-encoding genes (<1%) had TSC values of 10 or greater. A scatter plot of Mu target preference vs. transcript abundance indicates a strong inverse relationship (Fig. 5A). Among genes with TTP values <0.5 (449 genes), half had transcript numbers >1. Genes with TSC scores between 2 and 10 (499 genes) had a median TTP value of 0.796, and genes with TSC numbers between 10 and 25 (13 genes) had a median TTP score of 0.412.

Fig. 5.


Inverse correlation between Mu TTP and target gene expression level. (A) Relative Mu TTP for individual genes were plotted against steady-state TSC of that gene. TSC values are from a previously published report (27). Inset expands the view of genes with low to moderate levels of transcription. (B) A moving median of TTP and TSC with a window of 101 adjacent genes are plotted against the location of each gene. Plot of TTP is in red, whereas the plot of TSC is in blue.

Conversely, Mu preferred poorly transcribed genes. Twenty of the top 24 transposition hot spots with TTP values >25 had TSC values <1. Although Mu transposition displayed a strong preference for poorly expressed genes and avoided highly transcribed genes, no simple mathematical relationship between transcript abundance and TTP score was apparent.

Previous studies have indicated the existence of transcriptional domains in E. coli (A. Khodursky, personal communication; refs. 26 and 27). To see whether the transposition target choice coincides with transcription “domains,” we plotted the median value of TSC and TTP with a moving window of 101 adjacent genes (Fig. 5B). Both transcription and transposition values could be grouped into domains of high or low activity. The peaks of transcription correlated with the troughs of transposition and vice versa. Thus, Mu transposition responds to these transcriptional domains, preferring the transcriptionally inactive regions of the chromosome.

Hot and Cold Gene Clusters. A gene with a high TTP score was frequently adjacent to other high TTP value genes. Clusters accounted for 66% (120 of 181) of genes with TTP values >4. Altogether, 49 hot-spot clusters with two or more genes were identified. Several hot-spot clusters (see yrfG and ytfE) consisted of 5-10 consecutive genes spanning >5 kb. Hot-spot clusters had two distinct patterns.

In the first class, transposition preference for genes surrounding a hot spot dropped gradually away from the hot spot, e.g., the cluster around the hot spot ytfE (Fig. 6A). In the second class, TTP values were asymmetric, dropping sharply on one side and gradually on the other, such as the cluster around the ybjP hot spot (Fig. 6B). One explanation for an asymmetric hot cluster is a group of efficient target sites in which Mu left-right orientation is biased. An example in the bgl operon shows a cluster of unselected Mu insertions in a transcriptional control region that all have the same orientation (10).

Fig. 6.


Clustering of transposition hot and cold spots. Mu TTP values were plotted against the coordinate (nucleotide number of the midpoint) of the target gene. Transpositions around the hot spots ytfE (A) and ybjP (B) were distributed symmetrically and asymmetrically, respectively. Transpositions in the cluster of cold spots increased gradually, away from the least preferred gene cydD (C).

Most cold-spot clusters were efficiently transcribed (TSC >2.0) and highly transcribed operons were some of the largest cold regions in the chromosome. Ten cold-spot clusters had TTP values 4-fold below median. For example, the 14 rfa genes formed a cold-spot cluster with 13 TTP values ranging from 0.891 to 0.049 (a TTP value for one gene was unavailable). When genes with TTP values 2-fold lower than median were examined, many more clusters were observed. A cold-spot cluster centered on cydD is shown in Fig. 6C.

Discussion

Most data on Mu target selection have come from in vitro experiments (28-30), genetic studies of single operons (9, 31), a few select chromosomal locations (10), or plasmid targets (32). However, three major questions remained unanswered by previous approaches. First, how random is “Mu random transposition” at the genome level (33); second, where are major transposition hot- and cold-spot genes; and last, what factor(s) regulate transposition target selection?

Here we developed a genomic map of Mu target preference by determining the relative abundance of sequences in the host DNA attached to Mu. We propose that the observed Mu target preference (TTP score) reflects differential DNA accessibility to Mu transposition machinery. One alternative explanation is that the packaging of phage DNA into mature phage particles after transposition can proceed preferentially from certain sites. We discount this explanation from data obtained in a PCR-based technique called Muprinting, which was developed to analyze in vivo transposition patterns at nucleotide resolution (11). Transposition patterns from many specific regions studied with packaged phage DNA have proven indistinguishable from transposition patterns of whole genomic DNA isolated before cell lysis (refs. 10, 11, and 16; unpublished data). These results indicate that little if any preferential packaging occurs.

At our current level of resolution, ≈80% of the genome is within a factor of two of the median in transposition efficiency. This result fits with the generally accepted use of Mu and Mu-derived elements to create transposon libraries and uncover regulatory circuits (34, 35). However, hot and cold spots that differ from the median >4-fold, which represent 5% of the chromosome, were distributed throughout the genome. Four distinct patterns in our transposition map suggest specific molecular mechanisms of modulating access to Mu transposition proteins. These include: (i) a significant 2-fold transposition frequency bias that is highest at the oriC sequence and lowest at the dif site on the chromosome, (ii) hot spots >4-fold above average that represent 4% of the genome and usually occur in gene clusters, (iii) cold spots representing 1% of the genome composed of highly transcribed operons, and (iv) cold spots representing <1% of the genes that are rarely transcribed but include prophage and phage-like genes.

Mu has a 2-fold preference for genes near the replication origin. In cells growing rapidly in LB, new rounds of DNA synthesis begin at oriC before cell division takes place. This results in 2- to 4-fold more copies of origin proximal genes compared to genes at the terminus (36, 37). During a lytic infection cycle, Mu replicates by two mechanisms. First, replicative transposition makes new copies of Mu that become distributed genome wide. Second, Mu replicas become passively duplicated by chromosomal synthesis with each round of initiation that occurs at oriC. This double replication process appears to lead to a 2-fold overabundance of Mu replicas in genes near oriC relative to genes near dif.
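The 2- to 4-fold gene-dosage range cited above is consistent with the standard relation for exponentially growing cultures with overlapping replication rounds. The relation is not stated in the source and is given here only as a worked illustration. C denotes the time needed to replicate the chromosome from oriC to the terminus and τ the culture doubling time; the numerical values chosen for C and τ are illustrative assumptions, not measurements from this study.

```latex
\frac{N_{ori}}{N_{ter}} = 2^{C/\tau}, \qquad
C = 40\ \text{min},\ \tau = 40\ \text{min} \;\Rightarrow\; 2^{1} = 2, \qquad
C = 40\ \text{min},\ \tau = 20\ \text{min} \;\Rightarrow\; 2^{2} = 4 .
```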

Transposition hot spots with TTP values ranging from 4- to >200-fold above median accounted for ≈4% of chromosomal genes. That most hot spots contain a cluster of genes argues for a mechanism that is broader than a simple sequence preference for target selection. Hot-spot clusters exhibited two patterns, one symmetric and the other asymmetric (Fig. 6 A and B). Hot spots might indicate regions of DNA that have an unusual structure like a mismatch (38) or that associate with a cellular protein with affinity for Mu transposition proteins. During Mu transposition, the MuB protein mediates the target capture step. Recent in vitro studies (30, 39) with λ phage DNA demonstrate that MuB has preferred binding sites, and these are preferred transposition targets. Interestingly, MuB binds cooperatively, forming long polymers along DNA that spread from primary binding sites. A/T-rich DNA sequences have been noted to cause a hot spot and to orient Mu transposition events (10). Because MuB-binding sites are the primary determinants in establishing a preferred transposition zone in vitro (30), a detailed microarray map of MuB-binding sites in vivo (40) could be used to test this mechanism.

The most obvious factor affecting transposition was transcription. Efficient transcription had a clear negative impact on transposition, and this effect was seen statistically (Fig. 5A) as well as in a global comparison of transposition TTP score and transcription activity (Fig. 5B). Strongly expressed genes, such as ribosomal genes, were very poor targets for transposition. On the other hand, most transposition hot spots were genes that are poorly expressed under the experimental conditions used for this work. The clearest and largest cold-spot clusters were highly transcribed operons (Fig. 6C). However, we did not find a direct correlation between transcript level and TTP. This may reflect a limitation of the Mu transposition screening method. Because Mu packages 1-2 kb of adjoining host DNA, genes adjacent to the target gene are packaged into virion particles even when they contain no internal transposition sites. Large well-transcribed operons were easily identified, and genes toward the middle had lower TTP values than genes at the border. High-level transcription clearly insulates many genes from Mu transposition. The biochemical mechanism could involve displacement of Mu transposition proteins (MuA and MuB) by transcribing RNA polymerases.

Transcription was not the only mechanism for generating cold-spot clusters. Some poorly expressed genes were very poor transposition targets. For example, prophage genes were poor targets as a class, even though they were not transcribed. Many prophage genes are toxic, and the mechanism that keeps them from transcription may also shield them from transposition. The question of DNA accessibility in bacteria has been difficult to study systematically. Hildebrandt and Cozzarelli (41) used two site-specific recombination systems, the λ Int and phage P1 lox-Cre systems, to measure the accessibility of plasmid DNA in vivo. The high concentration of DNA inside a living cell was expected to greatly facilitate access to enzymes like site-specific recombinases and transposition proteins. Surprisingly, these authors found that plasmid DNA was 10-fold less accessible in vivo than expected from in vitro results (41). Why plasmid DNA is inaccessible in vivo has never been adequately explained.

One protein that sequesters DNA is the SeqA protein. SeqA binds a 4-bp sequence of GATC with a strong preference for hemimethylated sequences (42). SeqA organizes DNA as it emerges from the replication fork (43, 44), transiently protecting it from Dam methylation (45, 46). Recent results show that strains lacking SeqA are extremely sick (47), that the global transcription rate is increased, and that the normal pattern of transcription is strikingly changed (26). How SeqA inhibits transcription is not understood. Perhaps SeqA favors binding of regulatory proteins over RNA polymerase so that expression of many genes is prevented. Microarray-based approaches are well suited to address such global DNA accessibility problems in bacteria.

Acknowledgments

We thank Katherine Scheirer (Genomics Core Facility of the University of Alabama at Birmingham Heflin Center for Human Genetics) and Lisa Postow (University of California, Berkeley) for help and advice with the microarray production and slide processing. We also thank members of the Higgins laboratory for critical reading of the manuscript. This work was supported by National Science Foundation Grant MCB9122048 (to N.P.H.) and National Institutes of Health Grant GM031655 (to N. R. Cozzarelli). A.M.B. was supported by a Howard Hughes Medical Institute Predoctoral fellowship.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: TTP, transposition target preference; TSC, transcript copy number.

Supplementary Materials

Supporting Information
pnas_101_26_9780__.html (1.2KB, html)
pnas_101_26_9780__2.html (4.7KB, html)
pnas_101_26_9780__1.pdf (13.4KB, pdf)
