Abstract
In sequenced genomes of prokaryotes, anomalous DNA (aDNA) can be recognized by, among other features, atypical clustering of dinucleotides. We hypothesized that atypical clustering of hexameric endonuclease recognition sites in aDNA allows the specific isolation of anomalous sequences in vitro. Clustering of endonuclease recognition sites in aDNA regions of eight published prokaryotic genome sequences was demonstrated. In silico digestion of the Neisseria meningitidis MC58 genome, using four selected endonucleases, revealed that of the 27 small fragments predicted (<5 kb), 21 were located in known genomic islands. Of the 24 calculated fragments (>300 bp and <5 kb), 22 met our criteria for aDNA, i.e. a high dinucleotide dissimilarity and/or aberrant GC content. The four enzymes also allowed the identification of aDNA fragments from the related Z2491 strain. Similarly, the sequenced genomes of three strains of Escherichia coli assessed by in silico digestion using XbaI yielded strain-specific sets of fragments of anomalous composition. In vitro applicability of the method was demonstrated by using adaptor-linked PCR, yielding the predicted fragments from the N.meningitidis MC58 genome. In conclusion, this strategy allows the selective isolation of aDNA from prokaryotic genomes by a simple restriction digest–amplification–cloning–sequencing scheme.
INTRODUCTION
Horizontal gene transfer (HGT) was already identified in 1944 by the same experiment that demonstrated the transformation of non-virulent to virulent Streptococcus pneumoniae (1). The extent of HGT as an evolutionary phenomenon had not been addressed quantitatively on a genomic scale until Lawrence and Ochman (2) calculated that ∼18% of the genome of Escherichia coli MG1655 had been horizontally transferred since its divergence from the Salmonella lineage 100 million years ago. This identified HGT as a major factor in prokaryotic genome evolution. Recently, an extensive database of horizontally transferred genes based on complete bacterial and archaeal genomes has been made available (3).
The rationale behind the computational identification of horizontally transferred DNA is the genome hypothesis, which proposes that for a given prokaryotic genus genomic DNA is relatively constant in codon usage and GC content (4,5). In contrast, horizontally acquired anomalous DNA differs in codon usage and/or GC composition from the recipient genome and can therefore be identified when substantial sequence information is available.
An additional parameter in lateral genomics is based on oligonucleotide compositional extremes: the dinucleotide relative abundance values or genome signature ρ* (6,7). The genome signature is constant among members of a genus, but deviates substantially between members of different genera (8). When used for intragenomic comparisons, ρ* makes an excellent parameter for the identification of anomalous DNA regions. Aberrant dinucleotide frequencies in aDNA are then expressed as the genome dissimilarity δ*, being the average dinucleotide relative abundance difference between the aDNA region and the whole genome (6–8). Although the genome signature is capable of identifying clusters of alien genes and acquired pathogenicity-associated islands (PAI) with an atypical nucleotide composition, highly expressed regions such as ribosomal clusters can also display aberrant dinucleotide frequencies (8,9).
To date, to our knowledge, no method exists that uses one or more of these parameters to enable the selective isolation of anomalous DNA sequences from a microbial genome in vitro. In order to develop such a technique, we investigated a special group of oligonucleotide composition extremes: the local overrepresentation in a genome of palindromic hexanucleotide sequences, specifically restriction endonuclease recognition sites, in aDNA regions. Like the genomic dinucleotide and tetranucleotide frequencies (10,11), frequencies of restriction sites vary between the genomes of different microbial species (12). Avoidance of cognate recognition sequences is probably the operating mechanism (13,14). An HGT event between different organisms may introduce clusters of certain restriction sites in the recipient's genome. Therefore, digestion of the chromosomal DNA with such a restriction endonuclease can produce a limited number of small restriction fragments, potentially comprising anomalous DNA, which can be selectively amplified by adaptor-linked PCR [ALP (15)]. The resulting amplicons can be subsequently subcloned and identified by sequence analysis.
Clustering of restriction endonuclease recognition sites in diverse aDNA regions in prokaryotic genomes was illustrated by the in silico assessment of eight genome sequences of five different species. The restriction enzymes for which the hexameric recognition sites are underrepresented were identified for each genome, and restriction fragments of <5 kb between clustered sites were analysed for nucleotide composition concerning GC percentage and genomic dissimilarity.
Next, the restriction fragments of Neisseria meningitidis MC58 between 300 bp and 5 kb were analysed in silico for both GC content and genome signature compared to the genomic values. Also, the restriction fragments obtained with the selected restriction endonucleases from N.meningitidis MC58 and Z2491 strains were compared.
Finally, in order to demonstrate the applicability of this technique in vitro, ALP was performed on chromosomal DNA from strain MC58 digested by each of the selected restriction endonucleases. The resulting amplicons were sequenced to verify the predicted sequence composition.
MATERIALS AND METHODS
Bacterial strain and growth conditions
N.meningitidis MC58 is a serogroup B:15:P1.7,16 strain isolated from a case of invasive infection in the UK (16). This wild-type MC58 strain lacks the erythromycin resistance cassette insertion in the capsule gene locus in contrast to the sequenced strain MC58 (17). Neisseriae were grown on heated blood (chocolate) agar plates or in liquid Tryptic Soy Broth (DIFCO) medium at 37°C in a humidified atmosphere of 5% CO2.
Chromosomal DNA preparation and digestion
Chromosomal DNA was isolated with the Puregene DNA isolation kit (Biozym). Restriction digests and subsequent heat inactivation were carried out according to the manufacturer's instructions (Roche).
Adaptor-linked PCR and DNA sequencing
Adaptor-linked PCR was performed as described previously (18). The adaptor and linker sets are MP19 (5′-ACG TCG ACT ATC CAT GAA CAG ATC 3′) and MP23 (5′-GAT CTG TTC ATG-3′) for the ScaI-digested genomic template, MP24 (5′-ACC GAC GTC GAC TAT CCA TGA ACA-3′) and MP20 (5′- CTA GTG TTC ATG -3′) for both the NheI- and SpeI-digested chromosomal DNA and MP24 and MP23 for the BglII-digested genomic template. PCR amplicons were purified by agarose gel extraction (Qiagen) and subcloned into a pCR2.1 vector (Invitrogen) according to the manufacturer's instructions. E.coli DH5α was transformed by standard heat shock procedure. The constructed plasmids were isolated with the Wizard Kit (Promega). Inserts were sequenced using standard M13 primers or primer walking on vector or genomic DNA according to the manufacturer's instruction (ABI). Sequences were analysed using the Staden Package (http://www.mrc-lmb.cam.ac.uk/pubseq/).
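The pairing of adaptor and linker oligonucleotides listed above can be checked with a short script. The following is a minimal sketch, assuming that the 3′ part of each 12-mer linker anneals to the 3′ end of its 24-mer adaptor; the function names `reverse_complement` and `linker_overhang` are hypothetical and not part of the published protocol. Under this assumption, the unpaired 5′ end of the linker should correspond to the cohesive end left by the enzyme: GATC for BglII, CTAG for NheI and SpeI, and none for the blunt cutter ScaI.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def linker_overhang(adaptor, linker):
    """Unpaired 5' end of the linker once it is annealed to the adaptor.

    Assumption: the 3' part of the linker pairs with the 3' end of the
    adaptor. An empty string means the duplex end is blunt.
    """
    rc = reverse_complement(linker)
    # Try the longest possible paired stretch first.
    for paired in range(len(linker), 0, -1):
        if adaptor.endswith(rc[:paired]):
            return linker[: len(linker) - paired]
    raise ValueError("linker does not anneal to the 3' end of the adaptor")


# Oligonucleotide sequences as given in the text (spaces removed).
MP19 = "ACGTCGACTATCCATGAACAGATC"
MP24 = "ACCGACGTCGACTATCCATGAACA"
MP23 = "GATCTGTTCATG"
MP20 = "CTAGTGTTCATG"

overhangs = {
    "ScaI": linker_overhang(MP19, MP23),       # blunt-end digest
    "BglII": linker_overhang(MP24, MP23),      # 5'-GATC cohesive end
    "NheI/SpeI": linker_overhang(MP24, MP20),  # 5'-CTAG cohesive end
}
print(overhangs)
```

Each computed overhang matches the end type produced by the enzyme with which the adaptor set is paired in the text.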
Software
The restriction site frequency tables from the various genomes were obtained from http://tools.neb.com/~posfai/FINISHED. The in silico digestions of the various sequenced genomes (for accession numbers see Table 1) were performed using the Restriction Digest tool from The Institute for Genomic Research (TIGR) (http://www.tigr.org). In silico retrieval and identification of the restriction fragments was performed with the Position Search/Segment Retrieval tool from TIGR (http://www.tigr.org). The different genomes of N.meningitidis were compared using the Artemis Comparison Tool (ACT) (http://www.sanger.ac.uk).
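The in silico digestions in this study were performed with the TIGR tools named above. Purely as an illustration of the principle, and not as a reimplementation of those tools, a digest can be sketched as follows. The function names `digest` and `small_fragments` and the toy sequence are hypothetical; the sketch scans one strand only, which is sufficient for palindromic sites, and it does not detect a site that spans the origin of a circular sequence.

```python
import re


def digest(sequence, site, cut_offset, circular=True):
    """Fragment lengths obtained by cutting `sequence` at every `site`.

    `cut_offset` is the cut position within the site on the top strand,
    e.g. 1 for BglII (A^GATCT).
    """
    seq = sequence.upper()
    # A lookahead pattern also reports overlapping occurrences.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    if not cuts:
        return [len(seq)]
    fragments = [b - a for a, b in zip(cuts, cuts[1:])]
    if circular:
        # The last and first cut enclose one fragment across the origin.
        fragments.append(len(seq) - cuts[-1] + cuts[0])
    else:
        fragments = [cuts[0]] + fragments + [len(seq) - cuts[-1]]
    return fragments


def small_fragments(fragments, max_len=5000):
    """Keep only fragments shorter than `max_len` (5 kb in this study)."""
    return [length for length in fragments if length < max_len]


# Toy sequence with two BglII sites, for illustration only.
toy = "CC" + "AGATCT" + "GGGG" + "AGATCT" + "TT"
print(digest(toy, "AGATCT", 1))                  # circular molecule
print(digest(toy, "AGATCT", 1, circular=False))  # linear molecule
```

Applied to a complete genome sequence, `small_fragments(digest(...))` would return the candidate fragments whose composition is then evaluated.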
Table 1. Clustering of restriction enzyme recognition sites in anomalous DNA regions in sequenced genomes of various prokaryotes.
Organism | Accession number (and reference) | Enzyme | Total number of fragments (a) | aDNA (b) |
---|---|---|---|---|
Haemophilus influenzae Rd20 | NC000907 (29) | XmaIII | 7 | 7/7 |
Haemophilus influenzae Rd20 | NC000907 (29) | ApaI | 7 | 7/7 |
E.coli O157:H7 VT2 | NC002695 (30) | AvrII | 5 | 5/5 |
E.coli O157:H7 VT2 | NC002695 (30) | XbaI | 3 | 2/3 |
E.coli K-12 | NC000913 (31) | XbaI | 2 | 1/1 |
E.coli CFT073 | NC004431 (32) | XbaI | 7 | 4/5 |
Salmonella enterica serovar Typhi CT18 | NC003198 (33) | XbaI | 7 | 6/6 |
Methanobacterium thermoautotrophicum delta H | NC000916 (34) | SpeI | 11 | 7/8 |
N.meningitidis MC58 | NC003112 (17) | BglII | 9 | 6/7 |
N.meningitidis MC58 | NC003112 (17) | ScaI | 11 | 9/10 |
N.meningitidis MC58 | NC003112 (17) | SpeI | 3 | 3/3 |
N.meningitidis MC58 | NC003112 (17) | NheI | 4 | 4/4 |
N.meningitidis Z2491 | NC003116 (35) | BglII | 4 | 1/2 |
N.meningitidis Z2491 | NC003116 (35) | ScaI | 4 | 3/4 |
N.meningitidis Z2491 | NC003116 (35) | NheI | 2 | 2/2 |
Total | | | | 67/74 (91%) |
(a) All restriction fragments up to 5 kb are considered in this column.
(b) For aDNA composition calculations concerning GC percentage and genome dissimilarity, only the fragments between 300 bp and 5 kb were considered.
Data analysis
Fragments were designated anomalous in GC composition if the GC content of the fragment was below the fifth or above the 95th percentile of the genomic GC content distribution, calculated with a window and step size identical to the fragment length (http://www.tigr.org).
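The GC criterion can be illustrated with a minimal sketch. It assumes a nearest-rank percentile, which is only one of several common conventions and may differ from the one used by the TIGR tool; the function names are hypothetical and the toy sequences below are not real data.

```python
import math


def gc_fraction(seq):
    """Fraction of G and C in a non-empty DNA string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def window_gc(genome, size):
    """GC fraction of consecutive windows (window size = step size)."""
    return [
        gc_fraction(genome[i:i + size])
        for i in range(0, len(genome) - size + 1, size)
    ]


def percentile(values, q):
    """Nearest-rank percentile for an integer q between 1 and 100 (assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def gc_is_anomalous(fragment, genome, low=5, high=95):
    """True if the fragment GC lies outside the low/high genomic percentiles."""
    background = window_gc(genome, len(fragment))
    gc = gc_fraction(fragment)
    return gc < percentile(background, low) or gc > percentile(background, high)


# Toy data: a uniform 50% GC background and an AT-only fragment.
toy_genome = "ACGT" * 1000
print(gc_is_anomalous("AT" * 50, toy_genome))    # AT-only fragment
print(gc_is_anomalous("ACGT" * 25, toy_genome))  # fragment matching the background
```

Because the window length equals the fragment length, each fragment is compared with a background distribution built at its own scale, which matters because GC content fluctuates more in short windows than in long ones.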
The δ* value for each restriction fragment was calculated as described earlier by Karlin and colleagues (7). In brief, the dinucleotide relative abundance value ρ*XY is defined as the frequency of the dinucleotide XY divided by the product of the frequencies of its component nucleotides X and Y, with all frequencies computed over the sequence combined with its reverse complement [ρ*XY = f*XY/(f*X f*Y)]. δ* is the average absolute dinucleotide relative abundance difference, given by δ*(f,g) = 1/16 ∑|ρ*XY(f) − ρ*XY(g)|, where the sum runs over all 16 dinucleotides XY, ρ*XY(f) denotes the abundance values calculated for fragment f and ρ*XY(g) the abundance values calculated for the genome g. The δ* of each fragment was compared to a distribution of δ* values which we constructed for consecutive fragments of identical size obtained from the respective genome sequence. A fragment was scored positive for anomalous DNA composition if its δ* value was above the 90th percentile of this δ* distribution. For determination of genome signature dissimilarities, only sequences between 300 bp and 5 kb were considered. Five kilobase pairs is the median size of imported DNA in Neisseria (19), and it also represents a conservative size limit for technical convenience in amplification procedures. Compositional computations were not performed for restriction fragments <300 bp; this represents a conservative lower size limit previously used in studies identifying aDNA by codon usage (3,5).
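The two definitions above translate directly into code. The following is a minimal sketch rather than the software used in this study. It assumes that frequencies are pooled by counting both the sequence and its reverse complement, that symbols other than A, C, G and T are skipped, and that sequences contain at least two such nucleotides. All function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (other symbols pass through)."""
    return seq.translate(_COMPLEMENT)[::-1]


def rho_star(seq):
    """Symmetrised dinucleotide relative abundances rho*_XY = f*_XY / (f*_X f*_Y)."""
    seq = seq.upper()
    mono = dict.fromkeys(BASES, 0)
    di = {x + y: 0 for x, y in product(BASES, repeat=2)}
    # Pool counts over the sequence and its reverse complement.
    for strand in (seq, reverse_complement(seq)):
        for base in strand:
            if base in mono:
                mono[base] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for pair, count in di.items():
        expected = (mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono)
        # A dinucleotide whose bases are absent gets 0.0 by convention here.
        rho[pair] = (count / n_di) / expected if expected else 0.0
    return rho


def delta_star(fragment, genome):
    """Average absolute difference between the 16 rho* values of fragment and genome."""
    rho_f, rho_g = rho_star(fragment), rho_star(genome)
    return sum(abs(rho_f[p] - rho_g[p]) for p in rho_f) / 16


def delta_background(genome, size):
    """delta* of consecutive, non-overlapping genome segments of the given size."""
    return [
        delta_star(genome[i:i + size], genome)
        for i in range(0, len(genome) - size + 1, size)
    ]
```

A fragment would then be scored as anomalous when `delta_star(fragment, genome)` exceeds the 90th percentile of `delta_background(genome, len(fragment))`. Note that Table 2 reports δ* multiplied by 1000.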
RESULTS
Local overrepresentation of hexameric restriction enzyme recognition sites in anomalous DNA regions of sequenced prokaryotic genomes
In order to identify clustered restriction endonuclease recognition sites in the sequenced prokaryotic genomes used in this study, we tested restriction enzymes of which the hexapalindromic recognition sites are underrepresented in the genome sequences (http://tools.neb.com/~posfai/FINISHED). The tendency of the recognition sites to cluster in aDNA regions was assessed by analysing the sequence composition of the restriction fragments obtained in silico (Table 1 and supplementary Tables 1–8). Fragments <300 bp were not considered for their genome dissimilarity and GC composition values, because δ* values of these small fragments are unreliable; this conservative minimal length is also used by Lawrence and Ochman and Garcia-Vallvé and co-workers (3,5). Nevertheless, many of these restriction fragments <300 bp are adjacent to the other fragments <5 kb in their respective genomes (as an example see N.meningitidis MC58 in Table 2 and Figure 1). The results showed that the eight analysed genome sequences did contain clusters of endonuclease restriction sites in aDNA regions (Table 1). The aggregated data showed 74 fragments (lengths between 300 bp and 5 kb) of which 67 (91%) were of anomalous composition.
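Underrepresentation of a recognition site can be expressed as the ratio of observed to expected occurrences. The sketch below uses a simple zero-order expectation based on single-nucleotide frequencies; this is an assumption made for illustration, and the frequency tables consulted in this study may rest on a different statistical model. The function name `site_obs_exp` and the toy sequences are hypothetical.

```python
import re
from math import prod


def site_obs_exp(genome, site):
    """Observed count, expected count and their ratio for one recognition site.

    Expectation assumes independent bases with the genome's own composition.
    For a palindromic site, scanning one strand is sufficient.
    """
    genome = genome.upper()
    n = len(genome)
    freqs = {base: genome.count(base) / n for base in "ACGT"}
    observed = len(re.findall(f"(?={site})", genome))
    expected = (n - len(site) + 1) * prod(freqs[base] for base in site)
    ratio = observed / expected if expected else float("nan")
    return observed, expected, ratio


# Toy sequences, for illustration only (TCTAGA is the XbaI site).
print(site_obs_exp("ACGT" * 250, "TCTAGA"))
print(site_obs_exp("TCTAGA" * 10, "TCTAGA"))
```

A ratio well below 1 indicates that a site is underrepresented genome-wide, so that the few sites that do occur, if clustered, delimit a small number of short fragments.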
Table 2. Restriction fragment numbers, lengths, δ* and GC composition of fragments obtained after in silico digestion of the genome of N.meningitidis MC58 by four selected restriction enzymes.
Enzyme | Fragment no. (a) | Length (bp) | GC% | GC% <10th percentile | δ* (× 10³) | δ* >90th percentile |
---|---|---|---|---|---|---|
BglII | 1 | 2996 (b) | 47 | − | 136 | + |
BglII | 2 | 2889 (b) | 49 | − | 117 | + |
BglII | 3 | 2889 (c) | 49 | − | 117 | + |
BglII | 4 | 2654 (c) | 46 | − | 123 | + |
BglII | 5 | 2461 | 51 | − | 136 | + |
BglII | 6 | 1194 (c) | 55 | − | 125 | − |
BglII | 7 | 477 | 34 | + | 218 | + |
BglII | 8 | 75 (b) | ND | ND | ND | ND |
BglII | 9 | 21 (d) | ND | ND | ND | ND |
NheI | 1 | 4723 (e) | 43 | + | 99 | + |
NheI | 2 | 4392 (f) | 40 | + | 111 | + |
NheI | 3 | 787 (e) | 35 | + | 123 | − |
NheI | 4 | 670 | 29 | + | 282 | + |
ScaI | 1 | 4824 (g) | 43 | + | 121 | + |
ScaI | 2 | 4452 (g) | 48 | − | 142 | + |
ScaI | 3 | 2496 | 48 | − | 132 | + |
ScaI | 4 | 2179 | 57 | − | 85 | − |
ScaI | 5 | 865 (h) | 38 | + | 171 | + |
ScaI | 6 | 699 (h) | 38 | + | 218 | + |
ScaI | 7 | 600 | 50 | − | 184 | + |
ScaI | 8 | 600 (h) | 50 | − | 184 | + |
ScaI | 9 | 600 (h) | 50 | − | 182 | + |
ScaI | 10 | 533 (h) | 51 | − | 192 | + |
ScaI | 11 | 67 (h) | ND | ND | ND | ND |
SpeI | 1 | 1672 | 36 | + | 132 | + |
SpeI | 2 | 579 (i) | 35 | + | 212 | + |
SpeI | 3 | 470 | 24 | + | 274 | + |
For fragments <300 bp the GC percentage and δ* were not determined (ND).
(a) Out of 25 fragments, 17 were located within one of the anomalous gene clusters A, B or C described by Karlin (8).
(b) Adjacent in anomalous gene cluster A.
(c) Adjacent in anomalous gene cluster C.
(d) Present in anomalous gene cluster B.
(e) Adjacent in anomalous gene cluster A.
(f) Present in anomalous gene cluster C.
(g) Present in anomalous gene cluster A.
(h) Adjacent in anomalous gene cluster C.
(i) Adjacent in anomalous gene cluster B.
Clustering of hexameric restriction enzyme recognition sites in the genome of N.meningitidis MC58 in different aDNA regions
Assessment of the occurrence of low-frequency restriction sites in the genome sequence of N.meningitidis MC58 revealed that many of the recognition sites of BglII, NheI, ScaI and SpeI clustered in the four regions known to contain large stretches of aDNA. These are also annotated as islands of horizontal transfer (IHT), thereby supporting the notion that in MC58 these recognition sites are relatively overrepresented in regions originating from horizontal transfer (3,8,17) (Figure 1, Table 2).
Of the 27 restriction fragments <5 kb, 21 were located within either of the clusters of anomalous genes described in previous studies (3,8,17) (Figure 1, Table 2). Various ScaI fragments as well as BglII fragments were adjacent to each other in these aDNA regions, confirming the local overrepresentation of recognition sites in these regions.
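Adjacency of small fragments can be read directly from fragment coordinates: two fragments are adjacent when one ends at the recognition site where the next begins. The sketch below groups the MC58 BglII fragments by this rule, using the coordinates listed in Table 3; the function name `adjacent_runs` is hypothetical.

```python
def adjacent_runs(fragments):
    """Group (start, end) fragments into runs of directly adjacent fragments.

    A fragment joins the current run if it starts where the previous one
    ends. Only runs of two or more fragments are returned.
    """
    runs = []
    for start, end in sorted(fragments):
        if runs and runs[-1][-1][1] == start:
            runs[-1].append((start, end))
        else:
            runs.append([(start, end)])
    return [run for run in runs if len(run) > 1]


# MC58 BglII fragment coordinates as listed in Table 3.
BGLII_MC58 = [
    (525422, 528418), (1863020, 1865909), (522533, 525422),
    (1860366, 1863020), (726967, 729428), (1859172, 1860366),
    (614379, 614856), (542107, 542182), (1444731, 1444752),
]

for run in adjacent_runs(BGLII_MC58):
    print(run)
```

For these coordinates the sketch reports one run of two fragments starting at 522533 and one run of three fragments starting at 1859172, i.e. two places where BglII sites follow each other at short intervals.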
Calculation of GC composition and δ* values of the 24 restriction fragments between 300 bp and 5 kb obtained by in silico digestion with BglII, NheI, ScaI or SpeI confirmed their anomalous nature; 22 out of 24 of the restriction fragments met our criteria for anomalous DNA (Table 2). Of the 24 fragments, 21 had δ* values above that of the 90th percentile of the genomic δ* value distribution and 11 had a GC percentage lower than that of the fifth percentile of the genomic GC content distribution.
Comparing the different restriction fragment patterns of the sequenced N.meningitidis strains Z2491 and MC58 in silico
The restriction fragment patterns obtained in silico from the two different N.meningitidis strains showed remarkable differences (Table 3). Various restriction fragments located in the annotated anomalous gene clusters or IHTs in N.meningitidis MC58 were not identified in N.meningitidis Z2491, consistent with the notion that these IHTs are absent in strain Z2491 (17). In addition, two anomalous restriction fragments from MC58 (MC58-ScaI-2179 and MC58-SpeI-470), which were not part of one of the previously mentioned IHTs, were located in aDNA regions only present in MC58. The MC58-ScaI-2179 fragment harboured ORF NMB1829, encoding a TonB-dependent receptor, and MC58-SpeI-470 contained a cluster of six open reading frames (ORFs). The latter showed a number of features typical for a PAI (20), such as an atypical GC content compared to the genome sequence and association with a transfer RNA (tRNA) gene (NMB1595) and an insertion sequence IS1106 (ORF NMB1601) at its boundaries. As the functions of these ORFs and their distribution in other pathogenic and non-pathogenic strains are unknown, this region does not formally qualify as PAI, although a heterologous origin is suspected.
Table 3. Sizes and coordinates of the restriction fragments of the two N.meningitidis strains tested, indicating the absence or presence of each fragment in the other strain (GI refers to the genomic islands depicted in Figure 1).
Enzyme | Size, strain MC58 | Size, strain Z2491 | Coordinates, strain MC58 | Coordinates, strain Z2491 | GI in strain MC58 | Presence in the other strain |
---|---|---|---|---|---|---|
BglII | — | 4213 | — | 1179114–1183327 | — | Dispersed in MC58, with inversions and loss of restriction fragment |
BglII | 2996 | — | 525422–528418 | — | A | Absent in Z2491 |
BglII | 2889 | — | 1863020–1865909 | — | C | Largely present in Z2491 |
BglII | 2889 | — | 522533–525422 | — | A | Largely present in Z2491 |
BglII | 2654 | — | 1860366–1863020 | — | C | Largely absent in Z2491 |
BglII | 2461 | 2449 | 726967–729428 | 875412–877861 | — | Similar sequences |
BglII | 1194 | — | 1859172–1860366 | — | C | Largely present in Z2491 |
BglII | 477 | — | 614379–614856 | — | — | Absent in Z2491 |
BglII | — | 97 | — | 578316–578413 | — | Absent in MC58 |
BglII | 75 | 75 | 542107–542182 | 688499–688574 | A | Similar sequences |
BglII | 21 | — | 1444731–1444752 | — | B | Absent in Z2491 |
NheI | 4723 | — | 505027–509750 | — | A | Largely absent in Z2491 |
NheI | 4392 | — | 1834407–1838799 | — | C | Absent in Z2491 |
NheI | 787 | 776 | 2231113–2231900 | 299657–300433 | X | Similar sequences |
NheI | 670 | 670 | 543262–543932 | 689470–690140 | A | Similar sequences |
ScaI | 4824 | — | 526521–531345 | — | A | Largely absent in Z2491 |
ScaI | 4452 | — | 511598–516050 | — | A | Absent in Z2491 |
ScaI | — | 4101 | — | 769714–773815 | — | Partial similarity to MC58-ScaI-533, MC58-ScaI-600abc, partially absent in MC58 |
ScaI | — | 3181 | — | 1928315–1931496 | — | Similar sequences, but polymorphism at the recognition site |
ScaI | 2496 | — | 1007047–1009543 | — | — | Largely present in Z2491 |
ScaI | 2179 | — | 1925725–1927904 | — | — | Largely absent in Z2491 |
ScaI | 865 | 865 | 1815665–1816530 | 1927450–1928315 | C | Similar sequences |
ScaI | 699 | 699 | 1814966–1815665 | 1926751–1927450 | C | Similar sequences |
ScaI | 600 | — | 1447767–1448367 | — | B | Similar to Z2491-ScaI-4101 |
ScaI | 600 | — | 1447167–1447767 | — | B | Similar to Z2491-ScaI-4101 |
ScaI | 600 | — | 616671–617271 | — | — | Similar to Z2491-ScaI-4101 |
ScaI | 533 | — | 1446567–1447100 | — | B | Similar to Z2491-ScaI-4101 |
ScaI | 67 | — | 1447100–1447167 | — | B | Similar to Z2491-ScaI-4101 |
SpeI | 1672 | — | 2223993–2225665 | — | X | Dispersed in MC58, with inversions and loss of restriction fragment |
SpeI | 579 | — | 1443931–1444510 | — | B | Absent in Z2491 |
SpeI | 470 | — | 1659795–1660265 | — | — | Absent in Z2491 |
The two strains were compared in silico using the Artemis Comparison Tool, with megablast hit scores of 500 and above.
Two fragments identified in silico in Z2491 were absent in the genome of MC58. Z2491-BglII-97 harboured a part of NMA0604, which encodes a hypothetical protein. The Z2491-ScaI-4101 fragment contained ORFs NMA0785 and NMA0786, encoding hypothetical proteins. Both NMA0785 and NMA0786 display an atypical GC composition and dinucleotide composition, and are described as putatively horizontally transferred by Garcia-Vallvé and colleagues (3).
Thus, the same set of four enzymes that was used to isolate anomalous sequences from the MC58 strain in silico also identified aDNA fragments from the related strain Z2491. Similarly, the sequenced genomes of three strains of E.coli assessed by in silico digestion using XbaI yielded strain-specific sets of fragments with anomalous composition (supplementary Tables 2–4). Unfortunately, due to sequence ambiguities in the E.coli EDL933 genome sequence, the δ* values of the XbaI restriction fragments from this strain could not be readily calculated, although the low GC percentage of these fragments compared to the E.coli genomic GC composition values suggests an anomalous nucleotide composition (supplementary Table 5).
Selective isolation of aDNA in vitro from N.meningitidis MC58 by adaptor-linked PCR
In order to validate that this strategy could be converted into an in vitro strategy with possible applications to unsequenced genomes, chromosomal DNA of strain MC58 was digested in vitro with BglII, NheI, ScaI or SpeI. The fragments obtained from each of the four digests were amplified by ALP. The amplicon patterns were very similar to the restriction fragment patterns expected from the in silico analysis (Figure 2), albeit with minor differences. These can be explained by the possibly inefficient amplification of large fragments (∼4 kb) in the presence of smaller fragments. The resulting amplicons were subcloned and sequenced, verifying the sequences predicted by the in silico analysis (data not shown). This demonstrated the applicability of this method in vitro.
DISCUSSION
A new parameter based on dinucleotide composition extremes has been introduced to identify genomic islands in complete genomes (6,7). The potential of this and other in silico methods to identify genomic islands is obviously limited to sequenced genomes. To our knowledge, no in vitro method exists which allows the selective isolation of aDNA from unsequenced genomes, except for subtractive hybridization strategies in which usually two related but different strains are compared (21–23). In order to develop an in vitro tool for the selective isolation of anomalous sequences from unsequenced genomes, we investigated whether clustering of restriction enzyme recognition sites could lead to the preferential isolation of aDNA from various sequenced genomes.
We demonstrated clustering of genomically underrepresented restriction enzyme recognition sites in eight sequenced genomes of five prokaryotic species in silico. We found that clustering of these recognition sites occurred predominantly in aDNA regions, including ribosomal loci, but also and more interestingly, in putative horizontally transferred loci which were described by Garcia-Vallvé and co-workers (3). However, some discrepancies between our data and their database exist, as the HGT database ignores non-coding sequences.
In the genome of N.meningitidis MC58, the clustering of the four selected endonuclease recognition sites occurred in the three known IHTs (17), a recently described aDNA region (3), and also in smaller anomalous loci. Comparative analysis of the calculated restriction fragments from N.meningitidis MC58 and Z2491 showed that similar putative horizontally acquired anomalous sequences could be isolated from both strains. Furthermore, aDNA confined to either one of these strains could be identified, suggesting that differences between strains can be identified and isolated. The strategy was validated by the in vitro amplification of the predicted restriction fragments from the genome of N.meningitidis MC58.
In this study, only a limited number of sequenced genomes was analysed to illustrate atypical clustering of endonuclease recognition sites in their respective aDNA regions. Theoretically, any prokaryotic genome may contain atypical clustering of endonuclease recognition sites. On the other hand, aDNA which is acquired via horizontal gene transfer is thought to adjust to the host's nucleotide composition over time in a process called amelioration, the same mutational process that affects the entire genome (5). This implies that only aDNA resulting from evolutionarily recent transfer events, in which the nucleotide content of the acquired DNA still differs substantially from the sequence composition of the host genome, can be adequately identified and isolated. Furthermore, in genomes of bacteria, such as Helicobacter pylori, with an extreme plasticity due to high recombination rates, regions of aDNA may be rapidly obscured over time (24). Another potential limitation of our technique is that the restriction enzyme recognition sites may be methylated by restriction-modification (RM) systems. For example, H.pylori contains many RM enzymes, rendering the genome resistant to digestion by the corresponding endonucleases (25).
Only hexapalindromic recognition sites of endonucleases have been tested; we did not examine other recognition sites (such as non-palindromic recognition sites). The cores of the restriction sites of the selected restriction enzymes consistently consist of the genomically underrepresented tetranucleotides previously described by Karlin and co-workers (10). Genomic underrepresentation of tetrapalindromes may be due to structural defects caused by these sequences or special functional roles associated with these sequences (10). Whether genomic aDNA regions in which these tetranucleotides are overrepresented predominantly originate from donor organisms in which these tetranucleotides are less associated with structural defects or special functional roles remains unclear.
The restriction enzymes for which the recognition sites were often found to cluster atypically in the genomes assessed in this study, such as SpeI, XbaI, AvrII and NheI, are also commonly used for genotyping by pulsed-field gel electrophoresis (PFGE) (26). For example, to identify an E.coli O157 outbreak cluster, Tsuji and co-workers (27) performed PFGE with the XbaI enzyme. A higher prevalence of the recognition sites of these enzymes in aDNA regions, such as horizontally transferred genes, may partly explain the high differentiating capacity of PFGE when performed with these enzymes. Insertion of horizontally acquired DNA harbouring these sites at a higher frequency than in the recipient genome will result in the introduction of novel small fragments, which are usually not visualized by PFGE. However, the large fragment in which the region of aDNA is inserted will disappear from the PFGE pattern.
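The effect of such an insertion on a fragment pattern can be illustrated with a toy calculation. All coordinates below are invented for illustration and do not correspond to any genome analysed in this study; the function names `fragment_lengths` and `insert_island` are hypothetical.

```python
def fragment_lengths(cut_positions, genome_length):
    """Fragment lengths (largest first) of a circular genome cut at the given positions."""
    cuts = sorted(cut_positions)
    # The last fragment wraps around the origin back to the first cut.
    ends = cuts[1:] + [cuts[0] + genome_length]
    return sorted((b - a for a, b in zip(cuts, ends)), reverse=True)


def insert_island(cut_positions, genome_length, at, island_length, island_cuts):
    """Insert an island at coordinate `at`; `island_cuts` are offsets within the island."""
    shifted = [c if c < at else c + island_length for c in cut_positions]
    new_cuts = shifted + [at + offset for offset in island_cuts]
    return new_cuts, genome_length + island_length


# Hypothetical 1 Mb genome with three sites of a rare-cutting enzyme.
cuts, length = [100_000, 400_000, 900_000], 1_000_000
before = fragment_lengths(cuts, length)

# Hypothetical 20 kb island carrying three clustered sites, inserted at 600 kb.
new_cuts, new_length = insert_island(cuts, length, 600_000, 20_000, [5_000, 8_000, 12_000])
after = fragment_lengths(new_cuts, new_length)

print(before)
print(after)
```

In this toy case the 500 kb fragment is replaced by fragments of 308 kb and 205 kb, which would alter the visible band pattern, while the new 4 kb and 3 kb fragments are of the size class targeted by the digest and amplification scheme described here.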
As the genome signature is conserved between closely related species (28), this technique may enable the selective isolation of aDNA from novel outbreak strains in the population of a pathogenic species of which a representative complete genome sequence is available, illustrated by the identification of the different anomalous fragments in the two Neisseria strains. It would be of interest to test different neisserial genoclusters for anomalous sequences with this novel strategy. In conclusion, the strategy presented in this study allows the selective isolation of anomalous sequences from prokaryotic genomes by a simple restriction digest–amplification–cloning–sequencing scheme. This simple technique can have major practical applications in studying horizontal gene transfer.
SUPPLEMENTARY MATERIAL
Supplementary Material is available at NAR Online.
ACKNOWLEDGEMENTS
We would like to thank Drs Mark Achtman and Christina Vandenbroucke-Grauls for critically reading the manuscript.
REFERENCES
- 1. Avery,O.T., MacLeod,C.M. and McCarty,M. (1944) Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Inductions of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J. Exp. Med., 79, 137–158.
- 2. Lawrence,J.G. and Ochman,H. (1998) Molecular archaeology of the Escherichia coli genome. Proc. Natl Acad. Sci. USA, 95, 9413–9417.
- 3. Garcia-Vallvé,S., Guzman,E., Montero,M.A. and Romeu,A. (2003) HGT-DB: a database of putative horizontally transferred genes in prokaryotic complete genomes. Nucleic Acids Res., 31, 187–189.
- 4. Grantham,R., Gautier,C., Gouy,M., Mercier,R. and Pave,A. (1980) Codon catalog usage and the genome hypothesis. Nucleic Acids Res., 8, r49–r62.
- 5. Lawrence,J.G. and Ochman,H. (1997) Amelioration of bacterial genomes: rates of change and exchange. J. Mol. Evol., 44, 383–397.
- 6. Burge,C., Campbell,A.M. and Karlin,S. (1992) Over- and under-representation of short oligonucleotides in DNA sequences. Proc. Natl Acad. Sci. USA, 89, 1358–1362.
- 7. Karlin,S., Ladunga,I. and Blaisdell,B.E. (1994) Heterogeneity of genomes: measures and values. Proc. Natl Acad. Sci. USA, 91, 12837–12841.
- 8. Karlin,S. (2001) Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes. Trends Microbiol., 9, 335–343.
- 9. Karlin,S., Campbell,A.M. and Mrazek,J. (1998) Comparative DNA analysis across diverse genomes. Annu. Rev. Genet., 32, 185–225.
- 10. Karlin,S., Mrazek,J. and Campbell,A.M. (1997) Compositional biases of bacterial genomes and evolutionary implications. J. Bacteriol., 179, 3899–3913.
- 11. Pride,D.T., Meinersmann,R.J., Wassenaar,T.M. and Blaser,M.J. (2003) Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res., 13, 145–158.
- 12. Roberts,R.J., Vincze,T., Posfai,J. and Macelis,D. (2003) REBASE: restriction enzymes and methyltransferases. Nucleic Acids Res., 31, 418–420.
- 13. Gelfand,M.S. and Koonin,E.V. (1997) Avoidance of palindromic words in bacterial and archaeal genomes: a close connection with restriction enzymes. Nucleic Acids Res., 25, 2430–2439.
- 14. Karlin,S., Burge,C. and Campbell,A.M. (1992) Statistical analyses of counts and distributions of restriction sites in DNA sequences. Nucleic Acids Res., 20, 1363–1370.
- 15. Saunders,R.D., Glover,D.M., Ashburner,M., Siden-Kiamos,I., Louis,C., Monastirioti,M., Savakis,C. and Kafatos,F. (1989) PCR amplification of DNA microdissected from a single polytene chromosome band: a comparison with conventional microcloning. Nucleic Acids Res., 17, 9027–9037.
- 16. McGuinness,B.T., Clarke,I.N., Lambden,P.R., Barlow,A.K., Poolman,J.T., Jones,D.M. and Heckels,J.E. (1991) Point mutation in meningococcal por A gene associated with increased endemic disease. Lancet, 337, 514–517.
- 17. Tettelin,H., Saunders,N.J., Heidelberg,J., Jeffries,A.C., Nelson,K.E., Eisen,J.A., Ketchum,K.A., Hood,D.W., Peden,J.F., Dodson,R.J. et al. (2000) Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science, 287, 1809–1815.
- 18. Bowler,L., Bart,A. and Van der Ende,A. (2001) Meningococcal Disease: Methods and Protocols. Humana Press, Totowa, NJ.
- 19. Linz,B., Schenker,M., Zhu,P. and Achtman,M. (2000) Frequent interspecific genetic exchange between commensal Neisseriae and Neisseria meningitidis. Mol. Microbiol., 36, 1049–1058.
- 20. Hacker,J., Blum-Oehler,G., Muhldorfer,I. and Tschape,H. (1997) Pathogenicity islands of virulent bacteria: structure, function and impact on microbial evolution. Mol. Microbiol., 23, 1089–1097.
- 21. Lisitsyn,N., Lisitsyn,N. and Wigler,M. (1993) Cloning the differences between two complex genomes. Science, 259, 946–951.
- 22. Bart,A., Dankert,J. and van der Ende,A. (2000) Representational difference analysis of Neisseria meningitidis identifies sequences that are specific for the hyper-virulent lineage III clone. FEMS Microbiol. Lett., 188, 111–114.
- 23. Malloff,C.A., Fernandez,R.C. and Lam,W.L. (2001) Bacterial comparative genomic hybridization: a method for directly identifying lateral gene transfer. J. Mol. Biol., 312, 1–5.
- 24. Suerbaum,S., Smith,J.M., Bapumia,K., Morelli,G., Smith,N.H., Kunstmann,E., Dyrek,I. and Achtman,M. (1998) Free recombination within Helicobacter pylori. Proc. Natl Acad. Sci. USA, 95, 12619–12624.
- 25. Kong,H., Lin,L.F., Porter,N., Stickel,S., Byrd,D., Posfai,J. and Roberts,R.J. (2000) Functional analysis of putative restriction-modification system genes in the Helicobacter pylori J99 genome. Nucleic Acids Res., 28, 3216–3223.
- 26. McClelland,M., Jones,R., Patel,Y. and Nelson,M. (1987) Restriction endonucleases for pulsed field mapping of bacterial genomes. Nucleic Acids Res., 15, 5985–6005.
- 27. Tsuji,H., Hamada,K., Kawanishi,S., Nakayama,A. and Nakajima,H. (2002) An outbreak of enterohemorrhagic Escherichia coli O157 caused by ingestion of contaminated beef at grilled meat-restaurant chain stores in the Kinki District in Japan: epidemiological analysis by pulsed-field gel electrophoresis. Jpn. J. Infect. Dis., 55, 91–92.
- 28. Karlin,S. and Burge,C. (1995) Dinucleotide relative abundance extremes: a genomic signature. Trends Genet., 11, 283–290.
- 29. Fleischmann,R.D., Adams,M.D., White,O., Clayton,R.A., Kirkness,E.F., Kerlavage,A.R., Bult,C.J., Tomb,J.F., Dougherty,B.A., Merrick,J.M. et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496–512.
- 30. Hayashi,T., Makino,K., Ohnishi,M., Kurokawa,K., Ishii,K., Yokoyama,K., Han,C.G., Ohtsubo,E., Nakayama,K., Murata,T. et al. (2001) Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res., 8, 11–22.
- 31. Blattner,F.R., Plunkett,G.,III, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1474.
- 32. Welch,R.A., Burland,V., Plunkett,G.,III, Redford,P., Roesch,P., Rasko,D., Buckles,E.L., Liou,S.R., Boutin,A., Hackett,J. et al. (2002) Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc. Natl Acad. Sci. USA, 99, 17020–17024.
- 33. Parkhill,J., Dougan,G., James,K.D., Thomson,N.R., Pickard,D., Wain,J., Churcher,C., Mungall,K.L., Bentley,S.D., Holden,M.T. et al. (2001) Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18. Nature, 413, 848–852.
- 34. Smith,D.R., Doucette-Stamm,L.A., Deloughery,C., Lee,H., Dubois,J., Aldredge,T., Bashirzadeh,R., Blakely,D., Cook,R., Gilbert,K. et al. (1997) Complete genome sequence of Methanobacterium thermoautotrophicum deltaH: functional analysis and comparative genomics. J. Bacteriol., 179, 7135–7155.
- 35. Parkhill,J., Achtman,M., James,K.D., Bentley,S.D., Churcher,C., Klee,S.R., Morelli,G., Basham,D., Brown,D., Chillingworth,T. et al. (2000) Complete DNA sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature, 404, 502–506.