Abstract
We developed a method to produce, identify and analyze DNA fragments for the purpose of taxonomic classification. Genome profiling (GP) is a strategy that identifies genomic DNA fragments common to closely related species without prior knowledge of the DNA sequence. Random PCR, one of the key technologies of GP, is used to produce fragments and can be used even when there are mutations at the priming site. These fragments can then be distinguished by their mobility and melting pattern when subjected to temperature gradient gel electrophoresis (TGGE). Corresponding fragments among several species, designated as commonly conserved genetic fragments (CCGFs), likely have the same genetic origin or correspond to the same gene. The criteria for identifying CCGFs are defined and presented here. To assess this prediction, some of the fragments were sequenced and confirmed to be CCGFs. We show that genome profiles bearing evolutionarily conserved CCGFs can be used to classify organisms and trace evolutionary pathways, among other applications.
INTRODUCTION
Traditionally, the classification of organisms is based on phenotypic traits. More recently, evolutionarily conserved gene and amino acid sequences have been employed to classify organisms. Ribosomal genes, among others (1–3), have been used for this purpose. However, targeting specific genes often fails to produce useful PCR products because of mismatches in the template-primer binding structure when the gene is not highly conserved (4). Moreover, this approach requires prior knowledge of the sequence. Therefore, we propose a method of genome profiling (GP) to identify commonly conserved genetic fragments (CCGFs) across a variety of organisms.
In this strategy, non-specific DNA fragments are generated through random PCR, resolved by electrophoresis, and characterized and identified by physical property analysis. DNA fragments that are common to various species and difficult to obtain by specific PCR can often be amplified by random PCR, because mismatch- or bulge-containing primer-template structures can form at annealing temperatures lower than those used in specific PCR (5,6). The fragments generated by random PCR from a variety of organisms are resolved by temperature gradient gel electrophoresis (TGGE), which exploits not only fragment size but also mobility, melting point and melting pattern characteristics. Fragments found in a wide range of species are CCGFs. Identifying CCGFs through GP is a cost-effective way to classify species. CCGFs also play a pivotal role in the pattern similarity score (PaSS) method (discussed later).
MATERIALS AND METHODS
Template DNA
Genomic DNA of four members of the enterobacteria family was obtained from the Saitama Institute of Public Health (Saitama, Japan): Escherichia coli, Shigella sp., Salmonella sp. and Yersinia sp. Genomic DNA of Bacillus was a gift from Dr Itaya (Mitsubishi Institute of Life Sciences, Japan).
Primers
Random PCR was carried out on four species of enterobacteria and Bacillus using the primers pfM12 (dAGAACGCGCCTG) and pfM19 (dCAGGGCGCGTAC). The resulting fragments are shown in Figure 1A and B, respectively. The primer pairs dAGAACGCGCCTGTTGCTGGAAGAG/dAGAACGCGCCTGTTCTTCTGATGT and dAGAACGCGCCTGTTTGAACAGCTG/dAGAACGCGCCTGTTCTGGCGTCA were used to amplify bands 1 and 3, respectively, of Figure 1A. The primer pairs dCAGGGCGCGTACCCCACCAGCACG/dCAGGGCGCGTACGTGGACGATTAC and dCAGGGCGCGTACGACAGCATTGGT/dCAGGGCGCGTACACCAGTTCCGGG were used to amplify bands 2 and 5, respectively, appearing in the genome profiles in Figure 1B.
Figure 1.
CCGFs identified by GP. Genome profiles of five species (four enterobacteria and one Bacillus) were obtained using primers pfM12 (A) and pfM19 (B). DNA fragments identified as candidate CCGFs are indicated by the same numbers (those shown in red were sequenced). Ref is an internal reference (204 bp, initial melting temperature = 60°C) that was co-migrated with the random PCR products (14).
Designing specific primers
To obtain the sequences, we followed the previously described strategy (5), in which the sequences of the random PCR products are predicted from their sizes and confirmed from their melting profiles.
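Each specific primer listed under Primers begins with the 12-nt random primer that generated the band, followed by a band-specific 3' extension. The short Python sketch below simply checks this pattern on the sequences given above; the function and variable names are illustrative and are not part of the original protocol.

```python
# Random primers and the specific primers listed in the Primers section.
RANDOM_PRIMERS = {
    "pfM12": "AGAACGCGCCTG",
    "pfM19": "CAGGGCGCGTAC",
}

SPECIFIC_PRIMERS = {
    "pfM12": [
        "AGAACGCGCCTGTTGCTGGAAGAG",  # band 1
        "AGAACGCGCCTGTTCTTCTGATGT",  # band 1
        "AGAACGCGCCTGTTTGAACAGCTG",  # band 3
        "AGAACGCGCCTGTTCTGGCGTCA",   # band 3
    ],
    "pfM19": [
        "CAGGGCGCGTACCCCACCAGCACG",  # band 2
        "CAGGGCGCGTACGTGGACGATTAC",  # band 2
        "CAGGGCGCGTACGACAGCATTGGT",  # band 5
        "CAGGGCGCGTACACCAGTTCCGGG",  # band 5
    ],
}


def split_primer(specific, random_primer):
    """Return the 3' extension of a specific primer, or None if the
    specific primer does not start with the random primer."""
    if specific.startswith(random_primer):
        return specific[len(random_primer):]
    return None


# Map each random primer name to the extensions of its specific primers.
extensions = {
    name: [split_primer(seq, RANDOM_PRIMERS[name]) for seq in seqs]
    for name, seqs in SPECIFIC_PRIMERS.items()
}

for name, exts in extensions.items():
    for ext in exts:
        print(name, ext, len(ext) if ext is not None else "no match")
```

In this reading, the 5' part of each specific primer comes from the random primer itself and only the 3' extension has to be predicted for the fragment of interest.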
Genome profiling
This method consists of two principal techniques, random PCR and TGGE. Random PCR was carried out in 100 µl reaction volumes containing 10 ng of template DNA, 0.5 µM primer DNA, 250 µM each dNTP (N = A, G, C, T), 50 mM Tris–HCl pH 8.8, 15 mM (NH4)2SO4, 10 mM MgCl2, 0.45% Triton X-100, 200 µg/ml BSA and 2 U Taq DNA polymerase (Biotech International). Amplifications were carried out for 30 cycles of 30 s at 94°C, 2 min at 28°C and 2 min at 47°C in a PTC-100™ thermal cycler (MJ Research, MA). TGGE was performed for 75 min at 15 V/cm using 4% polyacrylamide gels with a temperature gradient of 30–70°C perpendicular to the electric field (6) on a TG-180 (Taitec, Japan).
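As a quick arithmetic check on the protocol above, the sketch below converts the stated concentrations into absolute amounts per 100 µl reaction and sums the programmed hold times. It is illustrative only: the variable names are ours, and ramping time between temperatures, which depends on the thermal cycler, is not included.

```python
# Values taken from the random PCR protocol described above.
reaction_volume_ul = 100       # µl per reaction
primer_conc_um = 0.5           # µM primer
dntp_conc_um = 250             # µM of each dNTP
cycles = 30
step_seconds = [30, 120, 120]  # 94°C, 28°C and 47°C steps

# 1 µM equals 1 pmol/µl, so µM x µl gives pmol.
primer_pmol = primer_conc_um * reaction_volume_ul
dntp_nmol_each = dntp_conc_um * reaction_volume_ul / 1000

# Programmed hold time only (no ramping between steps).
hold_minutes = cycles * sum(step_seconds) / 60

print(f"primer per reaction: {primer_pmol} pmol")
print(f"each dNTP per reaction: {dntp_nmol_each} nmol")
print(f"programmed hold time: {hold_minutes} min")
```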
Sequencing
DNA fragments 1 and 3 from Figure 1A and 2 and 5 from Figure 1B were recovered from the random PCR products by specific PCR using the designed primers, cloned into a plasmid using a TA cloning kit (Invitrogen), and sequenced on a DSQ 2000L sequencer (Shimadzu, Japan).
RESULTS AND DISCUSSION
A fundamental difference between GP and conventional methods is that genes used for classifying organisms are defined before the experiment in conventional approaches, whereas CCGFs are identified as a result of the GP experiments without prior knowledge of their sequences. In this paper we use four species of enterobacteria and Bacillus to demonstrate the effectiveness of GP to accomplish this task.
Random PCR generates a pool of unidentified DNA fragments, depending on the combination of template and primer(s), which are representative of the original genome (5). It can also be widely applied to detect polymorphisms in various organisms (7–12). Although convenient to perform, it is not sufficient for identifying the origins of bands, mainly because: (i) the amount of random PCR products changes significantly depending on PCR conditions, leading to different band patterns (13); and (ii) insertions or deletions that occurred during the course of evolution alter the mobility of DNAs, resulting in a quite different appearance. Unless bands common to various species are very close in position, it is difficult to determine whether such bands represent the same fragment. In other words, it is difficult to determine whether bands in different species have a common source unless the species are very closely related. On the other hand, GP can reveal possible common bands between the species, as shown in Figure 1 and indicated by the same numbers. Two genome profiles were obtained for each organism by changing the random PCR primers, and multiple bands were observed in each profile. Evidently, no two profiles obtained by TGGE resemble each other, demonstrating that TGGE is more powerful than conventional gel electrophoresis for visualizing random PCR products (6). This power of separation comes from the fact that TGGE utilizes both the mobility and the melting pattern of DNA (14), whereas conventional electrophoresis separates fragments based only on mobility (Fig. 2).
Figure 2.
TGGE can assign more CCGFs than conventional gel electrophoresis. CCGFs are marked at the side of the corresponding bands. Species O, A, B and C are evolutionarily linked along the time-scale, t. The crosses represent bands that were judged not to be CCGFs (TGGE patterns).
Identical band patterns observed by TGGE for two or more samples are very likely to represent the same part of the genome and are likely to have similar, if not identical, sequences. Conversely, the probability that DNA from different genes would produce similar TGGE patterns is very low. To test this, four fragments appearing in different species (candidate CCGFs) were recovered from the random PCR products using primers designed according to the predicted sequence, and were then sequenced. The band patterns in the genome profiles corresponded to those in the confirmation experiments with few exceptions (Fig. 3; Table 1).
Figure 3.
Confirmation of CCGFs by TGGE of isolated DNA obtained with the primers pfM12 (A) or pfM19 (B). Some of the candidate CCGFs (1, 2, 3 and 5 in Fig. 1) were recovered (right of the arrows) from the corresponding random PCR products (left of the arrows) and are shown with numbers in red. R is the internal reference also used in Figure 1.
Table 1. Summarized confirmation results for possible CCGF bands.
| | | pfM12: Band 1ᵃ (336 bp) | pfM12: Band 3 (480 bp) | pfM19: Band 2 (247 bp) | pfM19: Band 5 (460 bp) |
|---|---|---|---|---|---|
| Enterobacteriaceae | Escherichia | True CCGF (18)ᵇ | True CCGF (7) | True CCGF (5) | True CCGF (4) |
| | Shigella | True CCGF (18) | True CCGF (9) | True CCGF (1) | True CCGF (4) |
| | Yersinia | False assigned | True CCGF (14) | True CCGF (9) | True CCGF (14) |
| | Salmonella | Not assigned | True CCGF (9) | Not assigned | False assigned |
| Bacillus | | False assigned | False assigned | True CCGF (6) | True CCGF (9) |

ᵃ The same nomenclature is used for the bands as in Figure 1.
ᵇ The number of point mutations observed by alignment of the sequences of CCGFs.
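The outcomes in Table 1 can be tallied directly. The sketch below is only a bookkeeping aid: the dictionary transcribes the table, and it reads "Not assigned" as meaning that no candidate band was called for that species, which is our interpretation rather than a definition given in the table.

```python
from collections import Counter

# Outcomes transcribed from Table 1, in the column order
# band 1 (pfM12), band 3 (pfM12), band 2 (pfM19), band 5 (pfM19).
TABLE1 = {
    "Escherichia": ["true", "true", "true", "true"],
    "Shigella": ["true", "true", "true", "true"],
    "Yersinia": ["false", "true", "true", "true"],
    "Salmonella": ["not assigned", "true", "not assigned", "false"],
    "Bacillus": ["false", "false", "true", "true"],
}

counts = Counter(outcome for row in TABLE1.values() for outcome in row)

# Only bands that were actually called as candidates can be confirmed or refuted.
assigned = counts["true"] + counts["false"]
confirmed_fraction = counts["true"] / assigned

print(dict(counts))
print(f"confirmed: {counts['true']} of {assigned} assigned candidates "
      f"({confirmed_fraction:.1%})")
```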
The first criterion for establishing CCGFs is that the mobility and the melting temperature of corresponding DNA bands be quite close. This can be assessed quantitatively by way of species identification dots (spiddos) and PaSS (15). Spiddos are the feature points on each individual band appearing in genome profiles and are assigned based on intrinsic melting points such as the initial melting point, the minimum mobility point and so on. The second criterion is that the overall melting patterns of the bands must be similar. If both criteria are fulfilled, the bands in different genome profiles can be tentatively designated as CCGFs. Statistical analysis will provide a more rigorous confirmation of CCGFs. For example, PaSS values obtained for pairs of tentatively identified CCGF bands, PaSSB in Figure 4, may be suitable for this purpose. In this case, the same reasoning should apply as in the analysis of PaSS scores used for species identification (Fig. 5).
Figure 4.
Calculation of PaSS for determination of CCGFs. PaSSB (a derivative of PaSS) for a single band can be calculated by the given equation using three spiddos marked as filled circles in the figure. A value of unity indicates a perfect match in the melting patterns of the two DNA fragments, which can be designated to be CCGFs.
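The equation referred to in Figure 4 is not reproduced in the text. As a hedged illustration only, the Python sketch below assumes that the pattern similarity score takes the form PaSS = 1 − (1/n) Σ |P_i − P_i'| / (|P_i| + |P_i'|), where P_i and P_i' are the position vectors (normalized mobility, normalized temperature) of corresponding spiddos; this form should be checked against ref. 15. The function name and the example coordinates are hypothetical.

```python
import math


def pass_score(spiddos_a, spiddos_b):
    """Pattern similarity score for two equally long lists of
    (mobility, temperature) points.

    Assumed form (to be checked against ref. 15):
        1 - mean(|P - P'| / (|P| + |P'|))
    """
    if len(spiddos_a) != len(spiddos_b) or not spiddos_a:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for (xa, ya), (xb, yb) in zip(spiddos_a, spiddos_b):
        diff = math.hypot(xa - xb, ya - yb)
        norm = math.hypot(xa, ya) + math.hypot(xb, yb)
        if norm > 0:  # two points at the origin are identical: add nothing
            total += diff / norm
    return 1.0 - total / len(spiddos_a)


# Hypothetical normalized spiddos for one band in two genome profiles
# (three spiddos per band, as in Figure 4).
band_x = [(0.80, 0.95), (0.45, 1.05), (0.30, 1.20)]
band_y = [(0.78, 0.96), (0.47, 1.04), (0.30, 1.22)]

print(pass_score(band_x, band_x))  # identical bands give exactly 1.0
print(round(pass_score(band_x, band_y), 3))
```

Under this assumed form, a score of unity is obtained only when every pair of corresponding spiddos coincides, which is consistent with the statement in the caption.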
Figure 5.
Calculation of PaSS values for random pairs. An average of 0.667 was obtained for pairs of randomly assigned spiddos with a similar value from the cumulative frequency curve. This graph was made with values from eight spiddos assigned randomly and serves as a reference for PaSSB.
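A reference distribution of this kind can be approximated by simulation. The sketch below is a rough illustration under stated assumptions, not the procedure used for Figure 5: it assumes the PaSS form 1 − (1/n) Σ |P_i − P_i'| / (|P_i| + |P_i'|), draws eight spiddos per profile uniformly from the unit square, and uses an arbitrary number of trials and random seed. The value it prints depends on these choices and need not reproduce the reported average.

```python
import math
import random


def pass_score(spiddos_a, spiddos_b):
    """Assumed PaSS form: 1 - mean(|P - P'| / (|P| + |P'|))."""
    total = 0.0
    for (xa, ya), (xb, yb) in zip(spiddos_a, spiddos_b):
        norm = math.hypot(xa, ya) + math.hypot(xb, yb)
        if norm > 0:
            total += math.hypot(xa - xb, ya - yb) / norm
    return 1.0 - total / len(spiddos_a)


def random_profile(rng, n_spiddos=8):
    """Hypothetical profile: spiddos drawn uniformly from the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n_spiddos)]


rng = random.Random(0)  # fixed seed so the run is repeatable
scores = [
    pass_score(random_profile(rng), random_profile(rng))
    for _ in range(2000)
]
mean_score = sum(scores) / len(scores)

print(f"mean PaSS over random pairs: {mean_score:.3f}")
```

A PaSSB value for a candidate CCGF pair can then be compared with such a baseline to judge whether the observed similarity exceeds what random spiddos would give.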
This approach to finding CCGFs is more effective for genomes that are not too distantly related. Theoretically, we expect that organisms with the same genome size and descended from a common ancestor, for example, species within the same family or genus, would share more CCGFs. The recovered fragments were sequenced and found to be genuine CCGFs (Table 1). Point mutation analysis reveals different degrees of substitution among the species studied here. Thus, bands showing similar or identical TGGE patterns in genome profiles are more likely to have similar sequences and to correspond to the same part of a gene. That is, these fragments can be claimed to have a common origin. On the basis of the results presented above, we designate these bands as CCGFs.
CCGFs can be classified into two categories: local CCGFs and global CCGFs. Local CCGFs are present in only a limited number of species, whereas global CCGFs may be present in almost all species. CCGFs common to a wide range of species probably exist. Once such a wide-range (global) CCGF is found, it will be useful for classifying species.
Classification of species based on CCGFs can be approached in two ways: direct sequencing of the CCGFs, or calculating PaSS values (15). By obtaining sequences of CCGFs the distances between species can be measured either in terms of Hamming distance [or genetic distance (16–19)] or genome distance (a derivative from PaSS) (15). The presence of CCGFs indicates that the species under investigation are closely located in genome sequence space. The PaSS values clearly indicate a similarity among the members of enterobacteria and clearly differentiate between Bacillus and the members of enterobacteria (Table 2). This result is in agreement with the results obtained by sequencing and the other conventional methods (Fig. 6).
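For aligned CCGF sequences of equal length, the Hamming distance is simply the number of positions at which the two sequences differ, which corresponds to the point-mutation counts reported in Table 1. The sketch below is a minimal illustration; the two short sequences are hypothetical and are not taken from the sequenced fragments.

```python
def hamming_distance(seq_a, seq_b):
    """Number of mismatched positions between two aligned sequences
    of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Hypothetical aligned stretches of one CCGF from two species.
frag_species_1 = "ATGGCGTTAGCCTGA"
frag_species_2 = "ATGGCATTAGCTTGA"

print(hamming_distance(frag_species_1, frag_species_2))
```

Insertions and deletions are not handled here; they would require a proper sequence alignment before counting substitutions.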
Table 2. PaSS values generated between the different species.
| | Escherichia | Salmonella | Shigella | Yersinia | Bacillus |
|---|---|---|---|---|---|
| Escherichia | | 0.894 | 0.91 | 0.811 | 0.716 |
| Salmonella | | | 0.917 | 0.824 | 0.724 |
| Shigella | | | | 0.847 | 0.748 |
| Yersinia | | | | | 0.793 |
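The separation between the enterobacteria and Bacillus can be checked numerically from the values in Table 2. The sketch below transcribes the table and, as an assumption on our part, takes the genome distance to be 1 − PaSS; the exact definition should be taken from ref. 15.

```python
# PaSS values transcribed from Table 2 (upper triangle).
PASS = {
    ("Escherichia", "Salmonella"): 0.894,
    ("Escherichia", "Shigella"): 0.91,
    ("Escherichia", "Yersinia"): 0.811,
    ("Escherichia", "Bacillus"): 0.716,
    ("Salmonella", "Shigella"): 0.917,
    ("Salmonella", "Yersinia"): 0.824,
    ("Salmonella", "Bacillus"): 0.724,
    ("Shigella", "Yersinia"): 0.847,
    ("Shigella", "Bacillus"): 0.748,
    ("Yersinia", "Bacillus"): 0.793,
}

# Assumed definition: genome distance = 1 - PaSS.
distance = {pair: round(1 - score, 3) for pair, score in PASS.items()}

within_entero = [s for pair, s in PASS.items() if "Bacillus" not in pair]
to_bacillus = [s for pair, s in PASS.items() if "Bacillus" in pair]
closest_pair = max(PASS, key=PASS.get)

print("lowest PaSS among enterobacteria:", min(within_entero))
print("highest PaSS involving Bacillus:", max(to_bacillus))
print("most similar pair:", closest_pair, distance[closest_pair])
```

Every pair of enterobacteria scores higher than any pair involving Bacillus, which is the pattern described in the text.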
Figure 6.
Comparison of the sequencing and genome distance approaches. Phylogenetic trees of Enterobacteriaceae were drawn based on point base substitutions (A) and genome distance (B). Figures beside the branches are the numbers of base substitutions or the genome distances, respectively. From literature studies, it is known that Yersinia is distant from Salmonella and Escherichia (21), Escherichia and Salmonella are closely related (22,23) and Escherichia and Shigella are closely related (24).
The greatest advantage of the GP approach over conventional ones, which depend on specific PCR, is its robustness in dealing with mutations. Specific PCR often fails to produce the target product in the presence of mutations at the primer-binding sites because primer binding is highly stringent at high annealing temperatures. On the other hand, random PCR uses lower annealing temperatures, tolerates mutations in the primer-binding step, and can therefore generate many more DNA fragments. In addition, using TGGE, we can still recognize CCGFs in spite of insertion and deletion mutations within the relevant regions of DNA. Identifying global CCGFs will facilitate evolutionary studies, as these CCGFs can be used as universal probes much like 16S rDNA/RNA and gyrases (20). Therefore, GP and CCGFs are powerful new tools that can be utilized in classification and related studies, particularly when previous attempts to identify genes for taxonomic studies were not successful for the organisms in question.
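To give a rough sense of how permissive the 28°C annealing step is, the sketch below estimates the melting temperature of the two random primers with the Wallace rule, Tm ≈ 2°C × (A+T) + 4°C × (G+C). This rule of thumb for short, perfectly matched oligonucleotides is not part of the original study and ignores buffer composition, so the numbers are indicative only; the function name is ours.

```python
def wallace_tm(primer):
    """Rough melting temperature (°C) of a short oligonucleotide
    by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


ANNEALING_TEMP_C = 28  # annealing step of the random PCR protocol

for name, seq in [("pfM12", "AGAACGCGCCTG"), ("pfM19", "CAGGGCGCGTAC")]:
    tm = wallace_tm(seq)
    print(f"{name}: estimated Tm {tm}°C, "
          f"{tm - ANNEALING_TEMP_C}°C above the annealing temperature")
```

Annealing well below the estimated Tm of a perfect duplex is what would be expected to leave room for the mismatch- or bulge-containing primer-template structures discussed above.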
ACKNOWLEDGEMENTS
We thank M. Itaya for cooperation and T. Watanabe for technical assistance. This study was supported in part by a Grant-in-Aid (09272203) from the Ministry of Education, Science, Sports and Culture of Japan. M.N. was supported by Japan Society for the Promotion of Science (13001147).
REFERENCES
- 1. Woese,C.R., Kandler,O. and Wheelis,M.L. (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria and Eukarya. Proc. Natl Acad. Sci. USA, 87, 3140–3145.
- 2. Yamamoto,S. and Harayama,S. (1998) Phylogenetic relationships of Pseudomonas putida strains deduced from the nucleotide sequences of gyrB, rpoD and 16S rRNA genes. Int. J. Syst. Bacteriol., 48, 813–819.
- 3. Yamada,S., Ohashi,E., Agata,N. and Venkateswaran,K. (1999) Cloning and nucleotide sequence analysis of gyrB of Bacillus cereus, B. thuringiensis, B. mycoides and B. anthracis and their application to the detection of B. cereus in rice. Appl. Environ. Microbiol., 65, 1483–1490.
- 4. Arnheim,N. and Erlich,H. (1992) Polymerase chain reaction strategy. Annu. Rev. Biochem., 61, 131–156.
- 5. Nishigaki,K., Saito,A., Hasegawa,T. and Naimuddin,M. (2000) Whole genome sequence-enabled prediction of sequences performed for random PCR products of Escherichia coli. Nucleic Acids Res., 28, 1879–1884.
- 6. Nishigaki,K., Naimuddin,M. and Hamano,K. (2000) Genome profiling: a realistic solution for genotype-based identification of species. J. Biochem., 128, 107–112.
- 7. Nishigaki,K., Amano,N. and Takasawa,T. (1991) DNA profiling—an approach of systemic characterization, classification and comparison of genomic DNAs. Chem. Lett., 1991, 1097–1100.
- 8. Williams,J.G., Kubelik,A.R., Livak,K.J., Rafalski,J.A. and Tingey,S.V. (1990) DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res., 18, 6531–6535.
- 9. Welsh,J. and McClelland,M. (1990) Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Res., 18, 7213–7218.
- 10. Akopyanz,N., Bukanov,N.O., Westblom,T.U., Kresovich,S. and Berg,D.E. (1992) DNA diversity among clinical isolates of Helicobacter pylori detected by PCR-based RAPD fingerprinting. Nucleic Acids Res., 20, 5137–5142.
- 11. Tibayrenc,M., Neubauer,K., Barnabe,C., Guerrini,F., Skarecky,D. and Ayala,F.J. (1993) Genetic characterization of six parasitic protozoa: parity between random-primer DNA typing and multilocus enzyme electrophoresis. Proc. Natl Acad. Sci. USA, 90, 1335–1339.
- 12. Welsh,J., Perucho,M., Peinado,M., Ralph,D. and McClelland,M. (1995) Fingerprinting of DNA and RNA using arbitrarily primed PCR. In McPherson,M.J., Hames,B.D. and Taylor,G.R. (eds), PCR 2: A Practical Approach. IRL Press, Oxford, UK, pp. 197–218.
- 13. Vos,P., Hogers,R., Bleeker,M., Reijans,M., van de Lee,T., Hornes,M., Frijters,A., Pot,J., Peleman,J., Kuiper,M. and Zabeau,M. (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res., 23, 4407–4414.
- 14. Wartell,R.M., Hosseini,S., Powell,S. and Zhu,J. (1998) Detecting single base substitutions, mismatches and bulges in DNA by temperature gradient gel electrophoresis and related methods. J. Chromatogr. A, 806, 169–185.
- 15. Naimuddin,M., Kurazono,T., Zhang,Y., Watanabe,T., Yamaguchi,M. and Nishigaki,K. (2000) Species-identification dots: a potent tool for developing genome microbiology. Gene, 261, 243–250.
- 16. Nei,M. and Takezaki,N. (1996) Genetic distances and reconstruction of phylogenetic trees from microsatellite DNA. Genetics, 144, 389–399.
- 17. Barnabas,J., Goodman,M. and Moore,G.W. (1972) Descent of mammalian alpha globin chain sequences investigated by the maximum parsimony method. J. Mol. Biol., 69, 249–278.
- 18. Nei,M. and Chakraborty,R. (1973) Genetic distance and electrophoretic identity of proteins between taxa. J. Mol. Evol., 2, 323–328.
- 19. Tateno,Y., Nei,M. and Tajima,F. (1982) Accuracy of estimated phylogenetic trees from molecular data. 1. Distantly related species. J. Mol. Evol., 18, 387–404.
- 20. Gutell,R.R., Larsen,N. and Woese,C.R. (1994) Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective. Microbiol. Rev., 58, 10–26.
- 21. Pabbaraju,K. and Sanderson,K.E. (2000) Sequence diversity of intervening sequences (IVSs) in the 23S ribosomal RNA in Salmonella spp. Gene, 253, 55–66.
- 22. Lawrence,J.G. and Ochman,H. (1998) Molecular archeology of the Escherichia coli genome. Proc. Natl Acad. Sci. USA, 95, 9413–9417.
- 23. Reid,S.D., Herbelin,C.J., Bumbaugh,A.C., Selander,R.K. and Whittam,T.S. (2000) Parallel evolution of virulence in pathogenic Escherichia coli. Nature, 406, 64–67.
- 24. Pupo,G.M., Lan,R. and Reeves,P.R. (2000) Multiple independent origins of Shigella clones of Escherichia coli and convergent evolution of many of their characteristics. Proc. Natl Acad. Sci. USA, 97, 10567–10572.








