Abstract
Two computerized restriction fragment length polymorphism pattern analysis systems, the BioImage system and the GelCompar system (Molecular Analyst Fingerprinting Plus in the United States), were compared. The two systems use different approaches to compare patterns from different gels. In GelCompar, a standard reference pattern in one gel is used to normalize subsequent gels containing lanes with the same reference pattern. In BioImage, the molecular sizes of the fragments are calculated from size standards present in each gel. The molecular size estimates obtained with the two systems for 12 restriction fragments of phage λ were between 97 and 101% of their actual sizes, with a standard deviation of less than 1% of the average estimated size for most fragments. At the window sizes used for analysis, the GelCompar system performed somewhat better than BioImage in identifying visually identical patterns generated by electrophoretic separation of HhaI-restricted DNA of Listeria monocytogenes. Both systems require the user to make critical decisions in the analysis. It is very important to visually verify that the systems are finding all bands in each lane and that no artifacts are being detected; both systems allow manual editing. It is also important to verify results obtained in the pattern matching or clustering portions of the analysis.
Electrophoresis-based fingerprinting methods are now routinely used for epidemiological typing of bacteria and fungi. The first methods used included analysis of whole-cell or cell envelope protein patterns and classical DNA fingerprinting methods such as restriction enzyme analysis (REA), ribotyping, and insertion sequence fingerprinting. Later, macrorestriction analysis of genomic DNA by the use of low-frequency-cutting restriction enzymes and pulsed-field gel electrophoresis was introduced. Most recently, PCR-based methods such as random amplification of polymorphic DNA, typing by amplification of genomic DNA between repetitive extragenic palindromic sequences, and amplified DNA restriction analysis have been described (6). These electrophoretic methods have mostly been used to compare results obtained within one experiment on the same gel. Recent efforts to standardize the methods and the development of computer-based pattern analysis methods have made it possible to compare large numbers of patterns generated in the same or different laboratories.
With these computer-based pattern analysis programs, it is possible to build up databases of DNA restriction fragment length polymorphism (RFLP) patterns and perform identity searches of new patterns in these databases. In addition, the software packages can perform more sophisticated similarity calculations and cluster analyses of the patterns in the databases. They are now increasingly being used (3–5, 7, 8) for epidemiological typing, but to our knowledge different software packages have not been compared.
In the present study, we compared two such systems, the GelCompar system (Applied Maths, Kortrijk, Belgium; sold as Molecular Analyst Fingerprinting Plus by Bio-Rad Laboratories, Hercules, Calif.) and the BioImage system (BioImage Corporation, Ann Arbor, Mich.), to determine whether the results generated by the two programs were comparable. These two systems were chosen for the comparison because they use different approaches to the analysis of the banding patterns. Both systems have automatic band-finding options as well as molecular size estimation, similarity calculation, and cluster analysis features.
MATERIALS AND METHODS
Image analysis systems.
The GelCompar program (version 3.1) runs under Microsoft Windows (version 3.1 or higher). The suggested minimum hardware configuration is a personal computer with an Intel 386 processor, 4 megabytes of random-access memory, 10 megabytes of free space on the hard disk, and a color monitor and a videocard with a minimum resolution of 256 colors. The program runs very slowly with this configuration, and in the present study it was installed on a Hewlett-Packard Vectra computer equipped with a 66-MHz 486 processor and 12 megabytes of random-access memory. A complete system, consisting of the software program, an image acquisition camera, UV and white light sources, a computer, and a printer, is available from Bio-Rad under the names Gel Doc 1000 and Molecular Analyst Fingerprinting Plus.
The BioImage system (version 3.2) runs on a Sun microcomputer equipped with a UNIX operating system. This system can be purchased complete with the software, an image acquisition camera, UV and white light sources, a computer, and a printer.
Experimental setup.
A mixture of adenovirus type 2 BamHI-EcoRI DNA fragments and StuI fragments of phage λ was used as a reference standard and molecular size marker throughout the study. The mixture was prepared as follows. An adenovirus DNA fragment suspension was made by mixing 2.6 μl of adenovirus BamHI-EcoRI fragments (ca. 0.9 μg; IBI, New Haven, Conn.) with 22 μl of gel loading buffer (50% sucrose and 0.25% bromophenol blue in 100 mM Tris–90 mM boric acid–1 mM EDTA, pH 8.0 [TBE]) and 75.4 μl of 10 mM Tris–1 mM EDTA buffer, pH 8.0. The phage λ fragment stock solution was prepared by digesting 1 μg of phage λ (Boehringer Mannheim, Indianapolis, Ind.) with StuI (Gibco/BRL, Gaithersburg, Md.) in a 20-μl volume in accordance with the instructions of the manufacturer of the restriction enzyme before adding 22 μl of gel loading buffer and 58 μl of 10 mM Tris–1 mM EDTA buffer, pH 8.0. The final adenovirus-λ fragment solution was prepared by mixing 70 μl of the adenovirus DNA with 56 μl of the StuI λ fragment solution. Eighteen microliters of this mixture was loaded onto the gel in each standard lane.
In the first part of the study, EcoRI (Gibco/BRL) and StyI (Gibco/BRL) digests of phage λ were placed between the standards in two gels (gels I and II) to test the accuracy of the molecular size determinations of the systems. The solutions of these λ fragments were prepared by digesting 2-μg amounts of phage λ DNA separately with each enzyme, in accordance with the manufacturer’s instructions, in a volume of 40 μl. After digestion, 44 μl of gel loading buffer and 116 μl of 10 mM Tris–1 mM EDTA buffer, pH 8.0, were added to each digestion tube. Ten microliters of each solution was applied to each lane. In one of the gels (gel II), a heavy load of HindIII (Gibco/BRL) phage λ fragments was placed in two lanes instead of the EcoRI fragments to create distortion of the migration of the fragments in the adjacent tracks in the gel. This solution was threefold more concentrated than the solutions with the EcoRI and StyI fragments. In the second part of the study, the abilities of the software programs to analyze the more complex restriction patterns of DNA from 18 isolates of Listeria monocytogenes were determined (Table 1). For this REA typing procedure, genomic DNA was purified by the method of Graves and Swaminathan (2). It was digested with HhaI (Gibco/BRL) in accordance with the manufacturer’s instructions and electrophoresed in two gels (gels III and IV) with a reference standard in every third or fourth lane. The DNA of five isolates was run in both gels.
TABLE 1.
Isolate no. | Epidemiological origin | Serotype | REA pattern |
---|---|---|---|
1 | Sporadic | 1/2b | F |
2 | Outbreak 1 | 1/2b | A |
3 | Outbreak 1 | 1/2b | A |
4 | Outbreak 2 | 1/2b | G |
5 | Outbreak 2 | 1/2b | G |
6 | Outbreak 2 | 1/2b | G |
7 | Outbreak 2 | 1/2b | G |
8 | Outbreak 2 | 1/2b | G |
9 | Outbreak 2 | 1/2b | G |
10 | Outbreak 2 | 1/2b | G |
11 | Outbreak 2 | 1/2b | G |
12 | Outbreak 2 | 1/2b | G |
13 | Sporadic | 1/2b | B |
14 | Outbreak 2? | 1/2b | G |
15 | Cluster 1 (a) | 1/2a | D |
16 | Cluster 1 | 1/2a | C |
17 | Cluster 1 | 1/2a | E |
18 | Cluster 1 | 1/2a | E |
(a) Suspected to be part of an outbreak based on demographic epidemiological data.
Electrophoresis was done in 0.65% SeaKem agarose (FMC BioProducts, Rockland, Maine) or ultrapure agarose (Gibco/BRL) in 1× TBE buffer, pH 8.0, in a Horizon 20.25 electrophoresis chamber (Gibco/BRL) at approximately 2 V/cm overnight at ambient temperature. In the experiment with L. monocytogenes DNA, the electrophoresis was done with circulation of the electrophoresis buffer. After electrophoresis, the gels were stained with ethidium bromide (10 mg/liter) and photographed on a UV table, using a Polaroid camera and Polaroid negative black and white film 667. For computer analysis, all images were scanned on a light table, using a BioImage camera and an earlier version of BioImage software (3.0). In contrast to BioImage version 3.2, the older version generates uncompressed TIFF files that may be read by the GelCompar software. The images in TIFF format were transferred to an MS-DOS-formatted diskette for analysis by the GelCompar program. A resolution of 1,024 by 1,024 pixels was used for image acquisition. The normalization settings in GelCompar were as follows: a resolution of 400 points, a smoothing factor of 3 (each data point was averaged with one point on each side), and background subtraction by the rolling disc method with a setting of 12, as recommended by the manufacturer for RFLP gels. Bands were identified by the band search features of both systems. The sensitivity of the band search feature was set so that all bands present were identified by both systems. With these settings, some artifacts on the images were identified as bands; these were manually deleted. In the pattern recognition part of the study, only bands in the size range between 3.5 and 14.3 kb were analyzed. No strain’s DNA contained a band larger than 8.4 kb. The positions of bands smaller than 3.5 kb could not be reliably ascertained because the bands in this region were not completely resolved.
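The smoothing setting described above (each data point averaged with one point on each side) can be illustrated with a short sketch. The function name `smooth3`, the handling of the two end points, and the densitometric values are assumptions made for illustration; the text does not describe how GelCompar treats the ends of a profile.

```python
def smooth3(profile):
    """Three-point moving average: each point is averaged with one neighbor per side.

    Assumption: at the two ends of the profile, only the neighbors that exist
    are used, so the window shrinks to two points there.
    """
    n = len(profile)
    smoothed = []
    for i in range(n):
        window = profile[max(0, i - 1):min(n, i + 2)]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical densitometric trace containing a single sharp band.
trace = [0, 0, 3, 9, 3, 0, 0]
print(smooth3(trace))
```

The peak stays at the same position after smoothing, but it is lower and broader, which is the usual trade-off between noise suppression and resolution of closely spaced bands.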
All molecular size marker bands in the reference lanes were included in the calculation of the sizes of the bands in the test lanes. This was done to determine the sizes of the bands in the test lanes by interpolation from the reference lane data. The “robust” method in BioImage and the “spline fitting” method in GelCompar were used for interpolation in the molecular size determination procedure. Neither of these methods produces reliable results if the sizes of bands in the test lanes are outside the range of the bands in the reference lanes; i.e., the methods do not work well with extrapolation. For pattern recognition, the optimization feature, a track-to-track alignment feature that recognizes small global shifts (up to 4% migration differences) in similar normalized patterns that are not perfectly aligned, was enabled in GelCompar. A similar feature is not present in BioImage. Similarities between patterns were determined by generating dendrograms. The Dice similarity coefficient (similarity coefficient 2 in BioImage) (1) was used, and the patterns were clustered by the unweighted pair group method using arithmetic averages (UPGMA). The influence of using different numbers of lanes containing molecular size markers on a gel was ascertained by comparing the gels on which all marker lanes were used with the same gels on which the two outermost marker lanes only were used.
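The similarity and clustering steps named above can be sketched as follows. This is a minimal illustration of the Dice coefficient and of UPGMA, not a reconstruction of either program's implementation; the function names, the pattern names P1 to P4, and the band labels are hypothetical, and bands are assumed to have been matched already so that each integer stands for one band position class.

```python
def dice(bands_a, bands_b):
    """Dice coefficient: 2 * shared bands / (bands in A + bands in B)."""
    shared = len(bands_a & bands_b)
    return 2.0 * shared / (len(bands_a) + len(bands_b))


def upgma(names, sim):
    """Cluster items by UPGMA on a similarity matrix sim[i][j].

    Returns the final tree as nested tuples and the list of merges as
    (left subtree, right subtree, average similarity at the merge).
    """
    clusters = {i: [i] for i in range(len(names))}  # cluster id -> member indices
    trees = {i: names[i] for i in range(len(names))}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Unweighted average over all pairs of original members.
                pairs = [sim[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        avg, a, b = best
        merges.append((trees[a], trees[b], avg))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        trees[next_id] = (trees.pop(a), trees.pop(b))
        next_id += 1
    return trees[next_id - 1], merges


# Hypothetical patterns; integers label already-matched band positions.
patterns = {
    "P1": {1, 2, 3, 4},
    "P2": {1, 2, 3, 4},
    "P3": {1, 2, 3, 5},
    "P4": {1, 6, 7, 8},
}
names = list(patterns)
sim = [[dice(patterns[a], patterns[b]) for b in names] for a in names]
tree, merges = upgma(names, sim)
for left, right, level in merges:
    print(left, right, round(level, 2))
```

In this toy data set the two identical patterns join at a similarity of 1.0, the pattern differing in one band joins that pair at 0.75, and the unrelated pattern joins last at 0.25.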
RESULTS
A gel with reference molecular size standards as well as EcoRI, StyI, and HindIII phage λ fragments is shown in Fig. 1. The average molecular sizes (both the observed values and their percentages of the expected values) and the associated standard deviations (both the determined values and their percentages of the observed average molecular sizes) for the StyI and EcoRI λ fragments are shown in Table 2. The two programs estimated the molecular sizes with nearly the same precision. The average size calculated for each fragment was within 2% of the actual size of the fragment, except for the largest StyI fragment, whose size was estimated by BioImage to be 2.9% smaller than expected. The standard deviations for fragments of 7,743 bp or smaller were less than 1% of the calculated average sizes, except for the two smallest StyI fragments, for which the BioImage standard deviations were 1.1 and 2.5%, respectively, of the calculated average molecular sizes. With both systems, the standard deviations for the two largest fragments were between 1.7 and 3.2% of their calculated average sizes. Because of this, the position tolerance (the maximum positional deviation between two identical fragments run in different lanes) was set to 0.5% with an increase of 1.75% in GelCompar, corresponding to a 3% deviation in molecular weight throughout the gel for the pattern comparison portion of the study. Likewise, the deviation in BioImage was set at 3% throughout the gel.
TABLE 2.
Actual fragment size (bp) | BioImage average size observed (b) | BioImage SD (c) | GelCompar average size observed (b) | GelCompar SD (c) | Fragment generated by: |
---|---|---|---|---|---|
21,226 | 20,968 (98.8) | 530 (2.53) | 20,928 (98.6) | 356 (1.70) | EcoRI |
19,329 | 18,771 (97.1) | 597 (3.18) | 19,378 (100.3) | 519 (2.68) | StyI |
7,743 | 7,655 (98.9) | 43 (0.56) | 7,707 (99.5) | 46 (0.60) | StyI |
7,421 | 7,375 (99.4) | 26 (0.36) | 7,414 (99.9) | 34 (0.46) | EcoRI |
6,223 | 6,138 (98.6) | 41 (0.67) | 6,142 (98.7) | 19 (0.31) | StyI |
5,804 | 5,763 (99.3) | 28 (0.48) | 5,767 (99.4) | 20 (0.34) | EcoRI |
5,643 | 5,561 (98.6) | 19 (0.34) | 5,585 (99.0) | 22 (0.39) | EcoRI |
4,878 | 4,788 (98.2) | 16 (0.33) | 4,805 (98.5) | 18 (0.38) | EcoRI |
4,254 | 4,177 (98.2) | 14 (0.34) | 4,195 (98.6) | 21 (0.49) | StyI |
3,530 | 3,482 (98.6) | 14 (0.40) | 3,536 (100.2) | 16 (0.44) | EcoRI |
3,472 | 3,404 (98.0) | 37 (1.09) | 3,435 (98.9) | 16 (0.45) | StyI |
2,690 | 2,714 (100.9) | 68 (2.50) | 2,672 (99.3) | 8 (0.30) | StyI |
(a) The EcoRI fragments were run 11 times, and the StyI fragments were run 13 times.
(b) The numbers in parentheses denote the observed sizes as percentages of the actual sizes of the fragments.
(c) The numbers in parentheses denote the standard deviations as percentages of the average observed sizes.
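The parenthetical values in Table 2 can be reproduced from the tabulated averages and standard deviations. The sketch below does this for the 21,226-bp EcoRI fragment; the helper name `percent` is an assumption. Because the table lists rounded averages and standard deviations, recomputed percentages for some other rows may differ from the printed ones in the last digit.

```python
def percent(value, base, digits):
    """Express value as a percentage of base, rounded to the given number of digits."""
    return round(100.0 * value / base, digits)


actual = 21226  # actual size of the largest EcoRI fragment (bp), from Table 2

# (average observed size, standard deviation) taken from the first row of Table 2
rows = {"BioImage": (20968, 530), "GelCompar": (20928, 356)}

for system, (mean_size, sd) in rows.items():
    pct_of_actual = percent(mean_size, actual, 1)  # observed size as % of actual size
    sd_pct = percent(sd, mean_size, 2)             # SD as % of average observed size
    print(system, pct_of_actual, sd_pct)
```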
The results of the pattern recognition study are shown in Fig. 2 to 5. Figure 2 shows HhaI-digested genomic DNA of strains of L. monocytogenes. The effect of circulation of the electrophoresis buffer can be judged by comparing this figure with Fig. 1. There is no “smiling” effect in the gel run with circulation of the buffer (Fig. 2), while there is a pronounced smiling in the gel run without buffer circulation (Fig. 1). The molecular size standards also show better separation in Fig. 2 than in Fig. 1. In Fig. 3, the clustering of the RFLP patterns of the bacterial isolates, the λ fragments, and the molecular size marker fragments by GelCompar is shown. All marker lanes were used for normalization in this experiment. One of the marker lanes in one of the gels with the clinical-strain DNA was used as a reference standard in the normalization procedure. All patterns were correctly identified in this figure. The corresponding dendrogram generated by BioImage for the same patterns is shown in Fig. 4. The clustering of patterns by BioImage was less satisfactory than that obtained with GelCompar. The BioImage software matched the λ-fragment patterns with the 3% deviation window setting. When rerun on gel IV, isolates 6 and 2 were not clustered with 100% similarity to their run on gel III. The patterns of isolates 15 and 16 were falsely judged to be identical; they differ in the position of a single band in the region between 3.6 and 4 kb (Fig. 3). However, the overall clustering was similar in BioImage and GelCompar. Figure 5 shows the clustering of the RFLP patterns of all isolates studied, all λ fragments, and the molecular weight marker fragments by GelCompar after normalization had been performed with only the outermost molecular size marker lanes. Compared to Fig. 3, for which normalization was optimal, the dendrogram in Fig. 5 is much more branched. 
This effect is seen only in the patterns in the gels with the most pronounced smiling effect, i.e., the two gels with the EcoRI, StyI, and HindIII λ fragments. Inclusion of a third molecular weight marker lane in the middle of the smiling gels improved the results substantially but without reaching the level of perfection attained when all marker lanes were included in the calculation (data not shown). Similar results were found when only the two outermost standards in each gel were used for analysis of the data by BioImage (data not shown).
DISCUSSION
RFLP pattern analysis software programs enable investigators to compare large numbers of complex patterns in a short period of time. The programs in this study use different approaches to compensate for differences in run length and smiling of the gels. Before analysis can be performed with GelCompar, the patterns have to be normalized. In this process, a lane containing a standard profile is selected as a reference. This standard should also be present in all other gels. The standards are then compressed or stretched to match the profile in the reference lane as closely as possible. The test profiles in between are normalized by interpolation to the nearest standard lanes. The normalized profiles are saved and used for future analyses; information on the sizes of the bands in the standards is not required. With BioImage, all patterns are compared as molecular sizes based on molecular size markers in each gel. A process like the normalization procedure in GelCompar is not needed with BioImage. Thus, GelCompar compares differences in positions, i.e., run lengths, on the gels rather than differences in molecular sizes, the approach used by BioImage.
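The position-based normalization described above can be pictured as a piecewise-linear stretching of one lane onto the reference lane. The sketch below is a deliberately simplified, one-dimensional illustration under stated assumptions: it maps positions within a single standard lane only and ignores the interpolation between neighboring standard lanes across the gel. The function name and all positions are hypothetical, and GelCompar's actual algorithm is not described in the text.

```python
from bisect import bisect_right


def normalize_position(pos, observed_markers, reference_markers):
    """Map a raw migration position onto the reference scale.

    observed_markers: ascending positions of the standard bands in this gel.
    reference_markers: positions of the same bands in the reference lane.
    Each interval between adjacent markers is stretched or compressed linearly.
    """
    if not observed_markers[0] <= pos <= observed_markers[-1]:
        raise ValueError("position lies outside the range of the marker bands")
    i = bisect_right(observed_markers, pos) - 1
    i = min(i, len(observed_markers) - 2)  # keep the last marker inside the last interval
    x0, x1 = observed_markers[i], observed_markers[i + 1]
    y0, y1 = reference_markers[i], reference_markers[i + 1]
    return y0 + (pos - x0) * (y1 - y0) / (x1 - x0)


# Hypothetical marker positions (in profile points) in the reference lane and in a new gel.
reference = [100, 200, 300, 400]
observed = [110, 220, 310, 400]
print(normalize_position(265, observed, reference))  # band halfway between markers 2 and 3
```

Note that no band sizes appear anywhere in this calculation, which matches the point made above that size information for the standards is not required for position-based normalization.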
Both software programs allow storage of the patterns in one or more databases in a computer so that a new pattern can be compared with existing patterns in the database(s). This is particularly useful for tracking specific subtypes of bacteria in epidemiological studies. The two software programs evaluated in this study have several features (e.g., band finding, lane comparison, etc.) that allow many of the tedious and time-consuming steps in pattern analysis to be automatically performed with one or a few keystrokes. Also, they have powerful built-in data analysis features and can generate reports in several formats. Both programs have lane-matching features that will report the total number of bands as well as the number of matched and unmatched bands. This feature is useful for discriminating between unrelated strains; however, the analyst should visually confirm the band-matching results from the programs. Both programs require the analyst to make critical decisions during the various steps in pattern normalization and in selecting the parameters used for analysis and matching of patterns. The analyst needs to become familiar with various features of each program before he or she can obtain the best results. The GelCompar program has several methods of aligning positions on the reference patterns (identified by the operator) within a gel with each other and with a previously chosen reference standard. In our experience, the automatic association methods often produce incorrect alignments and require operator input to make corrections and to align positions appropriately. The BioImage system also has an automatic alignment feature to align identical fragments of reference standards in different lanes on a gel. This alignment feature may not perform satisfactorily if the rates of migration of fragments in different lanes are significantly different (smiling effect). The program draws a horizontal line through identical bands in the reference lanes. 
These lines can be adjusted manually by the operator as needed to improve the alignments. These adjustments must be made correctly or molecular sizes will not be determined accurately. The BioImage software pattern matches are determined on the basis of molecular size values. The GelCompar program does not use computed molecular sizes of bands for matching but rather uses normalized positions. The normalization process in GelCompar is quite time-consuming for the newcomer, whereas the molecular size estimation by BioImage is straightforward.
With both of these software programs, the molecular size estimates for the majority of the test fragments were within 2% of their actual sizes. The standard deviations for the molecular weights of the large fragments were larger than those for the smaller fragments. This was to be expected because the bands from the larger fragments were broader than the bands from the smaller fragments. Both software programs choose the most intense point in each band as the position of the band.
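Size estimation from marker bands can be illustrated with a simple log-linear interpolation between the two markers flanking a band. This is a stand-in chosen for brevity, not the "robust" method of BioImage or the "spline fitting" method of GelCompar; the function name, migration distances, and marker sizes are hypothetical. Like the methods used in the study, the sketch refuses to extrapolate beyond the marker range.

```python
import math


def estimate_size(distance, marker_distances, marker_sizes):
    """Estimate fragment size (bp) from migration distance.

    marker_distances: ascending migration distances of the marker bands.
    marker_sizes: sizes (bp) of the same bands, in the same order.
    Assumes log10(size) varies linearly with distance between adjacent markers.
    """
    if not marker_distances[0] <= distance <= marker_distances[-1]:
        raise ValueError("band lies outside the marker range; extrapolation is not attempted")
    for i in range(len(marker_distances) - 1):
        d0, d1 = marker_distances[i], marker_distances[i + 1]
        if d0 <= distance <= d1:
            t = (distance - d0) / (d1 - d0)
            log_size = (1 - t) * math.log10(marker_sizes[i]) + t * math.log10(marker_sizes[i + 1])
            return 10 ** log_size


# Hypothetical markers: migration distances in mm and sizes in bp.
distances = [20.0, 40.0, 60.0]
sizes = [8000, 4000, 2000]
print(round(estimate_size(30.0, distances, sizes)))  # band midway between the 8,000- and 4,000-bp markers
```

Because the relationship is logarithmic, a band midway between two markers is assigned the geometric mean of their sizes rather than the arithmetic mean, and a fixed positional error translates into a larger absolute size error for large fragments.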
Many scientists routinely place molecular weight standards in only the outermost lanes in their gels. Based on the results of the present study, this practice needs to be reevaluated since it is very difficult to avoid smiling or other types of distortion in every gel. More highly branched dendrograms were obtained with both software programs when only outer-lane standards were used. Somewhat less branching was seen when three lanes were used for standards. Smiling may be partly avoided if the buffer is recirculated during electrophoresis. However, it is very difficult to avoid gel distortion caused by small differences in the amount of DNA loaded in each lane. The normalization process may be improved by placing the same strain (“standard” strain) in different positions on each gel. An ideal fragment analysis system generates variable as well as invariant bands. Invariant bands may serve as internal controls and help correct for gel distortions.
In this study, with the deviation percentage used, the BioImage software did not cluster isolates 6 and 2 with 100% similarity when they were run on two different gels. Two isolates, 15 and 16, which differed from each other in the position of a single band in the 3.6- to 4.0-kb range, were falsely judged to be identical. BioImage does some proofreading of the band matching if the bands compared are within the size deviation set by the user. This did not work with the size deviation chosen in this study (3%). Deviations of 2.5, 3.5, and 4% gave similar results (data not shown). If the deviation was set to 2%, BioImage correctly judged the profiles of strains 15 and 16 to be different. However, the profiles of the aforementioned two strains (6 and 2) run on different gels were still misinterpreted as being different when this setting was used (data not shown). These problems were not seen with GelCompar when the optimization feature was enabled. If this feature was disabled, a dendrogram similar to the BioImage dendrogram was obtained (data not shown). The overall relationships of the patterns analyzed were identical for the two programs and similar to the one expected from visual inspection of the gels. In GelCompar, dendrograms with error flags at the branchings (standard deviations with respect to the similarity matrix for each branch) may be produced (Fig. 3 and 5). This feature is not present in BioImage. The two RFLP pattern analysis software programs performed almost equally well in determining molecular weights from standards present in each gel and in recognizing relationships between banding patterns.
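The sensitivity of band matching to the chosen size deviation can be shown with a small sketch. It assumes a greedy one-to-one matching of sorted band sizes and a deviation measured relative to the mean of the two sizes; neither program's matching procedure is specified in the text, and the band sizes below are hypothetical values chosen so that two patterns differ in a single band between 3.6 and 4.0 kb.

```python
def count_matches(sizes_a, sizes_b, tolerance):
    """Count one-to-one band matches between two lanes.

    Two bands match when their sizes differ by no more than `tolerance`
    (a fraction, e.g. 0.03 for 3%) of the mean of the two sizes.
    """
    a, b = sorted(sizes_a), sorted(sizes_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        relative_diff = abs(a[i] - b[j]) / ((a[i] + b[j]) / 2.0)
        if relative_diff <= tolerance:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def dice_with_tolerance(sizes_a, sizes_b, tolerance):
    """Dice coefficient computed from tolerance-based band matches."""
    return 2.0 * count_matches(sizes_a, sizes_b, tolerance) / (len(sizes_a) + len(sizes_b))


# Hypothetical band sizes (bp); the lanes differ only in the smallest band.
lane_x = [8400, 6100, 3900]
lane_y = [8400, 6100, 3800]
print(dice_with_tolerance(lane_x, lane_y, 0.03))            # bands 3,900 and 3,800 are merged
print(round(dice_with_tolerance(lane_x, lane_y, 0.02), 3))  # the same bands are kept apart
```

A window wide enough to absorb gel-to-gel variation in one size range can also merge genuinely different bands, while a narrower window separates them at the cost of splitting reruns of the same isolate. This is the trade-off reported above for the 2% and 3% settings.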
The development of computer programs for automated analysis of DNA RFLP patterns has made it possible to perform sophisticated comparisons of a large number of complex patterns. However, the programs are not completely automatic but require the user to make critical decisions that affect the way in which the analysis is done and the final results. Similarly, the results of a very distorted gel cannot be corrected by any computer program. It is extremely important to verify the results of all computerized RFLP pattern analyses. This includes visual comparisons of the number of bands in a lane with the bands found automatically by the software programs. It may be useful to include DNA from at least two identical strains on each gel to verify the ability of the software to recognize identical patterns at the chosen settings. It should be stressed that computer programs may be used as an aid in the analysis of complex banding patterns; they do not provide an indisputably correct analysis.
In conclusion, the two programs evaluated in this study performed well. Of the two programs evaluated, BioImage was the easier to use. The version of BioImage evaluated in this study was written for UNIX-based computers; since this study was undertaken, BioImage Corporation has released a version of its software that is designed for Microsoft Windows (3.1 and later versions). This version is reported to have most of the features of the UNIX-based software. The GelCompar program software is designed for Microsoft Windows (3.1 and later versions) and has some useful features, like optimization of profiles and error flags on the similarity dendrograms, not present in BioImage. GelCompar is not as easy to work with as BioImage, but it performed slightly better than the latter program in the present study.
In the present paper, we have considered a few important features that are common to most image analysis systems. When considering the purchase of such a program, additional features may be important. These include statistical, combining, printing, exporting, and program linkage capabilities. The two image analysis programs we have tested will not fulfill the requirements of all laboratories. These software packages are usually priced at approximately $6,000 (U.S.) for a single-user version. Thus, before deciding on a purchase, insist on a free trial period in which to test the system thoroughly.
REFERENCES
- 1. Dice L R. Measures of the amount of ecologic association between species. Ecology. 1945;26:297–302.
- 2. Graves L M, Swaminathan B. Universal bacterial DNA isolation procedure. In: Persing D H, Smith T F, Tenover F C, White T J, editors. Diagnostic molecular microbiology: principles and applications. Washington, D.C: American Society for Microbiology; 1993. pp. 617–621.
- 3. Graves L M, Swaminathan B, Reeves M W, Hunter S B, Weaver R E, Plikaytis B D, Schuchat A. Comparison of ribotyping and multilocus enzyme electrophoresis for subtyping of Listeria monocytogenes isolates. J Clin Microbiol. 1994;32:2936–2943. doi: 10.1128/jcm.32.12.2936-2943.1994.
- 4. Martin C, Ichou M A, Massicot P, Goudeau A, Quentin R. Genetic diversity of Pseudomonas aeruginosa strains isolated from patients with cystic fibrosis revealed by restriction fragment length polymorphism of the rRNA gene region. J Clin Microbiol. 1995;33:1461–1466. doi: 10.1128/jcm.33.6.1461-1466.1995.
- 5. Schmid J, Tay Y P, Wan L, Carr M, Parr D, McKinney W. Evidence for nosocomial transmission of Candida albicans obtained by Ca3 fingerprinting. J Clin Microbiol. 1995;33:1223–1230. doi: 10.1128/jcm.33.5.1223-1230.1995.
- 6. Swaminathan B, Matar G M. Molecular typing methods. In: Persing D H, Smith T F, Tenover F C, White T J, editors. Diagnostic molecular microbiology: principles and applications. Washington, D.C: American Society for Microbiology; 1993. pp. 26–50.
- 7. van Belkum A, Kluytmans J, van Leeuwen W, Bax R, Quint W, Peters E, Fluit A, Vandenbroucke-Grauls C, van den Brule A, Koeleman H, Melchers W, Meis J, Elaichouni A, Vaneechoutte M, Moonens F, Maes N, Struelens M, Tenover F, Verbrugh H. Multicenter evaluation of arbitrarily primed PCR for typing of Staphylococcus aureus strains. J Clin Microbiol. 1995;33:1537–1547. doi: 10.1128/jcm.33.6.1537-1547.1995.
- 8. Yang Z H, Mtoni I, Chonde M, Mwasekaga M, Fuursted K, Askgård D S, Bennedsen J, de Haas P E W, van Soolingen D, van Embden J D A, Andersen Å B. DNA fingerprinting and phenotyping of Mycobacterium tuberculosis isolates from human immunodeficiency virus (HIV)-seropositive and HIV-seronegative patients in Tanzania. J Clin Microbiol. 1995;33:1064–1069. doi: 10.1128/jcm.33.5.1064-1069.1995.