Abstract
Microscopic detection of Cryptosporidium parvum oocysts is time-consuming, requires trained analysts, and is frequently subject to significant human errors. Artificial neural networks (ANN) were developed to help identify immunofluorescently labeled C. parvum oocysts. A total of 525 digitized images of immunofluorescently labeled oocysts, fluorescent microspheres, and other miscellaneous nonoocyst images were employed in the training of the ANN. The images were cropped to a 36- by 36-pixel image, and the cropped images were placed into two categories, oocyst and nonoocyst images. The images were converted to grayscale and processed into a histogram of gray color pixel intensity. Commercially available software was used to develop and train the ANN. The networks were optimized by varying the number of training images, number of hidden neurons, and a combination of these two parameters. The network performance was then evaluated using a set of 362 unique testing images which the network had never “seen” before. Under optimized conditions, the correct identification of authentic oocyst images ranged from 81 to 97%, and the correct identification of nonoocyst images ranged from 78 to 82%, depending on the type of fluorescent antibody that was employed. The results indicate that the ANN developed were able to generalize the training images and subsequently discern previously unseen oocyst images efficiently and reproducibly. Thus, ANN can be used to reduce human errors associated with the microscopic detection of Cryptosporidium oocysts.
Cryptosporidium parvum is a coccidian protozoan that is an opportunistic pathogen in humans. The symptoms of cryptosporidiosis in healthy hosts vary. In a review, Tzipori (15) noted that the diarrheal symptoms closely resemble those of other diseases, with loose, watery stools lasting from 2 to 18 days not uncommon. In immunologically compromised individuals, however, cryptosporidiosis can result in chronic diarrhea that may be fatal (9, 15). Contamination of drinking water is a serious concern because host infection with C. parvum results from the ingestion of viable oocysts (8, 9). Given the apparently low infectious dose (5) and the ability of oocysts to persist in a viable state in finished water (8), monitoring of this organism in source water and finished water is very important.
For the routine detection of C. parvum in surface and finished water samples, oocysts are labeled with fluorescent monoclonal antibodies (FA). Fluorescent microscopic examination is then used to characterize the morphology of the labeled oocysts. Interpretation of properly stained oocysts by using FA is critical to the monitoring of C. parvum, but there are some inherent difficulties with utilizing this technique to identify oocysts. Cross-reactivity with non-C. parvum species (11), variable intensity of fluorescence with different commercial kits (6), and human error can contribute to problems with oocyst identification. Additionally, an analyst must scan the entire surface of a well slide by FA microscopy, and this can be a time-consuming process. If several samples are observed, human fatigue may impede proper identification. The Environmental Protection Agency Method 1622 recommends that each analyst have at least 2 years of college lecture and laboratory courses in microbiology (or a related field) and extensive training with the FA technique (17). Hence, it is evident that the presumptive identification and confirmation of C. parvum oocysts is highly dependent on the experience of the analyst, and laboratories may interpret the presence of oocysts from similar samples differently (3).
Artificial neural networks (ANN) are algorithms that mimic the function of biological neural networks and constitute a form of problem solving built on a functional architecture of interconnected neurons arranged in layers (1, 18). Each neuron receives input signals (information) from other connected neurons, computes a weighted sum of these inputs, and applies an activation function. If the combined inputs exceed a set threshold, the neuron is activated and passes an output signal to other neurons within the network (18).
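As a minimal illustration of this neuron computation (not the specific formulation used by the BrainMaker software employed below), the following sketch uses hypothetical weights and a sigmoid activation:

```python
import numpy as np

def neuron_output(inputs, weights, bias):
    """One artificial neuron: weighted sum of the incoming signals passed
    through a sigmoid activation; the output approaches 1 as the combined
    input rises above the neuron's effective threshold (the bias)."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1.0 / (1.0 + np.exp(-weighted_sum))  # sigmoid activation

# Hypothetical signals from three connected neurons and illustrative weights
signals = np.array([0.2, 0.9, 0.4])
weights = np.array([0.5, 1.2, -0.3])
print(neuron_output(signals, weights, bias=-0.5))  # output between 0 and 1
```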
In traditional computer expert systems, a programmer provides an existing framework of rules for the system to utilize in a decision-making process. However, a neural network develops its own set of rules (and/or probabilities) to find correct solutions as it is trained. Unlike traditional computer expert systems, the programmer does not know (or need to know) what criteria the neural network implements to find a solution. Neural networks are more adept at predicting outcomes with certain types of data that cannot be defined by a specific model or set of “rules” (7). Simpson et al. (12) demonstrated that neural networks were capable of differentiating two species of plankton, with some networks identifying over 90% of these images correctly. Such work indicated that efficient classification was possible with images never presented previously to trained networks.
The development of a computer expert system that can reliably identify C. parvum may provide a means to alleviate the inherent problems associated with FA microscopy. It was the goal of this research to develop an ANN that could reliably identify images of Cryptosporidium oocysts.
MATERIALS AND METHODS
C. parvum samples and image processing.
The C. parvum (positive) samples used in the training and testing of the ANN were obtained from a commercial supplier (Waterborne Inc., New Orleans, La.). Portions of the stock oocyst solution were placed on 25-mm well slides and labeled with a working solution of a commercially available, fluorescein-labeled monoclonal antibody (AquaGlo; Waterborne Inc., New Orleans, La.). Following the manufacturer's protocol, the slides were incubated, washed, and mounted with roughly 10 μl of 1,4-diazabicyclo[2.2.2]octane-glycerol (2% solution). The slides were stored at 4°C in the dark until later microscopic observation. All slides were observed at 500× total magnification with a BH-2 Olympus microscope. The samples were observed under fluorescent light generated by an attached mercury lamp possessing a UV excitation filter at 490 nm.
Fluorescent images from the microscope were collected by using a charge-coupled device (CCD) color digital camera (SPOT CCD; catalog no. SP100; Diagnostic Instruments, Inc., Sterling Heights, Mich.). The software used to operate the digital camera (SPOT software, version 2.1; Diagnostic Instruments, Inc., Sterling Heights, Mich.) was run on a personal computer (PC) system. The camera collected the images utilizing the default exposure and color settings, and the images were saved in TIFF format, 1,315 by 1,035 pixels in size. The images were collected from January 1998 to December 1999. Images of individual oocysts were cropped from the original images with a digital image software program (Adobe Photoshop 5.5; Adobe Systems Incorporated, San Jose, Calif.). Each cropped oocyst image was retained as a 36- by 36-pixel image. The color information was discarded using Photoshop 5.5, and each of the cropped images was converted into a grayscale TIFF file.
Using a digital image processing software program (ImageTool 2.0; University of Texas Health Science Center, San Antonio, Tex.), each grayscale image was converted into a histogram measuring pixel color intensity. This conversion generated a text file with 256 entries. Each of these entries represented the sum of pixels in a 36 by 36 image that had a specific gray value. Hence, for each of the histogram categories that ranged from absolute black (0) to absolute white (255), there was a number representing the sum of pixels matching that specific color of gray.
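This conversion can be reproduced with standard tools; the sketch below is a minimal equivalent of the ImageTool step, assuming the Pillow and NumPy libraries and a hypothetical file name:

```python
import numpy as np
from PIL import Image

def image_to_histogram(path):
    """Convert a cropped 36- by 36-pixel image to a 256-entry histogram of
    grayscale pixel intensities (0 = absolute black, 255 = absolute white)."""
    img = Image.open(path).convert("L")        # discard color; 8-bit grayscale
    pixels = np.asarray(img).ravel()           # 36 x 36 = 1,296 pixel values
    return np.bincount(pixels, minlength=256)  # pixel count at each gray value

# Hypothetical cropped oocyst image; the 256 counts sum to 1,296
histogram = image_to_histogram("oocyst_001.tif")
print(histogram.sum())
```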
Non-C. parvum (negative) sample images. (i) Organic non-C. parvum objects.
Negative images were collected from an algal sample, obtained from Henry Stibbs (Waterborne Inc., New Orleans, La.), previously shown to cross-react with the AquaGlo FA kit. From 10 to 30 μl of the stock algal sample was placed on well slides for labeling and FA processing. Additional negative images (Texas A&M University Research Center, El Paso, Tex.) were obtained from 18 different surface water samples processed using a protocol for the detection of Giardia and Cryptosporidium in surface water (16). All negative samples were processed in the same manner as the oocyst (positive) samples.
(ii) Inorganic, non-C. parvum objects: microspheres.
Images of green fluorescent spheres with a size similar to C. parvum oocysts were used to simulate oocysts. Commercially available fluorescent microspheres (5 μl; Fluoresbrite carboxylate microspheres YG, 6 μm, catalog no. 9003-53-6; Polysciences Inc., Warrington, Pa.) were added to a glass slide. The samples were mounted on a well slide and kept in the dark at 4°C until microscopic observation was conducted. Since no staining or labeling was necessary for these samples, the images were obtained directly as described above.
(iii) Inorganic, non-C. parvum objects: artwork and background.
Two additional types of negative images were generated. The first type consisted of cropped portions of background containing no FA label, taken from previously collected digital images of positive and negative samples; the second type consisted of color TIFF files of artwork (Japanese animé). These images were cropped from various portions of the original images. Both image types were converted to grayscale and processed into pixel intensity histograms as detailed above.
Spiked environmental samples.
Fluorescent images of oocysts within a soil matrix were obtained from an earlier study (10). In that study, replicates of three 10-g (wet weight) samples of sandy loam soil collected from West Texas were spiked with roughly 850 C. parvum oocysts. The soil and oocysts were mixed with sterile distilled water, totally saturating the soil samples, in a 50-ml polypropylene tube. The soil slurries were agitated in a wrist action shaker for 30 min and centrifuged, and the soil pellets were underlaid with a Percoll-sucrose flotation solution to further concentrate the oocysts. The resulting concentrates were further purified with commercially available immunomagnetic beads (G/C Direct; Dynal, Oslo, Norway) using procedures described by the manufacturer. The samples were stained and images were collected with the AquaGlo FA kit in a manner similar to that for the C. parvum positive samples.
Samples prepared with a second commercial FA kit.
A different commercial FA kit for C. parvum oocysts (Crypto/Giardia IF Test; TechLab, Blacksburg, Va.) was also used to prepare additional test images. Replicates of four 20-μl volumes of C. parvum oocysts and algal samples known to cross-react with the AquaGlo FA kit were labeled as per the manufacturer's instructions. Both sample types were examined microscopically, and digital images were captured and processed as previously described. A total of 100 images of oocysts labeled with the commercial FA kit were processed. No images of algal cells were collected, as none of the four samples cross-reacted with the TechLab FA kit.
Network image file generation for testing and training.
Each text file was given a unique number to identify which of the original cropped images it represented. Using a text editor, two file sets were created: one containing the positive images (C. parvum oocysts) and the other containing all the negative images (organic non-C. parvum images and the inorganic images). To reduce bias, the order of the images in each file was randomized using a utility function from the ANN program (BrainMaker Professional; California Scientific Software, Nevada City, Calif.). With a text editor, a group of images was randomly selected for network training, and the remaining images were used for network testing. Hence, the training and testing sets consisted of randomly selected images and were mutually exclusive; more importantly, the trained networks would be tested against images never previously presented.
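A simplified sketch of this randomization and mutually exclusive split (standing in for the BrainMaker shuffling utility and the manual text-editor selection) might look as follows; the identifiers are hypothetical:

```python
import random

def split_images(image_ids, n_train, seed=None):
    """Shuffle image identifiers and divide them into mutually exclusive
    training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(image_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers for positive and negative images
all_ids = [f"pos_{i}" for i in range(325)] + [f"neg_{i}" for i in range(200)]
train_ids, test_ids = split_images(all_ids, n_train=400, seed=1)
assert not set(train_ids) & set(test_ids)  # the two sets share no images
```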
Two types of training and testing files were generated: initial training and testing files and final training and testing files. The initial training file consisted of 300 total images (200 positive images and 100 negative organic images). This simplified image file was used to determine which factors during training would influence the performance of the ANN. The initial testing file comprised 200 total images (100 positive and 100 organic negative images). The final training file was designed for performance optimization of the ANN, utilizing the basic network designs generated during initial training. The final training file consisted of 525 total images (325 positive images, 148 organic negative images, and 52 inorganic negative images). The final testing file consisted of 200 images (100 positive images, 20 inorganic negative images, and 80 organic negative images).
To determine the influence that commercial FA kits had on the efficiency of ANN identification and how different sample matrixes may affect performance, a testing image set was created possessing 100 images (oocysts labeled with the TechLab antibodies). In addition, a set of 62 testing images was created using oocyst images collected from the spiked soil matrix samples labeled with the AquaGlo antibodies.
Network training and testing.
All ANN were developed using a commercial software program (BrainMaker Professional; California Scientific Software, Nevada City, Calif.). The program utilized a backpropagation algorithm and was run on a PC system. During training, statistics measuring training performance were collected. Statistical information from saved networks was then used to select networks for testing against a testing image set. The underlying assumption was that the networks that correctly identified the most images in training would, in turn, have the best network performance for further testing.
Each network was tested with the appropriate testing file. During this procedure, no adjustments were made to the network. The output of the testing was saved as a text file. Each image within the file had two outputs, Crypto and Negative, each with a value ranging from 0 to 1. A value close to 1 indicated a strong assignment to that classification, while a value close to 0 indicated no assignment. An image was scored as correct or incorrect by comparing the output values to the image's identification number. Each network that was tested thus had an overall percentage of correct identifications for the positive and negative images. A correct identification required an output value of 0.900 or higher for the true classification; any other result was considered an incorrect identification. Table 1 provides examples of network testing results and how the output values were interpreted.
TABLE 1.
Examples of scoring ANN identifications with testing image data
| True image classification | Crypto output value | Negative output value | ANN prediction | Identification score of ANN prediction |
|---|---|---|---|---|
| Crypto | 0.999 | 0.002 | Crypto | Correct (prediction over 0.9 threshold for correct output value) |
| Crypto | 0.004 | 0.978 | Negative | Incorrect (prediction does not match true image identification) |
| Crypto | 0.721 | 0.241 | Crypto | Incorrect (prediction not over 0.9 threshold for correct output value) |
| Negative | 0.767 | 0.256 | Crypto | Incorrect (prediction does not match true image identification) |
| Negative | 0.478 | 0.745 | Negative | Incorrect (prediction not over 0.9 threshold for correct output value) |
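The scoring rule illustrated in Table 1 can be summarized in a short sketch (an interpretation of the rule as described, not code from the original study); the threshold argument also accommodates the expanded 0.800 criterion applied later:

```python
def score_prediction(true_label, crypto_value, negative_value, threshold=0.900):
    """Score one test image: the ANN prediction is the larger of the two
    output values, and the identification counts as correct only if that
    value meets the threshold and the prediction matches the true class."""
    prediction = "Crypto" if crypto_value > negative_value else "Negative"
    correct = (prediction == true_label) and max(crypto_value, negative_value) >= threshold
    return prediction, correct

# The five examples from Table 1
print(score_prediction("Crypto", 0.999, 0.002))    # ('Crypto', True)
print(score_prediction("Crypto", 0.004, 0.978))    # ('Negative', False)
print(score_prediction("Crypto", 0.721, 0.241))    # ('Crypto', False)
print(score_prediction("Negative", 0.767, 0.256))  # ('Crypto', False)
print(score_prediction("Negative", 0.478, 0.745))  # ('Negative', False)
```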
Using the initial training and testing image sets, a number of variables were evaluated during initial training, including the number of training images, number of hidden neurons, and the combination of these two parameters. Once a preliminary network design was selected, further training was done with the final training image sets to ensure that a proper design was selected. From the final network design experiments, networks were tested against the final image set, and incorrect images were classified by their predicted output values. Using an expanded output value (0.8) as a correct identification threshold, selected networks were retested against all of the testing image sets.
RESULTS
Initial network training experiments.
It was critical to initially determine which design factors would influence ANN training and testing performance. After training with the initial training image set, selected networks (those that identified the most training images correctly) were tested with the initial testing image set. The numbers of testing images correctly identified by the selected networks were summed and compared with each other using chi-square tests for homogeneity (Table 2). The null hypothesis (H0) was that a specific training parameter (number of hidden neurons, number of training images, or a combination of the two) would not lead to a better testing performance (α = 0.01). The expected values used for the statistical analysis were the best performances observed. The results indicated that neither the number of training images (300 to 2,400 training images with ANN possessing 150 hidden neurons) nor the number of hidden neurons (50 to 500 hidden neurons trained with a total of 600 training images) had a significant effect on testing performance (Table 2). However, certain combinations of these two parameters did lead to significantly different testing performances (Table 2). Notably, lower numbers of hidden neurons (less than 250) coupled with fewer repetitions of training images (less than 600) resulted in relatively poor testing performances.
TABLE 2.
Initial network training
| Network training parameters | % (no.) correct identifications^a | χ2 (df) | Statistical significance |
|---|---|---|---|
| No. of training images | |||
| 300 | 73 (436) | 1.816 (4) | None |
| 600 | 75 (451) | ||
| 900 | 73 (438) | ||
| 1,200 | 74 (443) | ||
| 2,400 | 72 (432) | ||
| No. of hidden neurons | |||
| 50 | 75 (602) | 6.872 (5) | None |
| 100 | 74 (590) | ||
| 150 | 71 (564) | ||
| 200 | 69 (555) | ||
| 250 | 74 (588) | ||
| 500 | 74 (590) | ||
| No. of hidden neurons (no. of training images) | |||
| 50 (300) | 66 (396) | 30.828 (11) | Difference observed |
| 50 (600) | 77 (460) | ||
| 50 (900) | 71 (427) | ||
| 100 (300) | 67 (403) | ||
| 100 (600) | 75 (450) | ||
| 100 (900) | 71 (423) | ||
| 250 (300) | 78 (466) | ||
| 250 (600) | 75 (448) | ||
| 250 (900) | 75 (447) | ||
| 500 (300) | 74 (445) | ||
| 500 (600) | 74 (444) | ||
| 500 (900) | 75 (451) |
^a Percentage of correct identifications with the initial testing image set (200 images) from three selected networks or, for number of hidden neurons, with the initial testing image set (200 images) from four selected networks.
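The chi-square comparison described above (observed correct-identification counts tested against an expected value equal to the best performance observed) can be reproduced as follows; this is a sketch of our interpretation of the analysis, using SciPy for the p value:

```python
import numpy as np
from scipy.stats import chi2

def homogeneity_test(correct_counts, alpha=0.01):
    """Compare observed correct-identification counts against an expected
    value set to the best performance observed among the settings."""
    observed = np.asarray(correct_counts, dtype=float)
    expected = observed.max()
    statistic = np.sum((observed - expected) ** 2 / expected)
    df = observed.size - 1
    significant = chi2.sf(statistic, df) < alpha
    return statistic, df, significant

# Correct identifications for 300 to 2,400 training images (Table 2, top block)
print(homogeneity_test([436, 451, 438, 443, 432]))  # ~(1.816, 4, False): no difference
```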
Final network design experiments.
Table 2 shows that the number of training images combined with the number of hidden neurons had a significant effect on network testing performance. Network performance was therefore further analyzed by varying the number of hidden neurons using the final training image set (one repetition of the 525 images). Statistical analysis using a chi-square test for homogeneity indicated that networks containing a higher number of hidden neurons gave better testing performances; a network design with 750 hidden neurons had the best testing performance (Table 3).
TABLE 3.
Final network training
| Network training parameters | % (no.) correct identifications | χ2 (df) | Statistical significance |
|---|---|---|---|
| Percentage of training images correctly identified | |||
| Low (48%) | 56 (445)^a | 55.836 (2) | Difference observed |
| Midrange (70%) | 69 (555)^a | ||
| High (86%) | 79 (633)^a | ||
| No. of hidden neurons | |||
| 5 | 17 (100)^b | 320.35 (3) | Difference observed |
| 50 | 70 (417)^b | ||
| 500 | 80 (477)^b | ||
| 750 | 82 (489)^b | ||
^a Percentage of correct identifications with the final testing image set (200 images) from four selected networks.
^b Percentage of correct identifications with the final testing image set (200 images) from three selected networks.
To ensure that selecting networks for further testing based on the number of correct identifications of training images was appropriate, networks with various training performance averages (ranging from 48 to 86%) were tested against the final image set. A chi-square test for homogeneity indicated that a greater number of correct identifications during training was indeed a proper criterion for selecting networks for further testing (Table 3).
A final network design possessing 256 input, 750 hidden, and 2 output neurons, an adjusted linear learning rate of 0.25 (correct) and 1.5 (incorrect), and a training tolerance of 0.05 was trained with 1,050 total images. Four networks that demonstrated the best training performance were selected for network testing using the final testing image set (Table 4). The first and last training runs were recorded as a comparison of overall network training against the top four networks, but were not selected for further testing (Table 4). The rates of correct identification among the top four networks were not significantly different.
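BrainMaker's internal implementation is proprietary, but the stated design (256 input, 750 hidden, and 2 output neurons trained by backpropagation, with separate learning rates for predictions inside and outside the training tolerance) can be sketched generically as below; the weight initialization, sigmoid units, and squared-error gradient are our assumptions, not details reported in the study:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FeedForwardANN:
    """Generic 256-750-2 feedforward network trained by backpropagation
    (an illustrative stand-in for the commercial BrainMaker software)."""

    def __init__(self, n_in=256, n_hidden=750, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.05, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.05, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        self.h = sigmoid(x @ self.w1 + self.b1)       # hidden-layer activations
        self.o = sigmoid(self.h @ self.w2 + self.b2)  # Crypto / Negative outputs
        return self.o

    def train_one(self, x, target, tolerance=0.05, lr_correct=0.25, lr_incorrect=1.5):
        """One backpropagation update; a smaller learning rate is applied when
        the prediction is already within the training tolerance (our loose
        reading of the reported 0.25/1.5 learning-rate settings)."""
        out = self.forward(x)
        lr = lr_correct if np.all(np.abs(target - out) <= tolerance) else lr_incorrect
        delta_o = (out - target) * out * (1 - out)               # output-layer error term
        delta_h = (delta_o @ self.w2.T) * self.h * (1 - self.h)  # hidden-layer error term
        self.w2 -= lr * np.outer(self.h, delta_o)
        self.b2 -= lr * delta_o
        self.w1 -= lr * np.outer(x, delta_h)
        self.b1 -= lr * delta_h

# Hypothetical usage: x is a normalized 256-entry histogram; target [1, 0] = oocyst
net = FeedForwardANN()
x = np.random.default_rng(1).random(256)
net.train_one(x, np.array([1.0, 0.0]))
print(net.forward(x))
```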
TABLE 4.
Final network training and testing performance
| Trained network run no. | No. (%) of training images correct^a | % (SE) of positive testing images correct^b | % (SE) of negative testing images correct^c |
|---|---|---|---|
| 198 | 980 (93) | 87 (3.4) | 78 (4.1) |
| 212 | 978 (93) | 85 (3.6) | 75 (4.3) |
| 226 | 977 (93) | 84 (3.7) | 80 (4) |
| 234 | 977 (93) | 79 (4.1) | 80 (4) |
| 1 | 532 (51) | ||
| 250 | 954 (91) |
^a Set of 1,050 training images from the final training set.
^b Set of 100 positive testing images from the final testing set.
^c Set of 100 negative testing images from the final testing set.
Most of the incorrectly classified images were outright false positives or false negatives, with output values for the wrong classification of 0.900 or greater (Fig. 1). The second largest group of misclassified images had output values of less than 0.599; these could be considered poor identifications, or images that the network could not classify with any degree of certainty (the ANN could not identify them as oocyst or nonoocyst).
FIG. 1.
Output values of misclassified testing images.
The threshold for a correct output value was then expanded to 0.800, and each network was reevaluated against the test images under this new criterion to determine whether classification performance would increase (Table 5). Network performance was not significantly improved with the positive and negative testing image sets, although there was a slight increase in the number of correct identifications (roughly 2 to 3% for the positive image set and 1 to 2% for the negative image set). Additionally, any increase in false positives and/or false negatives resulting from the expanded output value threshold would be minimal (an average of fewer than five additional misclassifications for the top four networks) (Fig. 1).
TABLE 5.
Final network testing performances with output value of 0.8
Values are percentages of images correctly identified (SE).

| Network run no. | Positive testing images | Negative testing images^a | Organic negative images^b | Inorganic negative images^c | Positive TechLab images^a | Soil-spiked oocyst images^d |
|---|---|---|---|---|---|---|
| 198 | 90 (3) | 78 (4.1) | 82.5 (4.2) | 60 (11) | 97 (1.7) | 93.5 (3.1) |
| 212 | 86 (3.5) | 79 (4.1) | 82.5 (4.2) | 65 (11) | 96 (2) | 93.5 (3.1) |
| 226 | 86 (3.5) | 81 (3.9) | 85 (4) | 65 (11) | 95 (2.1) | 90.3 (3.8) |
| 234 | 81 (3.9) | 82 (3.9) | 86.3 (3.9) | 65 (11) | 95 (2.1) | 87.1 (4.3) |
^a Set of 100 images.
^b Set of 80 images.
^c Set of 20 images.
^d Set of 62 images.
An interesting observation was the difference in network performance between the inorganic and organic negative images (60 to 65% correct identification compared with 82.5 to 86.3% correct). The organic negative images were identified at a level very similar to that of the Cryptosporidium oocyst images, whereas the inorganic negative images resulted in much poorer network performance (Table 5).
If a single network were to be selected for further testing and implementation in an identification system, network run number 212 would be the likely candidate. This network performed adequately across all of the test image files and, more importantly, tested well with the artificial negative images. A second choice would be run number 226. However, all of the networks tested very similarly, with run 234 giving the poorest performance (Table 5).
DISCUSSION
Overall, the final networks performed promisingly, identifying positive and negative images with a high degree of success when an output value threshold of 0.8 was implemented. All four of the final networks demonstrated a high degree of correct identification with the testing images. If the networks were simply “guessing” at image identification, network performance would be expected to reach roughly 50% (as outputs were either a positive or negative classification). The correct positive and overall negative identification rates of each network were well above this percentage (an overall range of 78 to 97%).
The technique employed to process the digital images into a format usable by the neural networks shows promise. A histogram measuring the pixel intensity of grayscale images is a relatively simple algorithm to program, so it would not be difficult to design processing routines for images at different resolutions (magnifications). Because the histogram is computed over the entire image, the result does not vary with object orientation as long as the entire object is within the field, which would simplify the image acquisition process.
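This orientation invariance is easy to demonstrate on a synthetic 36- by 36-pixel image: rotating or mirroring the image leaves the whole-image histogram unchanged (a small illustrative check, not part of the original study):

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(36, 36))  # stand-in 36 x 36 grayscale image

hist = lambda a: np.bincount(a.ravel(), minlength=256)

# The whole-image histogram is identical for rotated and mirrored versions
assert np.array_equal(hist(image), hist(np.rot90(image)))   # 90-degree rotation
assert np.array_equal(hist(image), hist(np.fliplr(image)))  # mirror image
```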
The use of more complex image-processing techniques, such as the fast Fourier transform (FFT), has been implemented in other ANN designs. FFT processing was applied in a study in which ANN were designed to discern between two species of marine plankton; the trained networks were able to classify over 90% of previously unseen plankton images correctly (12). Another study, by Culverhouse et al. (4), attempted to create an ANN that could classify five species of marine plankton employing FFT processing. Those trained networks were tested against 98 unique, unseen images, with some species classified with 100% correct identification, but the three best networks identified only 60 to 77% of all five species correctly (4). The results presented here are comparable to the performances of trained ANN developed in these other studies, despite the differences in network design and image-processing techniques employed.
Another significant difference in this study compared to similar studies is the number of hidden neurons used in the ANN design. It would be expected that a design with fewer hidden neurons would result in networks capable of correctly classifying images to a higher degree than those with a larger number of hidden neurons. However, the data presented in Table 3 contradict this notion, as the networks with fewer neurons (5 and 50) did not test as well as those with higher numbers of hidden neurons (500 and 750). The testing itself was very rigorous, with 362 unique images being presented to the final networks. This number is over half of the images used in training the networks, as the final training image set contained 525 images (replicated once, for a total of 1,050 images). With this large number of unique images, the final networks had correct identifications ranging from 78 to 97% against the testing sets, with only a small subset of the negative images (inorganic negative) having poor identification percentages (Table 5).
Larger numbers of hidden neurons demonstrated better training performances than smaller numbers (Table 3). The histogram image-processing technique employed in this study may provide an explanation. The image data for the networks required a large number of input neurons (a total of 256). Additionally, not every input neuron had a discernible, quantitative value associated with it. Many inputs had numerical values of 0, and during training these inputs may have been treated effectively by the ANN as data noise. Other studies have attempted classification (or identification) of microorganisms through a variety of methods with successful results, but those networks had only 11 to 20 input neurons (2, 4, 12, 13, 14, 19), considerably fewer than the 256 input neurons utilized in this study.
Compared with the organic negative images, the inorganic images had a higher rate of incorrect responses. This may be due to the small number of such images presented in training: only 52 inorganic images were used in the final training set, as opposed to 217 organic negative images. The testing image set was also very small, with only 20 inorganic negative images compared to 80 organic negative testing images. Including additional testing images would help provide a more accurate assessment of network performance. The higher standard error (roughly 11%) is partly due to the small number of images, since the standard error is based on a Bernoulli (binomial) distribution. The greater difficulty in correctly identifying artificial images may not be a concern, because images collected from environmental samples differ from the artificial images used for testing ANN performance and could be easily distinguished by an analyst.
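For reference, the standard error quoted above follows from the usual binomial formula, sqrt(p(1 - p)/n); a quick check with the inorganic and organic negative sets reproduces the values reported in Table 5:

```python
import math

def binomial_se(p_correct, n_images):
    """Standard error of a correct-identification proportion, treating each
    image as an independent Bernoulli trial."""
    return math.sqrt(p_correct * (1 - p_correct) / n_images)

print(round(binomial_se(0.65, 20), 3))  # inorganic negatives: ~0.107, i.e., roughly 11%
print(round(binomial_se(0.85, 80), 3))  # organic negatives: ~0.040, i.e., roughly 4%
```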
The robustness of ANN in identifying C. parvum oocysts is further supported by the consistency of identification. The 429 AquaGlo FA-labeled images were collected over a period of 10 months. Slight differences in the images due to variations in sample preparation and staining were observed. Despite variations of the oocyst images and extended periods between sample preparations, the networks demonstrated a high degree of success in identifying oocysts (81 to 90%). It is also important to note that none of the testing images were ever presented to the networks during training. The ANN correctly identified 87.1 to 93.5% of the environmental oocyst images and 95 to 97% of the oocysts labeled with the TechLab FA kit (Table 5). The similarity in results from the AquaGlo and TechLab FA kits indicates the robustness of trained ANN in correctly identifying oocysts originating from a variety of samples and labeled with different commercial antibodies. Further indication of such robust performance is provided by the results from the 80 organic negative images (cross-reacting nonoocyst organisms and organic debris similar in size and shape to oocysts), of which 82.5 to 86.3% were correctly identified (Table 5).
It is apparent that some variance in ANN performance did occur when different FA kits were used. The positive images of oocyst samples labeled with the AquaGlo FA (100 images) resulted in ANN performances of 81 to 90% correct identification (Table 5), whereas the same networks identified up to 97% of the oocysts correctly when labeled with a different commercial FA kit (Crypto/Giardia IF Test; TechLab, Blacksburg, Va.) (Table 5). Additionally, the TechLab antibodies possessed greater specificity for C. parvum oocysts, since none of the algal samples were labeled, whereas the same algal sample demonstrated a high degree of affinity with the AquaGlo FA kit. Hoffman et al. (6) also observed variations within lots and nonspecific fluorescence with the same commercial FA kits employed here. Despite differences in staining characteristics between the FA kits, the trained ANN still produced a high degree of correct presumptive identifications.
The classification of misidentified images (Fig. 1) led to rather interesting results. For all of the networks, roughly half of the misidentified images had output values of 0.900 or greater, such as the second example in Table 1. The remaining output values were scattered between 0.600 and 0.899, with a substantial number of images having output values of 0.599 or less (Fig. 1). Such trends indicate that if a network were to misclassify an image, it would most likely be classified as a false positive (a negative image classified as an oocyst) or a false negative (an oocyst classified as a negative image). There were also images that were unidentifiable by the network, as a large number of the misclassified images had output values of less than 0.599.
The ANN outputs do not produce a single result (yes/no) but a range of numbers associated with two possible classifications. Because of this, a user would have additional information that could help in identifying misclassified images. If an image resulted in two outputs of 0.500 (an image not identified as either an oocyst or nonoocyst), this would be an indication that the user should attempt to find other means to identify the image, such as further inspection of the sample by a human analyst.
The ANN developed in this study would need some modification before the system could be readily integrated into an industrial setting. Digital image manipulation should be automated to simplify the creation of images for ANN analysis. Further refinement and testing of network performance should also be conducted; environmental images of oocysts and of organic debris or artifacts processed with other commercial FA kits would help clarify network performance. More training is also required if the complete identification of C. parvum oocysts is to be done using neural networks, as only the presumptive identification of this organism has been attempted so far. The observation of internal sporozoites by differential interference contrast optics and the inclusion of the stain 4′,6-diamidino-2-phenylindole (DAPI) within the oocyst are methods used for the confirmation of C. parvum (17). A network incorporating these criteria may dramatically improve the interpretation of oocysts when both FA- and DAPI-labeled images are employed; a questionable FA-labeled image may be more discernible when the neural network attempts identification using DAPI staining. However, it is unlikely that a single neural network would be used for both presumptive identification and confirmation of oocysts. Rather, two separate networks would have to be designed, each capable of identifying images by different criteria.
The application of ANN for identifying microscopic Cryptosporidium images has several possible uses. One is to offer an analyst a second opinion: a questionable image could be passed to the trained neural network, providing the user with a second classification that would help the human analyst determine whether a particular object was a C. parvum oocyst. An ideal application of this technology would be the implementation of trained ANN as the primary means of identifying C. parvum oocysts through automated procedures. A user could prepare a sample slide and have a computer-controlled stage and digital camera capture any suspect images. The ANN could be accessed through the Internet by passing cropped images (processed or unprocessed) to a single remote system. Since the neural network could handle the initial processing (and classification) of images remotely, a single primary analyst at that location could then confirm or clarify identifications made by the ANN. This would result in more uniform sample analysis and would remove the absolute requirement for an on-site trained person as the primary analyst. Remote analysis may be particularly advantageous in developing countries: since image identification could be handled remotely, only the sampling, labeling, and collection of images would be completed on site, and the user would not require a great deal of technical expertise or experience with the identification of C. parvum oocysts.
The development of an ANN to identify Cryptosporidium oocysts has broader applications. Other protozoa, such as Cyclospora and Microsporidium, could also be identified by using this technology. It is possible that several networks could be trained to identify protozoa and other parasitic organisms. The advantage of such systems is that the networks could conduct the routine monitoring of such organisms with little direction from highly trained human analysts.
Utilizing histograms of grayscale pixel intensity for image processing, this work represents one of the first studies to examine the feasibility of using neural networks to identify C. parvum oocysts. We recognize that further rigorous testing would be needed before ANN could be introduced as an effective alternative to human analysis. However, the demonstration that ANN are able to distinguish previously unseen images of Cryptosporidium oocysts from nonoocysts indicates the potential of this technology for routine monitoring of C. parvum in the water industry.
Acknowledgments
This study was supported by the Texas A&M University Research and Enhancement Program. Portions of this work were supported by the State of Texas Advanced Technology Program (000517-0361-1999). K.W. was supported by an appointment to the Internship Program at the Office of Ground Water and Drinking Water administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Environmental Protection Agency.
We thank Wyndi McElroy and Ellisette Cabello for assistance in collection of the environmental samples and providing technical expertise with FA microscopy. Roger Hartley's guidance on K.W.'s thesis committee is gratefully acknowledged.
REFERENCES
- 1. Basheer, I. A., and M. Hajmeer. 2000. Artificial neural networks: fundamentals, computing, design, and application. J. Microbiol. Methods 43:3-31.
- 2. Carson, C. A., J. M. Keller, K. K. McAdoo, D. Wang, B. Higgins, C. W. Bailey, J. G. Thorne, B. J. Payne, M. Skala, and A. W. Hahn. 1995. Escherichia coli O157:H7 restriction pattern recognition by artificial neural network. J. Clin. Microbiol. 33:2894-2898.
- 3. Clancy, J., W. Gollnitz, and Z. Tabib. 1994. Commercial labs: how accurate are they? J. Am. Water Works Assoc. 86:89-96.
- 4. Culverhouse, P., R. Ellis, R. Simpson, R. Williams, R. Pierce, and J. Turner. 1994. Automatic categorization of five species of Cymatocylis (Protozoa, Tintinnida) by artificial neural network. Mar. Ecol. Prog. Ser. 107:273-280.
- 5. Haas, C. N., and J. B. Rose. 1995. Developing an action level for Cryptosporidium. J. Am. Water Works Assoc. 87:81-84.
- 6. Hoffman, R. 1999. Evaluation of four commercial antibodies. J. Am. Water Works Assoc. 91:69-78.
- 7. Lawrence, J. 1994. Introduction to neural networks. California Scientific Software Press, Nevada City, Calif.
- 8. LeChevallier, M. W., and W. D. Norton. 1995. Giardia and Cryptosporidium in raw and finished water. J. Am. Water Works Assoc. 87:54-68.
- 9. MacKenzie, W., N. Hoxie, M. Proctor, M. S. Gradus, K. Blair, D. Peterson, J. Kazmierczak, D. Addiss, K. Fox, J. Rose, and J. Davis. 1994. A massive outbreak in Milwaukee of Cryptosporidium infection transmitted through the public water supply. N. Engl. J. Med. 331:161-167.
- 10. McElroy, W., E. Cabello, and S. D. Pillai. 2001. Efficiency of an immunomagnetic separation system for recovering Cryptosporidium oocysts from differently textured soils. J. Rapid Methods Autom. Microbiol. 9:63-70.
- 11. Rodgers, M., D. Flanigan, and W. Jakubowski. 1995. Identification of algae which interfere with the detection of Giardia cysts and Cryptosporidium oocysts and a method for alleviating this interference. Appl. Environ. Microbiol. 61:3759-3763.
- 12. Simpson, R., P. Culverhouse, R. Ellis, and B. Williams. 1991. Classification of Euceratium Gran. in neural networks, p. 223-229. In Proceedings of the IEEE Conference on Neural Networks for Ocean Engineering, August 1991, Washington, D.C. Institute of Electrical and Electronics Engineers, Piscataway, N.J.
- 13. Simpson, R., R. Williams, R. Ellis, and P. Culverhouse. 1992. Biological pattern recognition by neural networks. Mar. Ecol. Prog. Ser. 79:303-308.
- 14. Simpson, R., P. F. Culverhouse, R. Williams, and R. Ellis. 1993. Classification of Dinophyceae by artificial neural networks, p. 183-190. In T. J. Smayda and Y. Shimizu (ed.), Toxic phytoplankton blooms in the sea. Elsevier Science Publishers, New York, N.Y.
- 15. Tzipori, S. 1985. Cryptosporidium: notes on epidemiology and pathogenesis. Parasitol. Today 1:159-165.
- 16. U.S. Environmental Protection Agency. 1995. EPA information collection rule microbial laboratory manual. Publication EPA/814/B-95/001. Office of Ground Water and Drinking Water, Washington, D.C.
- 17. U.S. Environmental Protection Agency. 1999. Method 1622: Cryptosporidium in water by filtration/IMS/FA. Publication EPA/821/R-99/006. Office of Water, Washington, D.C.
- 18. Widrow, B. 1990. 30 years of adaptive neural networks: perceptron, Madaline, and backpropagation. Proc. IEEE 78:1415-1441.
- 19. Wilkins, M. F., L. Boddy, C. F. Morris, and R. R. Jonker. 1999. Identification of phytoplankton by using radial basis function neural networks. Appl. Environ. Microbiol. 65:4404-4410.

