Author manuscript; available in PMC: 2012 Oct 18.
Published in final edited form as: J Electron Imaging. 2008 Nov 12;17(4):043008. doi: 10.1117/1.3013459

Development and Assessment of an Integrated Computer-Aided Detection Scheme for Digital Microscopic Images of Metaphase Chromosomes

Xingwei Wang 1, Bin Zheng 2, Shibo Li 3, John J Mulvihill 3, Hong Liu 1,*
PMCID: PMC3475421  NIHMSID: NIHMS363279  PMID: 23087585

Abstract

The authors developed an integrated computer-aided detection (CAD) scheme for detecting and classifying metaphase chromosomes and assessed its performance and robustness. The scheme includes an automatic metaphase-finding module and a karyotyping module, and it was applied to a testing database of 200 digital microscopic images. The metaphase-finding module detects analyzable metaphase cells using a feature-based artificial neural network (ANN). The ANN-generated outputs are analyzed using the receiver operating characteristic (ROC) method, and the area under the ROC curve is 0.966. The karyotyping module then classifies the individual chromosomes of each analyzable cell into 24 types. In this module, a two-layer decision tree-based classifier with eight ANNs at its decision nodes was optimized by a genetic algorithm. Chromosomes are first classified into seven groups by the ANN in the first layer; the chromosomes in each group are then classified into 24 types by seven separate ANNs in the second layer. The classification accuracy is 94.5% in the first layer. In the second layer, six ANNs achieved accuracies above 95%, and only one had lower performance (80.6%). The overall classification accuracy is 91.5%, compared with 86.7% in our previous study; the two independent datasets were randomly acquired from our genetic laboratory. The results demonstrate that our automated scheme achieves high and robust performance in identifying and classifying metaphase chromosomes.

Keywords: Metaphase chromosome, Karyotype, Computer-aided detection, Artificial neural network, Receiver operating characteristics

I. Introduction

In each specimen slide of metaphase chromosomes prepared in a genetic laboratory, a large fraction of the metaphase chromosome cells cannot be analyzed for diagnostic purposes because of stain debris, incomplete cells, and severely overlapping chromosomes. Since Tjio and Levan 1 discovered in 1956 that humans have 46 chromosomes and the Denver Group classification of chromosomes was established in 1960 2, visually searching for analyzable metaphase chromosome cells and karyotyping the identified chromosomes under an optical microscope have been two fundamental procedures for diagnosing cancers and genetic diseases in genetic laboratories. The purpose of metaphase finding is to discard stain debris, interphase cells, and other un-analyzable cells and to identify analyzable metaphase cells, in which individual chromosomes are not severely overlapping or touching. Karyotyping is the process of arranging the chromosomes of a single metaphase cell in order based on their banding patterns, size, and centromere position 3. Figure 1 shows two regions of interest (ROI) depicting an un-analyzable metaphase chromosome cell, an analyzable metaphase cell, and several interphase cells, together with the karyotype of the analyzable metaphase chromosome cell. The manual process of identifying analyzable metaphase cells and performing karyotyping has two main disadvantages: (1) it is very time-consuming and tedious, and (2) it can introduce large inter-observer variability that may affect diagnostic accuracy and treatment decisions.4 Hence, there is great research interest in developing and testing automatic metaphase-finding and karyotyping systems.

Figure 1.


Examples of metaphase chromosome cells and the corresponding karyotype image. (a) A region of interest (ROI) depicting an un-analyzable metaphase cell and an interphase cell; (b) an ROI depicting an analyzable cell and three interphase cells; (c) an ROI with the segmented analyzable metaphase chromosome cell; (d) the corresponding karyotype image of the analyzable cell shown in panel (c).

Over the last 30 years, great efforts have been made to develop automated metaphase-finding and karyotyping systems. A number of metaphase-finding schemes have been reported for identifying and analyzing metaphase chromosome cells.5–8 Various algorithms, including a rule-based approach 9, a knowledge-based chromosome contour searching method 10, a novel recursive algorithm 11, a minimum entropy segmentation method 12, and an artificial neural network (ANN) 13, have been tested and implemented in different automated metaphase-finding schemes. For the automatic classification of metaphase chromosomes, previous studies have investigated ANNs 14–21, statistical models 5, 8, 22–26, knowledge-based expert schemes 27–29, a transportation algorithm 30, a homologue-matching algorithm 31, a fuzzy-logic-based classifier 32, and other algorithms 11, 33, 34. Among them, statistical models and ANN classifiers are two of the most popular methods for classifying human chromosomes. Previous research reported that the ANN approach was more efficient than statistical classifiers and achieved very comparable performance in identifying and classifying human chromosomes 13, 35. For example, when applied to the same images from the Edinburgh database, an ANN and a maximum likelihood (ML) classifier achieved accuracy rates of 82.8% and 81.7%, respectively.18 As a result of these research efforts, several commercial systems, including Magiscan, Cytoscan, AKS-2, and Genetiscan, have been used in some genetic laboratories to assist automated metaphase finding and karyotyping. 36 However, because the performance of these systems is not robust, operator intervention is often required to correct errors in clinical practice.

Several factors restrict the clinical utility of current automated metaphase identification and karyotyping schemes and systems. First, most current automated metaphase-finding systems search for potentially analyzable cells in low-resolution images. The results must be visually confirmed and cannot be used directly by an automated karyotyping system. Specifically, some previously reported metaphase-finding systems 5, 6, 13 can only alert the cytogeneticist to the locations of potentially analyzable metaphase cells; the cytogeneticist must then switch to a higher-magnification objective (e.g., from 10X to 100X) to determine visually whether each cell is analyzable. Second, most automated karyotyping schemes use a single classifier to classify all 24 chromosome types. According to machine learning theory, using a single ANN or other machine learning classifier to classify 24 chromosome types simultaneously makes the classifier very complex and difficult to train. 20 Without a very large training database and a set of effective (non-redundant) features, such classifiers tend to produce unstable, poorly robust results when applied to new image datasets.

To overcome these limitations, our previous studies separately developed an automated algorithm to detect analyzable metaphase cells and a two-layer decision tree-based classifier to classify individual chromosomes (karyotyping). 37, 38 This study aims to (1) integrate these two computerized algorithms (modules) into one computer-aided detection (CAD) scheme and (2) test the performance and robustness of the scheme. The integrated CAD scheme automatically detects analyzable metaphase cells depicted on digital microscopic images acquired by an optical microscope with a high-magnification objective (e.g., 100X) and directly classifies the individual chromosomes of the identified analyzable cells. For automated chromosome classification (karyotyping), a decision tree-based classifier involving eight ANNs is used. The ANN in the first layer of the decision tree classifies chromosomes into seven groups according to the Denver Group classification standard. The second layer employs seven adaptively optimized ANNs (one for each group) to further classify individual chromosomes into 24 types. The performance and robustness of the integrated CAD scheme were tested on a new image database that had not been used during the previous development and optimization of either module. The new integrated scheme and the detailed experimental results are described in the following sections.

II. Materials and Methods

2.1 Experimental Dataset

In this study, we assembled a new image database of 200 digital microscopic images originally obtained from peripheral blood and amniotic fluid samples of patients who underwent diagnosis at the genetic laboratory of the University of Oklahoma Health Science Center (OUHSC). All specimens were stained with Giemsa dye, and the band level of these chromosomes was determined to be 400. Each image was captured with a digital camera mounted on a Nikon LABOPHOT-2 optical microscope equipped with a 100X oil-immersion objective with a numerical aperture (NA) of 1.45. The pixel size of the digital images is 0.2 µm × 0.2 µm.

Among these 200 images, 100 were identified by the cytogeneticists in our laboratory as containing analyzable metaphase cells, and the metaphase chromosome cells in the other 100 images were classified as un-analyzable. In the dataset of 100 analyzable metaphase cells, 6 cells had previously been diagnosed with Down’s syndrome and exhibited polysomy (three copies of chromosome 21). Three other cells had other chromosomal abnormalities: one contained only one chromosome 21, and the other two had two X chromosomes and one Y chromosome. In total, the 100 analyzable metaphase cells contain 4607 chromosomes: 200 chromosomes for each of 21 types (types #1 to #20 and type #22), 205 chromosomes of type #21, 141 of type X, and 61 of type Y. The experts’ visual classification of the analyzable metaphase chromosome cells and of each individual chromosome inside the analyzable cells was also recorded in the “truth” file.
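The per-type counts above can be cross-checked in a few lines: types #1 to #22 at 200 each except type #21 (205), plus 141 X and 61 Y chromosomes, give the stated total, and grouping the types by Denver group reproduces the "Number of testing chromosomes" column of Table 3. The variable names are illustrative only.

```python
# Per-type chromosome counts in the 100 analyzable cells, as reported above
counts = {str(t): 200 for t in range(1, 23)}
counts.update({"21": 205, "X": 141, "Y": 61})

# Denver groups (A-G) and the chromosome types each contains
groups = {
    "A": ["1", "2", "3"],
    "B": ["4", "5"],
    "C": [str(t) for t in range(6, 13)] + ["X"],
    "D": ["13", "14", "15"],
    "E": ["16", "17", "18"],
    "F": ["19", "20"],
    "G": ["21", "22", "Y"],
}

total = sum(counts.values())
group_totals = {g: sum(counts[t] for t in types) for g, types in groups.items()}
print(total)         # 4607
print(group_totals)  # {'A': 600, 'B': 400, 'C': 1541, 'D': 600, 'E': 600, 'F': 400, 'G': 466}
```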

2.2 The first module to identify analyzable metaphase cells

The first module of our CAD scheme applies the following steps to identify analyzable metaphase cells and discard un-analyzable cells.37 First, a median filter is applied to reduce noise and background artifacts in the digital microscopic chromosome images. Second, an adjustable threshold is applied to convert the original microscopic images into binary images. Third, a component labeling algorithm 39 and a raster scanning method are applied to label and group the detected targets and to delete isolated small areas. Fourth, from each grouped region, the scheme computes six features: (1) the number of labeled regions, (2) the average size of all labeled regions, (3) the standard deviation of region size, (4) the average pixel value of all regions, (5) the standard deviation of pixel values, and (6) the average radial length of all regions. The detailed definitions and computing methods of these six features have been reported elsewhere.37
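The four preprocessing and feature steps above can be sketched in Python with NumPy and SciPy. This is an illustrative sketch, not the authors' implementation: it assumes dark chromosomes on a light background, a fixed gray-level threshold standing in for the adjustable one, SciPy's connected-component labeling in place of the cited labeling and raster-scanning method, and a hypothetical definition of "radial length" (mean pixel distance from a region's centroid), because the exact definitions are given in reference 37. The function name `metaphase_cell_features` and its parameters are hypothetical.

```python
import numpy as np
from scipy import ndimage


def metaphase_cell_features(image, threshold=128.0, min_area=20, filter_size=3):
    """Six region features for one candidate cell image (illustrative sketch).

    Assumptions not taken from the paper: chromosomes are darker than the
    background, the threshold is a fixed gray level, and "radial length"
    is the mean pixel distance from a region's centroid.
    """
    img = np.asarray(image, dtype=float)
    # Step 1: median filter to suppress noise and small artifacts
    smoothed = ndimage.median_filter(img, size=filter_size)
    # Step 2: binarize (dark chromosome pixels become foreground)
    mask = smoothed < threshold
    # Step 3: label connected regions and drop small isolated areas
    labels, n = ndimage.label(mask)
    if n == 0:
        return np.zeros(6)
    areas = np.bincount(labels.ravel(), minlength=n + 1)[1:]  # pixels per label
    keep = np.arange(1, n + 1)[areas >= min_area]
    if keep.size == 0:
        return np.zeros(6)
    kept_areas = areas[keep - 1].astype(float)
    region_pixels = smoothed[np.isin(labels, keep)]
    radial = []
    for k in keep:
        rows, cols = np.nonzero(labels == k)
        radial.append(np.hypot(rows - rows.mean(), cols - cols.mean()).mean())
    # Step 4: the six features, in the order listed in the text
    return np.array([
        float(keep.size),        # (1) number of labeled regions
        kept_areas.mean(),       # (2) average region size in pixels
        kept_areas.std(),        # (3) standard deviation of region size
        region_pixels.mean(),    # (4) average pixel value over all regions
        region_pixels.std(),     # (5) standard deviation of those pixel values
        float(np.mean(radial)),  # (6) average radial length (assumed definition)
    ])
```

On a synthetic 64 × 64 image with two dark 6 × 6 squares and one 2 × 2 speck, the speck is removed by the median filter and the function reports two regions.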

These six features are then used as inputs to an artificial neural network (ANN) that classifies cells as analyzable or un-analyzable. The ANN uses a simple three-layer feed-forward topology: the input layer has six neurons, one for each feature; the hidden layer has three neurons; and the output layer contains one decision neuron. The ANN is trained with a standard back-propagation algorithm. To minimize the risk of over-fitting and keep the ANN performance robust on new testing cases,40 we used a limited number of training iterations and a large ratio between the momentum and the learning rate, based on our previous experience in training similar multi-feature ANNs to distinguish true-positive lesions from suspicious but actually negative regions depicted on medical images.41 Specifically, we empirically set the number of training iterations to 400 and the momentum and learning rate to 0.9 and 0.01, respectively. Assuming that M analyzable cells and N un-analyzable cells are involved in each training iteration, the ANN training program computes the mean square error (MSE) as

$$\Delta = \frac{1}{M+N}\left[\sum_{i=1}^{M}\left(O_i - T_1\right)^2 + \sum_{j=1}^{N}\left(O_j - T_2\right)^2\right],$$

where $O_i$ and $O_j$ are the computed output values for the i-th analyzable and j-th un-analyzable cells, and $T_1 = 0.9$ and $T_2 = 0.1$ are the target values for definitely analyzable and definitely un-analyzable cells, respectively. In our previous study 37, the ANN was trained to minimize the MSE on the training dataset.
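A minimal NumPy sketch of the 6-3-1 network, trained by back-propagation with the momentum (0.9), learning rate (0.01), iteration count (400), and targets ($T_1 = 0.9$, $T_2 = 0.1$) given above, and with the MSE $\Delta$ recorded after each pass over the training cells. Sigmoid activations, random weight initialization, and per-sample (online) updates are assumptions of this sketch; the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mse(outputs, targets):
    """Delta = (1/(M+N)) * total squared error over all M+N training cells."""
    diff = np.asarray(outputs, dtype=float) - np.asarray(targets, dtype=float)
    return float(np.mean(diff ** 2))


def train_cell_ann(X, y, n_hidden=3, iterations=400, lr=0.01, momentum=0.9, seed=0):
    """Online back-propagation for a 6-3-1 sigmoid network (sketch).

    y is 1 for analyzable and 0 for un-analyzable cells; labels are mapped to
    the targets T1 = 0.9 and T2 = 0.1. Initialization, activation, and the
    per-sample update scheme are assumptions.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    t = np.where(np.asarray(y) == 1, 0.9, 0.1)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    vW1, vb1, vW2, vb2 = np.zeros_like(W1), np.zeros_like(b1), np.zeros_like(W2), 0.0

    def predict(A):
        return sigmoid(sigmoid(np.asarray(A, dtype=float) @ W1 + b1) @ W2 + b2)

    history = []
    for _ in range(iterations):  # one iteration = one pass over all cells
        for k in rng.permutation(len(X)):
            h = sigmoid(X[k] @ W1 + b1)          # hidden-layer activations
            o = sigmoid(h @ W2 + b2)             # single decision neuron
            d_out = (o - t[k]) * o * (1.0 - o)   # output-layer error term
            d_hid = d_out * W2 * h * (1.0 - h)   # error back-propagated to hidden layer
            # gradient-descent updates with a momentum term
            vW2 = momentum * vW2 - lr * d_out * h
            vb2 = momentum * vb2 - lr * d_out
            vW1 = momentum * vW1 - lr * np.outer(X[k], d_hid)
            vb1 = momentum * vb1 - lr * d_hid
            W2, b2, W1, b1 = W2 + vW2, b2 + vb2, W1 + vW1, b1 + vb1
        history.append(mse(predict(X), t))
    return predict, history
```

The returned `history` lets one check that the MSE decreases over the 400 iterations; keeping the iteration count fixed rather than training to convergence mirrors the over-fitting precaution described in the text.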

2.3 Feature computation

To classify individual chromosomes, we first computed a set of features from each chromosome. Among the computed features, size, centromere position, and banding pattern are considered the three most important features for karyotyping. 5, 23 The centromere is a specialized region of a chromosome where the chromatids are joined and by which the chromosome is attached to the spindle during cell division. 42 Polarity assignment determines the orientation of a chromosome by identifying the p-arm (the shorter arm) and the q-arm (the longer arm), 43 and places the end of the p-arm at the top of the chromosome. Figure 2 displays an ideogram of chromosome #1 with its centromere, p-arm, and q-arm. Because the different stages of the cell cycle, slide preparation, and banding characteristics produce diverse morphologies of metaphase chromosomes on clinical images (Figure 2), chromosome shapes and orientations vary widely, which substantially reduces the performance and robustness of computerized karyotyping schemes. In this integrated scheme, we applied a previously developed image processing and classification method to detect the centromere and assign polarity for each individual chromosome.44 Briefly, because the conventional skeleton (thinning) algorithm has limitations when applied to chromosomes with varying morphologies and shapes, our method uses a modified thinning algorithm to detect the medial axis of each segmented chromosome. Figure 2 also shows four examples of chromosome #1 with different bending shapes, together with the medial axes detected by the conventional thinning algorithm and by our modified thinning algorithm. The program then detects and records the lines perpendicular to the medial axis of the chromosome, and a rule-based classifier uses them to detect the centromere and assign polarity for each individual chromosome.
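The rule-based centromere classifier itself is described in reference 44 and is not reproduced here. As a hedged illustration of what centromere detection and polarity assignment produce, the sketch below uses a common simplification instead: it takes the centromere as the narrowest interior point of a width profile sampled along the medial axis, treats the shorter side as the p-arm, and reports whether the chromosome must be flipped so that the p-arm is on top. The `margin` parameter, the length-based index, and the function name are hypothetical.

```python
import numpy as np


def centromere_and_polarity(width_profile, margin=0.1):
    """Locate a centromere candidate and orient the chromosome (sketch).

    Stand-in heuristic, NOT the authors' rule-based classifier: the
    centromere is the narrowest point of the width profile along the medial
    axis, ignoring a margin at each tapered end; the shorter side is the p-arm.
    """
    w = np.asarray(width_profile, dtype=float)
    n = len(w)
    lo = max(1, int(round(margin * n)))
    hi = min(n - 1, n - lo)
    c = lo + int(np.argmin(w[lo:hi]))   # narrowest interior point
    p_len, q_len = c, n - 1 - c         # arm lengths in samples
    flipped = p_len > q_len             # p-arm must end up on top
    if flipped:
        p_len, q_len = q_len, p_len
    ci = p_len / (p_len + q_len)        # length-based centromere index
    return c, ci, flipped
```

A profile that narrows at sample 3 of 10 gives a centromere index of 1/3 without flipping; the same profile reversed gives the same index with `flipped` set to True.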

Figure 2.


Ideograms of chromosome #1, examples of variability of morphologies of chromosome #1, and the medial axis results of chromosome #1. (a) – (d) the original chromosomes, (e) – (h) the chromosomes marked with the medial axis detected by the conventional thinning algorithm, and (i) – (l) the chromosomes marked with the medial axis detected by our modified thinning algorithm.

After detecting the medial axis of a chromosome, our scheme computes three profiles: density, shape, and banding.38 Each profile is a one-dimensional graph of a chromosome property computed at a sequence of points x along the identified medial axis. The density profile records the average gray value along each line perpendicular to the medial axis at x:

$$D(x) = \frac{1}{n}\sum_{i=1}^{n} g_i(x),$$

where $g_i(x)$ is the gray value of the i-th pixel on the perpendicular line and n is the number of pixels on that line. The shape profile records a weighted width along each perpendicular line:

$$S(x) = \frac{\sum_{i=1}^{n} g_i(x)\, d_i(x)^2}{\sum_{i=1}^{n} d_i(x)^2},$$

that is, the sum of each gray value $g_i(x)$ weighted by its squared Euclidean distance $d_i(x)^2$ from the medial axis, divided by the sum of the squared distances 23. The banding profile is computed by processing the density profile D(x) with the non-linear transform filter of Kramer and Bruckner 45. In the resulting banding profile, each band is characterized by a uniform density, and the transitions between neighboring bands are step functions.46
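The two profile definitions, and one reading of the Kramer–Bruckner filter (each sample moves to the local minimum or maximum in a small window, whichever is closer), can be sketched as follows. The sketch assumes the chromosome has already been straightened so that each image row is one perpendicular line with the medial axis in the centre column; the window size, number of passes, and function names are assumptions, not parameters from the paper.

```python
import numpy as np


def density_and_shape_profiles(straightened, mask):
    """D(x) and S(x) for one chromosome (sketch).

    Assumes the chromosome is straightened so its medial axis runs down the
    centre column and each image row is one perpendicular line; `mask` marks
    chromosome pixels.
    """
    img = np.asarray(straightened, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    axis_col = (img.shape[1] - 1) / 2.0
    D, S = [], []
    for row, m in zip(img, mask):
        cols = np.nonzero(m)[0]
        if cols.size == 0:
            D.append(0.0)
            S.append(0.0)
            continue
        g = row[cols]                  # gray values g_i(x) on this line
        d2 = (cols - axis_col) ** 2    # squared distances d_i(x)^2
        D.append(float(g.mean()))      # D(x) = (1/n) * sum g_i(x)
        S.append(float((g * d2).sum() / d2.sum()) if d2.sum() > 0 else 0.0)
    return np.array(D), np.array(S)


def kramer_bruckner(profile, half_window=2, passes=1):
    """Move each sample to the closer of the local minimum or maximum,
    sharpening D(x) into step-like bands (assumed window and passes)."""
    p = np.asarray(profile, dtype=float).copy()
    for _ in range(passes):
        out = p.copy()
        for x in range(len(p)):
            window = p[max(0, x - half_window): x + half_window + 1]
            w_min, w_max = window.min(), window.max()
            out[x] = w_min if p[x] - w_min <= w_max - p[x] else w_max
        p = out
    return p
```

For example, `kramer_bruckner([0, 0, 4, 10, 10], half_window=1)` turns the gradual ramp into a single step, `[0, 0, 0, 10, 10]`.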

Based on these three profiles, a total of 31 features are computed for each chromosome in an identified analyzable metaphase cell. These features form an initial feature pool, summarized in Table 1. Their detailed definitions and computing methods have been reported in our previous study.38 In brief, the 31 features fall into four types. The first type includes two features related to the centromere index (CI). The second type contains three global features: chromosome area, chromosome length, and chromosome density. The third type includes 12 local band features. The fourth type includes 14 features computed from global band patterns using weighted density distribution (WDD) functions.46 Figure 3 illustrates the eight WDD functions; for example, WDD2 expresses whether the density is mainly distributed in the middle of the profile. Six of the eight WDD functions (WDD1 to WDD6) were tested in a previous study 46. In our previous study 38, we added and tested two new functions in this group, WDD7 and WDD8. WDD7 searches for a dark band in the p-arm of a chromosome, and WDD8 determines whether there are three equally spaced dark bands in the q-arm.
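Each WDD feature summarizes a profile by weighting it along the chromosome's normalized length. The exact WDD1 to WDD8 functions are defined in references 38 and 46 and are not reproduced here; the sketch below shows only the general pattern, with a hypothetical weight that plays the role the text gives WDD2 (positive when density is concentrated in the middle of the profile). Both function names are hypothetical.

```python
import numpy as np


def wdd_feature(profile, weight):
    """Generic weighted density distribution feature (sketch): samples are
    placed at normalized positions x in [0, 1] along the medial axis and the
    profile is summarized as sum(w(x) * p(x)) / sum(p(x))."""
    p = np.asarray(profile, dtype=float)
    x = np.linspace(0.0, 1.0, len(p))
    return float(np.sum(weight(x) * p) / np.sum(p))


def middle_weight(x):
    # Hypothetical weight: +1 at the centre and -1 at both ends, so a positive
    # feature value means the density lies mainly in the middle of the profile
    return 1.0 - 2.0 * np.abs(2.0 * x - 1.0)
```

A profile with its density in the centre, such as `[0, 0, 1, 5, 1, 0, 0]`, yields a positive value, whereas one with density at both ends yields a negative value.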

Table 1.

Distribution of 31 computed chromosome features in karyotyping subsystem

Feature type | Number of features | Brief feature description
1. Centromere index | 2 | Area of CI and length of CI
2. Pixel distribution | 3 | Chromosome size, length, and average density
3. Local band patterns | 12 | Band distributions, including the number, location, and size of specific dark or white bands
4. Global band patterns | 14 | Band patterns computed from 8 WDD functions and 6 differences (first-order derivatives) of WDD functions

Figure 3.


Illustration of the weighted density distribution functions utilized in feature computation.

2.4 The second module to classify metaphase chromosomes

To reduce the training difficulty and complexity of a single classifier that simultaneously classifies 24 chromosome types, the second module of our CAD scheme uses a two-layer decision tree-based classifier. The classifier includes eight artificial neural networks (ANNs) at its decision nodes, as shown in Figure 4. All chromosomes of an analyzable metaphase cell detected by the first module of our CAD scheme are processed and classified by the ANN in the first layer. This ANN has three output neurons and can therefore generate eight different outputs (from “000” to “111”). Based on its output, the ANN assigns each chromosome in the cell to one of seven groups (A–G) defined by the Denver Group standard 2. For example, if the ANN output is “000”, the chromosome is classified into group A, whereas if the output is “110”, it is assigned to group G (Figure 4). The output “111” is not assigned to any group: a chromosome that produces “111” cannot be classified by this classifier and is reported as an “undetermined” chromosome. In the second layer, seven ANNs (one for each group) were adaptively optimized to classify individual chromosomes into 24 types. Thus, each chromosome that has been classified into one of the seven groups (A to G) is passed to the corresponding ANN, which assigns it to one of the 24 chromosome types.
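The routing logic of the two-layer classifier can be sketched as below. The group-to-type mapping follows Table 3; mapping the intermediate codes 001 to 101 sequentially to groups B to F, thresholding each output neuron at 0.5, and choosing the highest-scoring type in the second layer are assumptions, and the trained ANNs are replaced by hypothetical callables.

```python
import numpy as np

# Denver groups and the chromosome types each contains (from Table 3)
GROUP_TYPES = {
    "A": ["1", "2", "3"],
    "B": ["4", "5"],
    "C": ["6", "7", "8", "9", "10", "11", "12", "X"],
    "D": ["13", "14", "15"],
    "E": ["16", "17", "18"],
    "F": ["19", "20"],
    "G": ["21", "22", "Y"],
}
GROUPS = "ABCDEFG"


def decode_group(outputs, cutoff=0.5):
    """Map three first-layer outputs to a group: 000 -> A, ..., 110 -> G,
    111 -> None (undetermined). The 0.5 cutoff is an assumption."""
    bits = "".join("1" if o >= cutoff else "0" for o in outputs)
    index = int(bits, 2)
    return GROUPS[index] if index < len(GROUPS) else None


def classify_chromosome(features, first_layer, second_layer):
    """first_layer(features) -> 3 outputs; second_layer[group](features) ->
    one score per type in that group. Both are hypothetical stand-ins."""
    group = decode_group(first_layer(features))
    if group is None:
        return "undetermined"
    scores = second_layer[group](features)
    return GROUP_TYPES[group][int(np.argmax(scores))]
```

Splitting the problem this way means each second-layer network only has to separate two to eight types, which is the motivation the text gives for the decision tree.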

Figure 4.


A diagram of a two-layer decision tree-based classifier using 8 adaptively optimized ANNs.

The initial feature pool includes 31 computed features. To eliminate redundant features and select effective features suited to the different image characteristics of chromosomes in different groups, all ANNs in the two-layer decision tree-based classifier were optimized using a genetic algorithm (GA) in our previous study.38 We used a publicly available GA package, Genesis, developed by John Grefenstette 47. The GA uses binary coding to represent each candidate solution as a chromosome; to avoid confusion with biological chromosomes, we refer to these as GA chromosomes. In this study, each GA chromosome has 35 genes: the first 31 genes correspond to the initially extracted features, and the last four genes encode the number of hidden neurons. For each feature-related gene, “1” indicates that the feature is selected as an input of the ANN, whereas “0” means that the feature is discarded. The mean square error (MSE) between the ANN-generated scores and the pre-recorded truth for all training samples is used as the GA fitness criterion to assess ANN classification performance. We used the default initial parameters of the GA software: (1) the initial population size of GA chromosomes is 100, and (2) the crossover rate, mutation rate, and generation gap are 0.6, 0.001, and 1.0, respectively.47 The effectiveness of this GA optimization protocol has been extensively tested and evaluated in our previous studies.38, 41 Under this protocol, the GA searches for GA chromosomes with smaller MSE until the search converges to the “best” GA chromosome. Table 2 summarizes the eight “best” GA chromosomes for the eight optimized ANNs. For example, the first ANN, which classifies chromosomes into seven groups, has 15 input neurons and 6 hidden neurons, because the first 31 genes of its GA chromosome contain fifteen “1”s and sixteen “0”s (15 features selected and 16 discarded) and the last four genes, 0110, represent 6 in decimal. The results indicate that, because chromosomes in the seven groups have different image characteristics, different feature sets should be selected for different ANNs to achieve optimal performance.
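For readers unfamiliar with GA-based feature selection, the following is a minimal generational GA over 35-bit GA chromosomes using the population size, crossover rate, mutation rate, and generation gap quoted above. It is not the Genesis package: binary tournament selection, one-point crossover, the number of generations, and tracking of the best GA chromosome found so far are assumptions. In the study, the fitness would be the MSE of an ANN built from the decoded GA chromosome; here any callable that returns a number to minimize can be passed. The function name `run_ga` is hypothetical.

```python
import numpy as np


def run_ga(fitness, n_genes=35, pop_size=100, generations=50,
           crossover_rate=0.6, mutation_rate=0.001, seed=0):
    """Minimal generational GA that minimizes `fitness` (sketch, not Genesis).

    A generation gap of 1.0 means the whole population is replaced each
    generation. Tournament selection and one-point crossover are assumptions.
    """
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_genes))
    scores = np.array([fitness(ind) for ind in pop])
    best, best_score = pop[np.argmin(scores)].copy(), scores.min()
    for _ in range(generations):
        # binary tournament selection (the lower fitness value wins)
        a, b = rng.integers(0, pop_size, size=(2, pop_size))
        parents = np.where((scores[a] <= scores[b])[:, None], pop[a], pop[b])
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):  # one-point crossover on pairs
            if rng.random() < crossover_rate:
                cut = rng.integers(1, n_genes)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        flips = rng.random(children.shape) < mutation_rate  # bit-flip mutation
        pop = np.where(flips, 1 - children, children)
        scores = np.array([fitness(ind) for ind in pop])
        if scores.min() < best_score:  # keep the best GA chromosome seen so far
            best, best_score = pop[np.argmin(scores)].copy(), scores.min()
    return best, best_score
```

A toy fitness, such as the number of genes that differ from a target bit string, lets the loop be checked without training any ANN.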

Table 2.

Features used in the eight ANNs

ANN | GA chromosome | Input neurons | Hidden neurons
1 | 11100101010101100110001000011100110 | 15 | 6
2-1 | 11101000010000000001001111001111110 | 13 | 14
2-2 | 00010010110100100010111010111001111 | 14 | 15
2-3 | 01001001111010000001000111011010101 | 14 | 5
2-4 | 00000101000101101101111111010111101 | 17 | 13
2-5 | 01110110010100000011011110010011110 | 15 | 14
2-6 | 11110100001101100000010011000111010 | 14 | 10
2-7 | 01100011010000000001101100110001000 | 11 | 8
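The decoding rule described in the text (genes 1 to 31 select features, genes 32 to 35 give the hidden-neuron count in binary) can be checked against Table 2 with a short function. The function name is hypothetical, and features are numbered 1 to 31 in gene order.

```python
def decode_ga_chromosome(bits, n_features=31):
    """Split a 35-gene GA chromosome into the selected feature numbers
    (genes 1-31) and the hidden-neuron count encoded by the last four genes."""
    if len(bits) != n_features + 4:
        raise ValueError(f"expected {n_features + 4} genes, got {len(bits)}")
    selected = [i + 1 for i, gene in enumerate(bits[:n_features]) if gene == "1"]
    hidden = int(bits[n_features:], 2)
    return selected, hidden


features, hidden = decode_ga_chromosome("11100101010101100110001000011100110")
print(len(features), hidden)  # 15 6
```

Applied to the first row of Table 2, it returns 15 selected features and 6 hidden neurons, matching the table; the other rows decode the same way.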

2.5 Assessment of scheme performance and robustness

In this study, we designed a computer interface to link the two modules and built an integrated CAD scheme. The two detection and classification modules had previously been optimized using different image databases. To assess its performance and robustness, the integrated CAD scheme was applied “as is” to the new database collected for this study. We tested and analyzed the performance of each module as well as that of the integrated CAD scheme. The first module was evaluated using a receiver operating characteristic (ROC) method 48. The ANN-generated classification scores of all analyzable and un-analyzable samples in the testing dataset were converted into two histograms with 11 bins each 37. From these two histograms, we plotted an unsmoothed ROC-type performance curve and applied the ROCFIT program 49, which uses maximum likelihood estimation 50, to compute the area under the ROC curve (AZ value) as an index of module performance. To evaluate the second module and the integrated CAD scheme, we counted the percentages of chromosomes assigned to the correct one of the seven groups and the correct one of the 24 types, using the cytogeneticists' visual classification as the reference.
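The histogram-based ROC curve can be sketched as follows: scores are binned into 11 bins on [0, 1], the decision threshold is swept from the highest bin down, and the area is computed with the trapezoidal rule. Note that this gives the nonparametric (empirical) area, not the binormal maximum-likelihood AZ that ROCFIT fits; the function name and the assumption that ANN scores lie in [0, 1] are ours.

```python
import numpy as np


def binned_roc(pos_scores, neg_scores, n_bins=11):
    """Empirical ROC points and trapezoidal area from binned ANN scores.

    pos_scores: scores of analyzable cells; neg_scores: un-analyzable cells.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    pos_hist, _ = np.histogram(pos_scores, bins=edges)
    neg_hist, _ = np.histogram(neg_scores, bins=edges)
    # sweep the threshold from the highest bin down: cumulative fractions
    tpf = np.concatenate([[0.0], np.cumsum(pos_hist[::-1]) / pos_hist.sum()])
    fpf = np.concatenate([[0.0], np.cumsum(neg_hist[::-1]) / neg_hist.sum()])
    # trapezoidal rule over the (FPF, TPF) operating points
    area = float(np.sum(np.diff(fpf) * (tpf[1:] + tpf[:-1]) / 2.0))
    return fpf, tpf, area
```

Perfectly separated scores give an area of 1.0, and identical score distributions give 0.5.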

III. Results

The first CAD module achieved AZ = 0.966 ± 0.003 in detecting and classifying between 100 analyzable and 100 un-analyzable metaphase chromosome cells. Figure 6 shows the ROC-type performance curve of the first module applied to this image dataset. According to this curve, the scheme can correctly detect all 100 analyzable cells with at most five false-positive detections (i.e., at most five un-analyzable cells classified as analyzable). This high performance indicates that the selected features have large interclass discrimination and are insensitive to extraneous variables (little dependence on the signal-to-noise ratio). For example, Figure 7 displays a scatter diagram of the average pixel value versus the number of labeled regions for each identified metaphase cell. For these two features, the analyzable cells are mostly located in the upper-right corner of the diagram, indicating that analyzable cells typically have more labeled regions and higher pixel values than un-analyzable cells. Thus, one of the main reasons that metaphase cells are classified as un-analyzable is that they contain a higher percentage of overlapping chromosomes, resulting in fewer separate regions labeled by the computer scheme.

Figure 6.


ROC-type performance curve generated by the first-module ANN on the testing dataset.

Figure 7.


A scatter diagram between two features of 200 testing samples including 100 analyzable and 100 un-analyzable cells.

The second module (karyotyping) of our CAD scheme uses eight ANNs in the two-layer decision tree classifier to classify individual chromosomes inside the identified analyzable metaphase cells. Table 3 summarizes the detailed classification results of each ANN. As shown in Table 3, 94.5% (4354 of 4607) of the chromosomes in our testing dataset are correctly classified into one of the seven groups (A to G) by the ANN in the first layer. After the seven adaptively optimized ANNs in the second layer are applied to the chromosomes in their corresponding groups, the classification accuracies range from 80.6% to 98.8%: six ANNs achieve accuracies of 95.2% or higher, and only one ANN (group C) has lower performance, with 80.6% accuracy. Overall, 4217 of the 4607 chromosomes are correctly classified by our scheme, an overall classification accuracy of 91.5% (error rate of 8.5%).

Table 3.

Classification results of eight ANNs used in the two-layer decision tree based classifier

Group | Type of chromosomes | Number of testing chromosomes | Correctly classified chromosomes in the first layer | Classification accuracy rate in the first layer | Correctly classified chromosomes in the second layer | Overall classification accuracy rate
Seven groups | A–G | 4607 | 4354 | 94.5% | 4217 | 91.5%
A | 1–3 | 600 | 587 | 97.8% | 585 | 97.5%
B | 4–5 | 400 | 387 | 96.8% | 381 | 95.2%
C | 6–12, X | 1541 | 1356 | 88.0% | 1242 | 80.6%
D | 13–15 | 600 | 582 | 97.0% | 574 | 95.7%
E | 16–18 | 600 | 594 | 99.0% | 592 | 98.7%
F | 19–20 | 400 | 395 | 98.8% | 395 | 98.8%
G | 21–22, Y | 466 | 453 | 97.2% | 448 | 96.2%
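The summary row and per-group rates of Table 3 follow directly from the counts; the short check below recomputes them. It also makes explicit that both per-group rates divide by the number of testing chromosomes in the group, not by the number correctly classified in the first layer. The variable names are illustrative.

```python
# (group, testing chromosomes, correct in first layer, correct after second layer)
TABLE3 = [
    ("A", 600, 587, 585),
    ("B", 400, 387, 381),
    ("C", 1541, 1356, 1242),
    ("D", 600, 582, 574),
    ("E", 600, 594, 592),
    ("F", 400, 395, 395),
    ("G", 466, 453, 448),
]

n_total = sum(n for _, n, _, _ in TABLE3)
n_first = sum(c1 for _, _, c1, _ in TABLE3)
n_second = sum(c2 for _, _, _, c2 in TABLE3)
print(n_total, n_first, n_second)          # 4607 4354 4217
print(round(100 * n_first / n_total, 1))   # 94.5
print(round(100 * n_second / n_total, 1))  # 91.5

# Both per-group rates use the number of testing chromosomes as denominator
for group, n, c1, c2 in TABLE3:
    print(group, f"{100 * c1 / n:.1f}%", f"{100 * c2 / n:.1f}%")
```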

IV. Discussion And Conclusions

The development of automated metaphase-finding and karyotyping systems has attracted extensive research interest over the last two decades. Such systems aim to help clinicians detect chromosomal rearrangements more accurately and efficiently; these rearrangements are powerful indicators in the diagnosis of cancers and genetic diseases and in monitoring cancer prognosis and treatment efficacy. Although considerable research effort has produced substantial progress in automated schemes for identifying metaphase chromosomes and karyotyping, currently available automated systems have a number of limitations. First, many previous studies selected a large number of features, built very complex classifiers, and used computationally demanding methods to classify chromosomes into 24 types simultaneously 51. Such complexity has been shown to limit the ability of these systems to achieve high and robust performance in clinical cytology and pathology applications 52. Second, most available systems are actually semi-automated and require substantial human intervention during operation. For example, an operator is often required to switch microscope objectives when detecting analyzable metaphase chromosome cells (metaphase finding). In our previous studies, we used different approaches to develop a set of simple CAD modules for detecting analyzable metaphase cells and classifying individual chromosomes. Unlike previously reported schemes that use low-resolution images 5, 6, 13, our first module, which automatically detects analyzable metaphase cells, is applied directly to high-resolution images 37. As a result, the analyzable metaphase cells detected and prompted by the scheme can be examined and analyzed by cytogeneticists for diagnosis without returning to the microscope. The chromosomes in the detected analyzable metaphase cells can also be processed directly by the automatic karyotyping module or by other CAD schemes to perform more comprehensive tasks. In addition, because the first module uses a simple ANN with only six features to detect analyzable metaphase cells, over-fitting is relatively easy to minimize. For the second module (automated karyotyping), our scheme applies a new concept: an adaptive optimization method is used to build a scheme with a two-layer classification structure, and we test the feasibility of using a GA to select effective features and optimize the topology of each ANN separately.

In this study, we integrated the two automated modules into a complete CAD scheme and assembled a new image dataset to assess its performance and robustness. The results of this assessment study show that when applying to this new image dataset, the performance of our CAD scheme maintains a level very comparable to the level from the previous dataset for optimizing the two modules of the scheme 37, 38. We also examined and compared other chromosome classification results reported in a number of previous studies 14, 17, 19 in which a single large ANN was developed to classify 24 types of chromosomes. These studies used three publicly available databases (the Copenhagen, Edinburgh, and Philadelphia databases) and selected different image features to build ANNs. Due to the complexity of the ANN topology (e.g. an ANN with 15 input neurons, 100 hidden neurons, and 24 output neurons 17). The reported classification error rates are approximately 6.2%, 17.8%, and 22.7% for Copenhagen, Edinburgh, and Philadelphia databases, respectively 17. In this study, we used a database collected from our genetic laboratory, and the overall error rate of our classifier based on the combination of two layers of ANNs was 8.5%. Table 4 describes and compares the image characteristics between our dataset and three public datasets. Since the Copenhagen database contains high quality and straight chromosomes in which the locations of centromeres are known through manual identification, the testing performance of ANN classifiers is usually high. Three previous studies reported classification error rates of 6.2%, 8.8%, 10.3%, respectively on Copenhagen database14, 15, 17. Edinburgh and Philadelphia datasets contain more difficult chromosomes resulting in the substantially low performance of the computer schemes. 
The reported error rates ranged from 15.3% to 22.1% for the Edinburgh database and from 22.7% to 28.6% for the Philadelphia database 14, 17, 23, 30. Based on the image characteristics summarized in Table 4, we estimated that the difficulty level of our dataset is similar to that of the Edinburgh database. Thus, the 8.5% error rate of our classifier suggests that our scheme achieves comparable or improved performance, given the diversity of our database. In addition, the dataset used in this study is twice the size of the dataset used in our previous study (4607 versus 2300 chromosomes). Even so, the overall classification accuracy achieved in this study (91.5%, or an 8.5% error rate) is higher than the 86.7% achieved in the previous study 38. The largest performance improvement occurs in group C (80.6% versus 67.5%), suggesting that the new dataset contains fewer difficult chromosomes in this group. For the other six groups, the performance differences are within ±5%. The two datasets may differ in difficulty. Even so, the testing results indicate that two elements of our design effectively avoid the risk of over-fitting to the limited training dataset: the topology of the scheme (eight simply structured ANNs in a two-layer decision tree) and the adaptive ANN optimization approach using a genetic algorithm. As a result, the performance of our scheme is consistent and robust when applied to the new testing dataset.
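A two-proportion z-test can check whether the rise from 86.7% to 91.5% exceeds what sampling variation alone would produce. This illustration is not an analysis reported in the study, and it assumes every chromosome is an independent trial. That assumption does not hold exactly: chromosomes from the same metaphase cell are correlated, and the two datasets may differ in difficulty. The test is therefore optimistic, and a cell-level bootstrap would be a more cautious alternative. The function names are hypothetical.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Pooled two-proportion z statistic for H0: the two accuracies are equal."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Figures quoted in the text: 91.5% of 4607 chromosomes (this study)
# versus 86.7% of 2300 chromosomes (previous study).
z = two_proportion_z(0.915, 4607, 0.867, 2300)
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.1e}")
```

Under the independence assumption, z is roughly 6.2. Given the caveats above, this value is indicative only.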

Table 4.

Summary of different chromosome databases

Database                                          Copenhagen   Edinburgh   Philadelphia   OUHSC (this study)
Number of chromosomes                             8106         5548        5847           4607
Data quality                                      Good         Fair        Poor           Fair
Contains severely bent or touching chromosomes    No           Yes         Yes            Yes

In summary, we developed computer schemes to automatically detect analyzable metaphase chromosome cells and to classify individual chromosomes. Their main purpose is to eliminate or minimize a tedious and labor-intensive manual process. Current genetic laboratories routinely use this process to identify analyzable metaphase cells and to karyotype their chromosomes. A successful automated CAD scheme may also help improve (1) diagnostic accuracy, by detecting and analyzing more cells, and (2) diagnostic consistency, by reducing intra- and inter-reader variability. However, most automated schemes developed for this purpose are trained and optimized using machine learning methods. It is therefore important to test the robustness of each scheme on new, independent image datasets. In this study, we reassessed the performance of our CAD scheme using a new dataset that was never used during the training and optimization of the scheme. The testing results indicate that our CAD scheme achieved high and robust performance in both analyzable metaphase cell detection and individual chromosome classification. In future studies, we will continue to improve the performance of our CAD scheme and assess its potential clinical utility.

Figure 5.

An example of using a GA to train an ANN in a two-layer decision tree based classifier.
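The GA-based optimization illustrated in Figure 5 can be sketched in simplified form. This is a hypothetical illustration, not the authors' implementation. Each genome is assumed to be a bit string: the first bits select input features, and the remaining bits encode the number of hidden neurons. The `fitness` callable stands in for two steps: training an ANN with that configuration and measuring its validation accuracy. Truncation selection, one-point crossover, bit-flip mutation, and elitism are assumed design choices. The `toy_fitness` function and its target values are purely illustrative.

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]


def decode(genome: Genome, n_features: int) -> Tuple[List[int], int]:
    """Split a bit string into a feature mask and a hidden-neuron count.

    The first n_features bits select input features (1 = use feature).
    The remaining bits are read as an unsigned binary number, plus 1,
    giving the number of hidden neurons.
    """
    mask = genome[:n_features]
    hidden_bits = genome[n_features:]
    hidden = 1 + int("".join(map(str, hidden_bits)), 2) if hidden_bits else 1
    return mask, hidden


def evolve(
    fitness: Callable[[List[int], int], float],
    n_features: int,
    n_hidden_bits: int = 3,
    pop_size: int = 20,
    generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[List[int], int, float]:
    """Search feature masks and hidden-layer sizes with a simple GA."""
    rng = random.Random(seed)
    length = n_features + n_hidden_bits
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]

    def score(genome: Genome) -> float:
        return fitness(*decode(genome, n_features))

    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0][:]]  # elitism: the best genome survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        pop = children

    best = max(pop, key=score)
    mask, hidden = decode(best, n_features)
    return mask, hidden, score(best)


# Toy fitness standing in for the validation accuracy of an ANN trained
# with the given feature mask and hidden size (hypothetical target values).
TARGET_MASK = [1, 0, 1, 1, 0, 0]


def toy_fitness(mask: List[int], hidden: int) -> float:
    matches = sum(m == t for m, t in zip(mask, TARGET_MASK))
    return matches - 0.5 * abs(hidden - 4)


mask, hidden, best_score = evolve(toy_fitness, n_features=len(TARGET_MASK))
print(mask, hidden, best_score)
```

In a real scheme, the fitness call would be the expensive step, because each evaluation trains an ANN. Caching scores by genome would avoid retraining identical configurations. Elitism guarantees that the best score never decreases from one generation to the next.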

Acknowledgement

This research is supported in part by National Institutes of Health (NIH) grant CA115320. The authors also acknowledge the support of the Charles and Jean Smith Chair endowment fund.

References

1. Tjio JH, Levan A. The chromosome number in man. Hereditas. 1956;42:1–6.
2. Denver Conference. A proposed standard system of nomenclature of human mitotic chromosomes. Lancet. 1960;1:1063–1065.
3. Richardson AM. Chromosome analysis. In: Barch MJ, Knutsen T, Spurbeck JL, editors. The AGT cytogenetics laboratory manual. Philadelphia: Lippincott-Raven; 1997. pp. 481–526.
4. Wang X, Zheng B, Wood M, Li S, Chen W, Liu H. Development and evaluation of automated systems for detection and classification of banded chromosomes: current status and future perspectives. J. Phys. D: Appl. Phys. 2005;38:2536–2542.
5. Piper J, Granum E, Rutovitz D, Ruttledge H. Automation of chromosome analysis. Signal Processing. 1980;2:203–221.
6. Graham J, Pycock D. Automation of routine clinical chromosome analysis II. Metaphase finding. Anal. Quant. Cytol. Histol. 1987;9:391–397.
7. van Vliet LJ, Young IT, Mayall BH. The ATHENA semi-automated karyotyping system. Cytometry. 1990:51–58. doi: 10.1002/cyto.990110107.
8. Carothers A, Piper J. Computer-aided classification of human chromosomes: a review. Statistics and Computing. 1994;4:161–171.
9. Liang J. Fully automatic chromosome segmentation. Cytometry. 1994;17:196–208. doi: 10.1002/cyto.990170303.
10. Agam G, Dinstein I. Geometric separation of partially overlapping nonrigid objects applied to automatic chromosome classification. IEEE Trans Pattern Analysis and Machine Intelligence. 1997;19:1212–1222.
11. Popescu M, Gader P, Keller J, Klein C. Automatic karyotyping of metaphase cells with overlapping chromosomes. Computers in Biology and Medicine. 1999;29:61–82. doi: 10.1016/s0010-4825(98)00040-7.
12. Schwartzkopf W, Evans BL, Bovik AC. Entropy estimation for segmentation of multispectral chromosome images; Proceedings of the Fifth IEEE Southwest Symposium on Image Analysis and Interpretation; 2002. pp. 234–237.
13. Cosio F, Vega L, Becerra A, Melendez C. Automatic identification of metaphase spreads and nuclei using neural networks. Med Biol Eng Comput. 2001;39:391–396. doi: 10.1007/BF02345296.
14. Sweeney WP, Musavi MT, Guidi JN. Classification of chromosomes using a probabilistic neural network. Cytometry. 1996;16:17–24. doi: 10.1002/cyto.990160104.
15. Jennings AM, Graham J. A neural network approach to automatic chromosome classification. Phys. Med. Biol. 1993;38:959–970. doi: 10.1088/0031-9155/38/7/006.
16. Lerner B, Levinstein M, Rosenberg B, Guterman H. Feature selection and chromosome classification using a multilayer perceptron neural network, Neural Networks; IEEE International Conference on Computational Intelligence; 1994. pp. 3540–3545.
17. Errington P, Graham J. Application of artificial neural networks to chromosome classification. Cytometry. 1993;14:627–639. doi: 10.1002/cyto.990140607.
18. Lerner B. Toward a completely automatic neural-network-based human chromosome analysis. IEEE Trans Systems, Man, and Cybernetics, Part B: Cybernetics. 1998;28:544–552. doi: 10.1109/3477.704293.
19. Cho J. Chromosome classification using back propagation neural networks. IEEE Engineering in Medicine and Biology Magazine. 2000;19:28–33. doi: 10.1109/51.816241.
20. Delshadpour S. Reduced size multi layer perceptron neural network for human chromosome classification; Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 2003. pp. 2249–2252.
21. Wu Q, Suetens P, Oosterlinck A. Chromosome classification using a multi-layer perceptron neural net; Proceedings of the Twelfth Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 1990. pp. 1453–1454.
22. Groen F, Kate Tt, Smeulders A, Young I. Human chromosome classification based on local band descriptors. Pattern Recognition Letters. 1989;9:211–222.
23. Piper J, Granum E. On fully automatic feature measurement for banded chromosome classification. Cytometry. 1989;10:242–255. doi: 10.1002/cyto.990100303.
24. Schwartzkopf WC. Maximum likelihood techniques for joint segmentation-classification of multi-spectral chromosome images. Electrical Engineering. Austin: The University of Texas at Austin; 2002.
25. Wu Q, Castleman KR. Automated chromosome classification using wavelet-based band pattern descriptors; IEEE Symposium on Computer-Based Medical Systems (CBMS 2000); Houston, TX, USA; 2000. pp. 189–194.
26. Wu Q, Liu Z, Chen T, Xiong Z, Castleman KR. Subspace-based prototyping and classification of chromosome images. IEEE Trans Image Processing. 2005;14:1277–1287. doi: 10.1109/tip.2005.852468.
27. Wu Q, Suetens P, Oosterlinck A. On knowledge-based improvement of biomedical pattern recognition: a case study; Proceedings of the 5th Conference on Artificial Intelligence for Applications; 1989. pp. 239–244.
28. Lu Y, Ya Y. An expert system for banded chromosomes recognition; Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 1989. pp. 1789–1790.
29. Ramstein G, Bernadet M, Kangoud A, Barba D. A rule-based image analysis system for chromosome classification; Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 1992. pp. 926–927.
30. Tso MKS, Graham J. The transportation algorithm as an aid to chromosome classification. Patt. Recog. Lett. 1983;1:489–496.
31. Zimmerman SO, Johnston DA, Arrighi FE, Rupp ME. Automated homologue matching of human G-banded chromosomes. Comput. Biol. Med. 1986;16:223–233. doi: 10.1016/0010-4825(86)90050-8.
32. Keller JM, Gader P, Sjahputera O, Caldwell CW. A fuzzy logic rule-based system for chromosome recognition; Proceedings of the Eighth IEEE Symposium on Computer-Based Medical Systems; 1995. pp. 125–132.
33. Stanley RJ, Keller JM, Gader P, Caldwell CW. Data-driven homologue matching for chromosome identification. IEEE Trans Med Imaging. 1998;17:451–462. doi: 10.1109/42.712134.
34. Gregor J, Granum E. Finding chromosome centromeres using band pattern information. Comput. Biol. Med. 1991;21:55–67. doi: 10.1016/0010-4825(91)90036-9.
35. Graham J, Errington P, Jennings AM. A neural network chromosome classifier. J Radiat Res. 1992;33:250–257. doi: 10.1269/jrr.33.supplement_250.
36. Graham J, Piper J. Automatic karyotype analysis. In: Gosden JR, editor. Methods in Molecular Biology: Chromosome Analysis Protocols. Totowa: Humana Press Inc.; 1994. pp. 141–186.
37. Wang X, Li S, Liu H, Wood M, Chen W, Zheng B. Automated identification of analyzable metaphase chromosomes depicted on microscopic digital images. J Biomed Informatics. 2008;41:264–271. doi: 10.1016/j.jbi.2007.06.008.
38. Wang X, Zheng B, Li S, Mulvihill JJ, Liu H. Automated classification of metaphase chromosomes: optimization of an adaptive computerized scheme. J Biomed Informatics. 2008. In press. doi: 10.1016/j.jbi.2008.05.004.
39. Mitchell TM. Machine Learning. Boston, MA: WCB McGraw-Hill; 1997.
40. Hertz J, Krogh A, Palmer RG. Introduction to the theory of neural computation. Redwood City, CA: Addison-Wesley Publishing Company; 1991.
41. Zheng B, Chang YH, Good WF, Gur D. Performance gain in computer-assisted detection schemes by averaging scores generated from artificial neural networks with adaptive filtering. Med Phys. 2001;28:2302–2308. doi: 10.1118/1.1412240.
42. Tseng CC. Human chromosome analysis. In: Goldman CA, editor. Tested studies for laboratory teaching. Proceedings of the 16th Workshop/Conference of the Association for Biology Laboratory Education (ABLE). Atlanta, Georgia; 1995. pp. 35–56.
43. ISCN. An International System for Human Cytogenetic Nomenclature. Switzerland: S. Karger Publishers; 2005.
44. Wang X, Zheng B, Li S, Mulvihill JJ, Liu H. A rule-based scheme for centromere identification and polarity assignment of metaphase chromosomes. Computer Methods and Programs in Biomedicine. 2008;89:33–42. doi: 10.1016/j.cmpb.2007.10.013.
45. Kramer HP, Bruckner JB. Iterations of a non-linear transformation for enhancement of digital images. Pattern Recognition. 1975;7:53–58.
46. Granum E. Pattern recognition aspects of chromosome analysis: computerized and visual interpretation of banded human chromosomes. Lyngby, Denmark: Electronics Laboratory, Technical University of Denmark; 1980.
47. Kantrowitz M. Prime time freeware for AI, issue 1-1: Artificial Intelligence Repository 1, selected materials from the Carnegie Mellon University. Sunnyvale, CA: Prime Time Freeware; 1994.
48. Obuchowski NA. ROC analysis. Am J Roentgen. 2005;184:364–372. doi: 10.2214/ajr.184.2.01840364.
49. Metz CE KH. Computer programs ROCKIT. Chicago, IL: Kurt Rossmann Laboratories for Radiologic Image Research, Department of Radiology, University of Chicago; 1998.
50. Metz CE, Herman BA, Shen JH. Maximum-likelihood estimation of receiver operating characteristic (ROC) curves from continuously-distributed data. Stat. Med. 1998;17:1033–1053. doi: 10.1002/(sici)1097-0258(19980515)17:9<1033::aid-sim784>3.0.co;2-z.
51. Schulerud H, Kristensen GB, Liestøl K, Vlatkovic L, Reith A, Albregtsen F, Danielsen HE. A review of caveats in statistical nuclear image analysis. Analytical Cellular Pathology. 1998;16:63–82. doi: 10.1155/1998/436382.
52. Bengtsson E. Computerized cell image processing in healthcare; Proceedings of the 7th International Workshop on Enterprise Networking and Computing in Healthcare Industry; 2005.
