Abstract
Goal: Chromosomes are intracellular structures that carry genetic information. An abnormal number or structure of chromosomes causes chromosomal disorders. Thus, chromosome screening is crucial for prenatal care; however, manual analysis of chromosomes is time-consuming. With the increasing popularity of prenatal diagnosis, human labor resources are overstretched. Therefore, an automatic approach for chromosome detection and recognition is necessary. Methods: In the present study, we propose a deep learning–based system, called ChromosomesNet, for the automatic detection and recognition of chromosomes in metaphase cell images. We used a large database of 5,000 metaphase cell images containing a total of 229,852 chromosomes to develop and evaluate the system. ChromosomesNet combines the advantages of one-stage and two-stage models and uses original images as inputs without requiring preprocessing; it is thus applicable to clinical settings. To verify the clinical applicability of our system, we included 3,827 simple images and 1,173 difficult images, as identified by physicians, in our database. Results: Evaluated with COCOAPI's mAP50 metric, the system achieved a high accuracy of 99.60%. Moreover, the recall and F1 score of our proposed method were 99.90% and 99.49%, respectively. We also compared our method with five well-known object detection methods: Faster-RCNN, YOLOv7, Retinanet, Swin transformer, and Centernet++. The results indicated that ChromosomesNet had the highest accuracy, recall, and F1 score. Unlike previous studies, which have reported results only for simple chromosome images, we obtained 99.5% accuracy in the detection of difficult images. Conclusions: The volume of data we tested, even including difficult images, was much larger than those reported in the literature. The results indicated that our proposed method is sufficiently stable, robust, and practical for clinical use. Future studies are warranted to confirm the clinical applicability of our proposed method by using data from other hospitals for cross-hospital validation.
Keywords: Chromosome identification, karyotype image, convolutional neural network, object detection
I. Introduction
The human cell has 22 pairs of autosomes and one pair of sex chromosomes. An abnormality in the number or structure of chromosomes is known as a chromosomal aberration, which causes chromosomal disorders [1]. Genetic testing for chromosomes is a key part of prenatal maternity tests. The common approach involves invasive amniocentesis, wherein amniotic fluid is extracted from the uterus. The literature on prenatal diagnosis [2] indicates that approximately one in 150 babies has chromosome abnormalities. Common chromosomal aberrations occur on chromosome pair 13 in Patau syndrome, pair 18 in Edwards syndrome, and pair 21 in Down syndrome. According to a study cited by the National Center for Biotechnology Information (NCBI) [3], chromosomal aberrations are the cause of 25% of miscarriages and stillbirths and 50% to 60% of early miscarriages, making karyotyping a key clinical test in prenatal genetic diagnosis. Karyotyping is a common method of testing chromosomes wherein chromosome features are highlighted in black and gray and then viewed by physicians or examiners to diagnose structural abnormalities. An experienced examiner requires at least 20 minutes to sort, cut, orient, and sequence a raw chromosome cell image, and at least four chromosome images must be processed to ensure a correct diagnosis. The labor and time costs associated with chromosome analysis should be addressed because medical workforce shortages are becoming more severe. Automated chromosome analysis systems [4] have been developed to assist medical examiners. However, these systems are not sufficiently accurate and require the involvement of medical examiners throughout the process. Therefore, the problem of workload remains unaddressed [5].
Automated chromosome classification systems are scarce; current systems are largely based on artificial intelligence techniques such as machine learning and deep learning [6], [7], [8], [9]. Studies on chromosome classification have focused on segmenting overlapping and adherent chromosomes using well-known methods such as border detection [10], [11], the watershed method [12], and the straightening of bent chromosomes [13]. These methods often rely heavily on image preprocessing and can distort chromosome features, thereby resulting in misdiagnosis. Recent studies have indicated that medical examiners increasingly prefer chromosome prototyping over preprocessing. Researchers have classified chromosomes through two main approaches. The first approach applies image classification to single chromosomes only; it requires the human examiner to expend substantial time and effort on background segmentation and noise suppression. For example, convolutional neural networks (CNNs) [14], [15], which are used for classifying images, have low accuracy because of low data volumes. Owing to the repetitive work involved and the variability of chromosome features, the CNN approach has limited clinical applicability. The second approach uses original images and applies deep learning–based object detection [16], [17], [18], [19], [20], [21], [22] to identify and classify chromosomes. For example, DeepACEv2 [23] requires no manual preprocessing and uses object detection as a backbone to frame and classify chromosomes, with only final confirmation and manual editing performed by a medical examiner; thus, this approach has better clinical applicability than the first approach. In addition, the chromosome images used in previous studies were relatively easy to identify and classify; by contrast, difficult images are the ones that require the most time and effort from examiners. Thus, an automatic chromosome recognition system should be developed to handle more difficult images for better clinical application.
Most chromosome analysis software programs are image editing programs, which often require user intervention during analysis. In clinical practice, a system that does not require preprocessing is desirable, and the results should be highly accurate even when difficult-to-evaluate images are used. In the present study, we proposed and implemented an object detection system called ChromosomesNet for automatic chromosome detection and recognition. To verify its clinical applicability, we used metaphase cell images as inputs, with simple images and difficult images used as verification data. Three main features of our study are as follows: 1) the number of images used was larger than the numbers reported in the literature; 2) the advantages of two object detection models (both accuracy and speed) were incorporated in our system; 3) the accuracy of the system in recognizing complex metaphase cell images exceeded the reported standards in the literature.
II. Materials and Methods
A. Database
A total of 5000 metaphase cell images were used in this study; they were obtained from 1598 fetuses of pregnant women undergoing prenatal chromosomal studies between 2014 and 2021 at the Genetic Laboratory of the Department of Women's Medicine, Taichung Veterans General Hospital. Chromosome examinations were performed on samples of fetal amniotic fluid. The data set was approved by the Institutional Review Board (IRB) of Taichung Veterans General Hospital (IRB no. CE20369B). We informed all participants and obtained their consent to use their data in this study. The data set was divided into a training set and a test set at a ratio of 3:1 (3750:1250). Details of the data set are summarized in Table I. Images and their associated annotations are publicly available on the CELL IMAGE LIBRARY, a well-known website with a diverse library of cellular images [24]. For more detailed information about the database, please refer to our previous article published in Scientific Data [25].
TABLE I. Details of the Training and Test Data Sets.
| | Images (n) | Objects (chromosomes) |
|---|---|---|
| Training | 3750 | 172368 |
| Testing | 1250 | 57484 |
| Total | 5000 | 229852 |
Not all chromosomes in our sample were healthy. A normal chromosome image has 22 pairs of autosomes and one pair of sex chromosomes; however, a small number of samples in our data set did not have 46 chromosomes in the original image. Moreover, when the examiner viewed the Petri dish and adjusted the microscope lens, the heterochromosome group was not completely excluded, and its shape and features (outlined in red) were clearly different from those of the main chromosome group. The images in the database were classified as difficult or simple. The classification was based on a comparison between the examiner's selection criteria and the accuracy of the ChromosomesNet test. Table II displays the details of the difficult and simple data sets. The rationale for the classification, compared with that reported for DeepACEv2, is discussed later in this paper. Medical examiners adopt three definitions of difficult images; corresponding examples are illustrated in Fig. 1(a)–(c).
1) Multiple chromosomes overlap (Fig. 1(a)). A common overlap of two chromosomes can be easily identified by an examiner or by the model; when more than three chromosomes overlap, accuracy decreases.

2) The color intensity is low and the features are unclear (Fig. 1(b)). The examiner must adjust the microscope to increase the intensity of such images; moreover, because of poor staining, the black, white, and gray bands appear dull.

3) The chromosomes are exceedingly elongated (Fig. 1(c)). An elongated chromosome stretches the band features and is more prone to overlapping.
TABLE II. Difficult and Simple Images in Training Set and Testing Set.
| | Training set (images) | Testing set (images) |
|---|---|---|
| Difficult | 1050 (28%) | 123 (9.8%) |
| Simple | 2700 (72%) | 1127 (90.2%) |
| Total | 3750 | 1250 |
Fig. 1.
Examples of difficult images according to the three definitions. (a) Three overlapping chromosomes. (b) Low-intensity color and unclear features; the white band turned black in the overlapping region. (c) Elongated chromosomes with more overlapping locations.
Fig. 6.
Prediction frame output from the one-stage framework (IoU = 0.5) is the input to the first stage (IoU = 0.6) and then passes sequentially through the second and third stages (IoU = 0.7, 0.8). After each iteration, the proposals change, and the strategy of using different threshold values can effectively improve the quality of the prediction frame's IoU.
B. Structure of ChromosomesNet
As illustrated in Fig. 2, the ChromosomesNet architecture is divided into three parts: 1) a backbone network based on Res2Net101 [26] and a weighted bidirectional feature pyramid network (BiFPN) [22]; 2) a prediction stage with a one-stage architecture that generates compact prediction frames and applies nonmaximum suppression (NMS) [27] to remove redundant frames; 3) a classification stage with a two-stage architecture that refines and combines the classification results of the one-stage architecture. In the present study, we used the original chromosome image obtained from the genetic laboratory as the input of the backbone network (Res2Net101 and BiFPN) to generate feature maps, which were then passed to the one-stage and two-stage architectures for processing. The two-stage processing system maps the predicted boxes onto the feature maps; after ROI pooling, its classification results are summed with the classification results from the one-stage architecture.
Fig. 2.
Structure of ChromosomesNet.
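To make the data flow in Fig. 2 concrete, the following is a minimal sketch of the three parts described above, assuming the trained components are available as callables; the names (backbone, bifpn, one_stage_head, cascade_head) are hypothetical placeholders rather than the actual implementation.

```python
# Illustrative sketch of the three-part flow in Fig. 2; every callable here is a
# hypothetical placeholder standing in for a trained network component.
def chromosomesnet_forward(image, backbone, bifpn, one_stage_head, cascade_head):
    # 1) Backbone + BiFPN: multiscale feature maps from the raw metaphase image.
    features = bifpn(backbone(image))
    # 2) One-stage prediction: compact boxes and class scores; redundant boxes
    #    are assumed to be removed by NMS inside the head.
    boxes, one_stage_scores = one_stage_head(features)
    # 3) Two-stage classification: map the surviving boxes onto the feature maps
    #    (ROI pooling), classify them again, and sum the two sets of class scores.
    two_stage_scores = cascade_head(features, boxes)
    return boxes, one_stage_scores + two_stage_scores
```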
C. Architecture of the Backbone Network
We used the Res2Net101 CNN but removed the last classification layer so that the network outputs only feature maps. The backbone CNN must provide a powerful multiscale representation of features. Compared with the layer-wise multiscale strategy of existing CNNs, Res2Net has a more fine-grained multiscale architecture. The Res2Net network replaces each 3 × 3 filter with smaller filter groups, which further improves the functionality of the bottleneck layer (Fig. 3). The input feature maps are divided into multiple groups, and the outputs of the individual filter groups are connected in a hierarchical residual-like manner. Finally, all groups of feature maps are concatenated and passed to a 1 × 1 filter to achieve a finer multiscale view.
Fig. 3.
Multiscale architecture strategy for the bottleneck layer (left) and Res2Net (right).
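For illustration, a minimal PyTorch sketch of a Res2Net-style bottleneck with a scale of 4 follows; it reproduces the split, hierarchical-convolution, and concatenation pattern described above but omits batch normalization, activations, and channel reduction, so it should be read as a simplified illustration rather than the exact Res2Net101 block.

```python
import torch
import torch.nn as nn

class Res2NetBottleneck(nn.Module):
    """Simplified Res2Net-style bottleneck: split, hierarchical 3x3 convs, concat."""
    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0
        self.scale = scale
        width = channels // scale
        self.reduce = nn.Conv2d(channels, channels, kernel_size=1)
        # Smaller 3x3 filter groups replace one large 3x3 filter (first split is skipped).
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, width, kernel_size=3, padding=1) for _ in range(scale - 1)]
        )
        self.expand = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        splits = torch.chunk(self.reduce(x), self.scale, dim=1)
        outputs, prev = [splits[0]], None          # first split is passed through
        for conv, split in zip(self.convs, splits[1:]):
            # Hierarchical residual-like connection: each group also receives the
            # previous group's output, enlarging the receptive field step by step.
            prev = conv(split if prev is None else split + prev)
            outputs.append(prev)
        return x + self.expand(torch.cat(outputs, dim=1))

# Example: a 64-channel feature map passes through the multiscale bottleneck.
y = Res2NetBottleneck(64)(torch.randn(1, 64, 56, 56))
```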
In addition to Res2Net, we adopted a BiFPN to improve the features. FPNs [20] are extensively used for multiscale feature fusion in object detection models, where fusing features of different scales compensates for the weakness of single-scale features. Whereas an FPN treats the scales of each layer equally, a BiFPN assigns them different weights; this requires more computation than an FPN but generates a more complete feature map. We used the D4 version with six feature scales to effectively handle the large differences in chromosome size, as illustrated in Fig. 4.
Fig. 4.
Pyramid features represent features of different scales extracted from the backbone network. Dashed-line boxes represent BiFPN layers. Objects depicted by the same color are of the same scale. The results of different scales are exported to the category and prediction box by a convolution layer.
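The weighted fusion that distinguishes a BiFPN from a plain FPN can be sketched as follows; this is the fast normalized fusion idea from EfficientDet [22], shown here only to illustrate how same-scale features receive learned, unequal weights, and it is not the authors' implementation.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fuse several same-scale feature maps with learned non-negative weights."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = torch.relu(self.weights)           # keep the learned weights non-negative
        w = w / (w.sum() + self.eps)           # normalize so the weights sum to ~1
        return sum(wi * f for wi, f in zip(w, feats))

# Example: fuse a top-down pathway feature with the same-scale backbone feature.
fused = WeightedFusion(2)([torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)])
```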
D. One- and Two-Stage Architectures
As illustrated in Fig. 5(a), a classical one-stage model uses a large backbone network to output feature maps and generates prediction frames using an ROI pool; it simultaneously performs regression and classification of the prediction frames. For the first stage, our model adopts the RetinaNet architecture [28], based on the concept of CenterNetv2 [29], to determine an object centroid and bounding box through regression. Generalized intersection over union (GIoU) [30] is used to identify the object centroid and the regression-corrected bounding box. During network training, the prediction box is continuously scaled and shifted to acquire more accurate coordinates. Because chromosome overlaps and adhesions occur frequently, we used a one-stage model as the first stage of our system instead of a straightforward two-stage model. Compared with a two-stage model, a one-stage model generates fewer but more accurate prediction frames; it reduces computational effort, more accurately separates background from objects, and has a higher recognition speed.
Fig. 5.
(a) Classical one-stage model framework; the model performs prediction (fewer boxes) and classification simultaneously at a high recognition speed. (b) Classical two-stage model framework, which separates prediction (many boxes) from classification and achieves high accuracy.
Fig. 5(b) displays the structure of a classical two-stage model. A backbone network outputs feature maps to the first stage, which extracts candidate boxes with a sliding window, classifies the target boxes to distinguish objects from the background, and uses regression to offset and scale the box positions into more accurate prediction boxes. In the second stage, the predicted frames from the first stage are collected, mapped to the feature map through ROI pooling, and then classified.
We retained the second stage of the two-stage model and adopted the Cascade R-CNN architecture [17]. In general object detection, the regression stage has a single IoU threshold (typically 0.5); an exceedingly high IoU threshold reduces the proportion of positive samples, leading to overfitting. The cascade strategy instead uses multiple IoU thresholds. As displayed in Fig. 6, we used three IoU thresholds (0.6, 0.7, and 0.8). The prediction frames output from the one-stage framework pass through these three thresholds sequentially; this accommodates a greater range of box distributions and more effectively predicts overlapping objects, thereby producing prediction frames of higher quality. Therefore, the cascade strategy can achieve excellent results for chromosome images at a high IoU.
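The cascade idea can be summarized in a short sketch, assuming each stage's regressor and classifier are available as callables (hypothetical placeholders); the listed thresholds govern how positive samples are assigned when each stage is trained, which is why later stages can handle tighter, higher-quality boxes.

```python
# Sketch of the cascade strategy in Fig. 6; regress/classify are placeholders.
CASCADE_IOU_THRESHOLDS = (0.6, 0.7, 0.8)

def cascade_refine(features, proposals, stage_heads):
    """stage_heads: one (regress, classify) pair per cascade stage."""
    boxes, scores = proposals, None
    for iou_thr, (regress, classify) in zip(CASCADE_IOU_THRESHOLDS, stage_heads):
        # iou_thr is the training-time assignment threshold of this stage; at
        # inference the stage simply re-regresses the previous stage's boxes, so
        # each successive stage sees a progressively tighter box distribution.
        boxes = regress(features, boxes)
        scores = classify(features, boxes)
    return boxes, scores
```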
E. IoU Strategy to Address Overlap and Adhesion
The overlapping and adhesion observed in chromosome images are challenges for automatic chromosome classification, and choosing the IoU threshold is key to detecting chromosomes. Fig. 7 illustrates the IoU, which is calculated by dividing the overlapping area of two boxes by their union (the total combined area). The IoU of the two prediction boxes in the far-right part of Fig. 7 is 0.69. The IoU threshold determines whether both prediction boxes are retained.
Fig. 7.
Illustration of IoU; the IoU threshold can be set to either retain or not retain the two prediction boxes.
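For reference, the IoU of two axis-aligned boxes can be computed as in the short function below; the (x1, y1, x2, y2) box format is assumed for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))   # two half-overlapping boxes -> 0.333...
```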
Fig. 8 illustrates the importance of the IoU in chromosome detection. The green box was retained because its overlap with the red dashed box was less than 0.8; however, the red dashed box itself was unsatisfactory. Setting a proper IoU threshold is necessary, but in a complex and variable chromosome image, the same threshold cannot be applied to all objects.
Fig. 8.
Overlapping chromosome clusters; if both green and red boxes have an IoU of more than 0.5 with the real box, it is judged as a true positive; otherwise, it is judged to be a false negative.
According to our observations, chromosomes may require different IoU thresholds depending on their size. First, most small chromosomes (e.g., pairs 19–22 and Y), such as those displayed in Fig. 9(a), are highly independent and can be marked accurately; when overlaps occur, however, small chromosomes are often hidden behind larger chromosomes with a large overlap area. Second, medium-sized chromosomes (e.g., pairs 13–16) are easily affected by satellite bodies, as illustrated in Fig. 9(b): the green box was the highest-scoring prediction box, but the satellite body was obscure, and the green box was accompanied by a smaller, similar red box whose IoU did not exceed the specified threshold, so the incorrect red box was retained. The short arms (p-arms) of chromosome pair 16 overlapped with other chromosomes, leaving two erroneous red boxes, as illustrated in Fig. 9(c). Third, under the threshold used for larger chromosomes (e.g., pairs 1–5), which are longer and more curved, such erroneous red boxes might not be filtered out, as displayed in Fig. 9(d). Therefore, we applied different threshold values for chromosomes according to their size; an NMS IoU threshold of 0.7 was used for all overlapping medium-sized and large chromosomes (see the sketch after Fig. 9).
Fig. 9.
Examples of overlapping chromosomes with different sizes.
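A minimal sketch of this size-dependent NMS strategy is given below; the text specifies an NMS IoU threshold of 0.7 for medium-sized and large chromosomes, whereas the area cutoff and the threshold applied to small chromosomes here are illustrative assumptions.

```python
import torch
from torchvision.ops import nms

def size_aware_nms(boxes, scores, small_area=1500.0, small_thr=0.5, large_thr=0.7):
    """boxes: (N, 4) tensor in (x1, y1, x2, y2); scores: (N,) confidence scores."""
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    # Apply a looser threshold to small chromosomes and a 0.7 threshold to the rest.
    for mask, thr in ((areas < small_area, small_thr), (areas >= small_area, large_thr)):
        idx = mask.nonzero(as_tuple=True)[0]
        if idx.numel():
            keep.append(idx[nms(boxes[idx], scores[idx], thr)])
    return torch.cat(keep) if keep else torch.empty(0, dtype=torch.long)
```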
III. Results
A. Performance Evaluation
The proposed system uses the COCOAPI mAP to verify and calculate accuracy with reference to the ground-truth values of the test data. The number of difficult images in this test set was 123 (9.8%), allowing a fair comparison with DeepACEv2. The initial learning rate was set to the best warm-up learning rate; owing to the high complexity of the original chromosome images and the large cross-scale variation of the objects, the learning rate never exceeded 0.001. When the total validation loss exceeded that of the previous validation five times, training was terminated and the results were recorded. After the amount of data and the accuracy had stabilized, validation was performed every 10 epochs for 150 epochs, and the corresponding accuracy and total loss were recorded.
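The training schedule above can be read as a simple early-stopping loop; the sketch below assumes one plausible interpretation of the stopping rule (validation loss worse than the previous validation five consecutive times), and the train/validate/lr_schedule callables are hypothetical placeholders rather than the actual training code.

```python
def train_with_early_stopping(train_one_epoch, validate, lr_schedule,
                              max_epochs=150, val_every=10, patience=5):
    prev_loss, worse_count, history = float("inf"), 0, []
    for epoch in range(1, max_epochs + 1):
        # Learning rate follows the warm-up schedule but is capped at 0.001.
        train_one_epoch(lr=min(0.001, lr_schedule(epoch)))
        if epoch % val_every == 0:                 # validate every 10 epochs
            loss = validate()
            history.append((epoch, loss))
            worse_count = worse_count + 1 if loss > prev_loss else 0
            if worse_count >= patience:            # stop after repeated worsening
                break
            prev_loss = loss
    return history
```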
Three metrics were measured for the proposed system: accuracy (Acc), recall, and F1 score. The first task of the system involved separating the chromosomes from the image background. The object-detected prediction frames had to accurately match the chromosomes. Only the successfully selected chromosomes were classified:
• True positive (TP): The IoU between the predicted box and the ground truth exceeded the specified threshold, and the box had the highest score among all predicted boxes assigned to that chromosome.

• True negative (TN): The chromosomes were objects divided into 23 categories, so the TN condition did not exist.

• False positive (FP): The predicted box was successfully marked onto a chromosome, but the IoU between the box and the ground truth did not reach the specified threshold.

• False negative (FN): The ground truth was not detected by any predicted bounding box.
The precision, recall, and F1 score were calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
The accuracy (IoU ≥ 0.5) was calculated as the area under the precision–recall curve and was used to measure the percentage of correct classifications. The following formula was used to calculate Acc:

$$\mathrm{Acc} = \int_{0}^{1} P(R)\, dR$$

where P(R) denotes precision as a function of recall at an IoU threshold of 0.5.
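Given TP, FP, and FN counts at a fixed IoU threshold, the reported metrics follow directly; the small helper below illustrates the computation (the accuracy reported in this paper additionally integrates the precision–recall curve, which is omitted here).

```python
def detection_metrics(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 score from detection counts at IoU >= 0.5."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: 990 matched boxes, 5 spurious boxes, 10 missed chromosomes.
print(detection_metrics(990, 5, 10))   # -> (~0.995, ~0.990, ~0.992)
```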
B. Experimental Results
In this study, we used 1250 chromosome images as test data and derived complete evaluation results for the ChromosomesNet system. Table III displays the results for ChromosomesNet and its variants (i.e., the ablation experiments) in terms of the three evaluation metrics. First, we used ResNet101 with an FPN and a GIoU loss function (a loss for training bounding box regression) as the initial network structure for ChromosomesNet. However, such a backbone network was unable to capture finer features; it could not precisely locate and classify small chromosomes. We then used Res2Net101 as the feature extraction network of the new model and added the BiFPN. The Res2Net + BiFPN backbone was chosen because Res2Net provides excellent multiscale feature extraction, which is crucial for handling chromosomes of varying sizes and morphologies in metaphase images; its hierarchical residual-like structure enables a more detailed feature representation than traditional backbone networks. The BiFPN enhances feature fusion by assigning distinct weights to the different feature pyramid scales, allowing better integration of multiscale chromosome features. This is particularly important for our task because chromosomes can vary significantly in size and shape.
TABLE III. Results for ChromosomesNet and Its Variants in Terms of Three Performance Metrics.
| Method | Acc (%) | Recall (%) | F1 score (%) |
|---|---|---|---|
| ChromosomesNet (ResNet101 + FPN + GIoU loss) | 97.36 | 98.90 | 98.14 |
| ChromosomesNet (Res2Net101 + FPN + GIoU loss) | 98.54 | 99.50 | 98.99 |
| ChromosomesNet (Res2Net101 + BiFPN + GIoU loss) | 98.78 | 99.60 | 99.19 |
| ChromosomesNet (Res2Net101 + BiFPN + GIoU loss + MULTI-IoU) | 98.91 | 99.81 | 99.44 |
| ChromosomesNet (Res2Net101 + BiFPN + DIoU loss + MULTI-IoU) | 99.60 | 99.90 | 99.49 |

(FPN: feature pyramid network; BiFPN: bidirectional feature pyramid network; MULTI-IoU: IoU strategy to address overlap and adhesion)
As displayed in Table III, the proposed method outperformed the other variants in all three evaluation metrics. The MULTI-IoU strategy, which applies different thresholds to chromosomes of different sizes, also improved accuracy. Finally, we used an advanced IoU loss function (DIoU), which enabled better identification of small chromosomes than the GIoU loss function. By filtering overlapping object frames in the NMS stage with different IoU thresholds for chromosomes of different sizes, and by retaining small chromosomes subject to overlap, we could detect almost all chromosomes. The MULTI-IoU + DIoU loss combination was implemented because the DIoU loss considers both the overlap area and the distance between the centers of the prediction and ground-truth boxes, leading to more accurate chromosome localization than the traditional IoU loss. The MULTI-IoU strategy adapts to chromosomes of different sizes by applying varying IoU thresholds, effectively addressing the challenge of overlapping chromosomes and improving detection accuracy, especially in complex cases. Furthermore, ChromosomesNet exhibited an accuracy of 99.60% on a large test data set, indicating the stability and robustness of our model.
To better understand the conditions under which our model might make incorrect judgments, Table IV presents the accuracy for each chromosome pair in our testing set. The analysis reveals a clear pattern in model performance: chromosome pairs 1–18 consistently achieve accuracy rates above 99.40%, whereas pairs 19–22 and the sex chromosomes show relatively lower accuracy rates (below 99%). This performance discrepancy can be attributed to several key factors. The shorter length of these later chromosome pairs makes them more susceptible to detection errors, and overlapping with longer chromosomes often leads to occlusion issues. Additionally, their smaller size results in fewer distinctive features for accurate classification. These findings highlight specific areas requiring attention for future model improvements.
TABLE IV. The Results of Each Chromosome Pair (testing Set).
| Category | Acc (%) | Category | Acc (%) | Category | Acc (%) |
|---|---|---|---|---|---|
| A1 | 99.53 | A2 | 99.97 | A3 | 99.96 |
| B4 | 99.98 | B5 | 99.92 | C6 | 99.95 |
| C7 | 99.87 | C8 | 99.88 | C9 | 99.40 |
| C10 | 99.83 | C11 | 99.97 | C12 | 99.94 |
| D13 | 99.97 | D14 | 99.80 | D15 | 99.98 |
| E16 | 99.84 | E17 | 99.44 | E18 | 99.40 |
| F19 | 98.71 | F20 | 98.87 | G21 | 98.93 |
| G22 | 98.60 | X | 97.56 | Y | 97.49 |
IV. Discussion
A. Clinical Applicability
To demonstrate clinical applicability, we first selected images that were representative of easy and difficult chromosome identification based on the definitions stated earlier in this paper. Subsequently, we selected highly difficult images as reported in the literature. Fig. 10(a) displays a simple image with very few overlaps and adhesions, a moderate chromosome size, and clear features; ChromosomesNet achieved 100% accuracy on this image. Fig. 10(b) displays a relatively difficult image; despite clear features, more chromosome overlaps and adhesions were observed, and the recognition accuracy of the proposed model was 97.8%. As illustrated in Fig. 10(c), even more overlaps and adhesions were noted in the highly difficult image; in addition, the chromosome features were more variable, and overlaps with impurities were present. Nevertheless, the recognition accuracy of the proposed method remained higher than 95%.
Fig. 10.
Examples of a (a) simple chromosome image, (b) difficult chromosome image, and (c) highly difficult chromosome image. For the simple image, 100% accuracy was achieved. For the difficult image, the number of overlapping and adherent chromosomes increased significantly, and the chromosomes that were not captured accurately were those that fell between the three overlapping chromosomes (accuracy: 97.8%). For the highly difficult image (c), two chromosomes were unmarked, resulting in 95.65% recognition accuracy.
As indicated by the aforementioned three results, overlaps and adhesions in chromosome images reduce recognition accuracy, and longer chromosomes are more likely to overlap, thereby affecting model performance. Because the influence of image noise was minimal, noise removal was not required. Finally, images with completely nonoverlapping or entirely adhered chromosomes were rare in our data set; most images were similar to the difficult image displayed in Fig. 10(c). For such difficult images, ChromosomesNet achieved 99.15% accuracy, indicating the clinical applicability of the proposed method.
While experienced medical technologists typically require approximately 15 minutes to complete the identification and analysis of all chromosomes in a metaphase image, ChromosomesNet can accomplish the same task in just 1.5 seconds on a standard office-grade computer (Intel i7 processor, 16GB RAM, without requiring a dedicated GPU). This represents a 600-fold improvement in processing speed. The model's ability to achieve such efficiency on modest hardware makes it particularly suitable for widespread clinical deployment, as it eliminates the need for expensive computational resources while maintaining real-time performance. These results demonstrate that ChromosomesNet not only offers accuracy but also provides the computational efficiency necessary for practical clinical applications.
B. Comparison With Other Object Detection Models
The proposed model predicts fewer but more accurate frames in the prediction stage, thereby reducing computation and yielding better classification results than classical two-stage models. Moreover, our model generates accurate classification results at recognition speeds similar to those of classical one-stage models. In summary, our model has higher classification accuracy. Table V presents a comparison of the performance of Faster-RCNN [31], YOLOv7 [32], Retinanet [28], Swin transformer [33], and CenterNet++ [34] with that of ChromosomesNet. Owing to continuous advancements in backbone networks, we adopted ResNet101 [35] as the backbone network of Faster-RCNN and Retinanet and added an FPN to improve their performance. The results indicated that ChromosomesNet has the highest accuracy, recall, and F1 score.
TABLE V. Performance Levels of ChromosomesNet and Five Other Object Detection Models When Used on Our Data Set.
| Method | Acc (%) | Recall (%) | F1 score (%) |
|---|---|---|---|
| Faster-RCNN (ResNet101 + FPN) | 96.97 | 80.80 | 88.10 |
| YOLOv7 (E-ELAN + PAFPN) | 98.84 | 99.80 | 99.35 |
| Retinanet (ResNet101 + FPN) | 96.23 | 82.40 | 88.76 |
| Swin transformer (Swin-Transformer-Base + FPN) | 98.70 | 98.40 | 98.56 |
| Centernet++ (Res2Net101 + FPN) | 91.50 | 93.82 | 92.53 |
| ChromosomesNet (Res2Net101 + BiFPN + MULTI-IoU + DIoU loss) | 99.60 | 99.90 | 99.49 |

(FPN: feature pyramid network; BiFPN: bidirectional feature pyramid network; MULTI-IoU: IoU strategy to address overlap and adhesion)
C. Comparison of Difficult Images
With the increased popularity of genetic screening, decreasing fertility rates, and the growing use of prenatal genetic testing, automated chromosome analysis systems require larger and more diverse databases. In previous studies on automated chromosome classification, nonpublic databases containing fewer than 1000 chromosome images were used, and only nonoverlapping and nonadhesive chromosome images were analyzed. However, only a small percentage of clinical chromosome images are nonoverlapping and nonadhesive, and the remaining images require more of the examiner's time. Thus, although previously reported models have accuracy levels higher than 90%, they are less clinically applicable because their data sets were limited and their chromosome images were simple [36].
At the time of writing, DeepACEv2 [23] is the most recently developed and most accurate method for automated chromosome detection. To compare our method with DeepACEv2, we used a training-to-test set ratio of 3:1. As illustrated in Table VI, ChromosomesNet not only has higher accuracy (by 0.76%) but also handles a test data volume more than three times that handled by DeepACEv2, indicating the stability and robustness of the proposed system. In this study, difficult images were distinguished in a manner similar to the DeepACEv2 approach, wherein overlapping chromosomes were treated as difficult; overlapping chromosomes were identified, and their accuracy was calculated based on the object IoU. Medical examiner expertise and ChromosomesNet recognition results were combined to identify difficult images. Our experiment used a more stringent approach to differentiate between the difficulty levels of images, and the results were mostly consistent with the clinical expectations of the examiners. As illustrated in Table VII, ChromosomesNet achieved an accuracy of 99.51% despite the size of the data set (four times larger than other data sets) and the difficulty of the images used; it outperforms DeepACEv2 by a substantial margin of 3.57% in accuracy. These results not only validate our model's effectiveness but also demonstrate its robust performance in handling complex clinical cases.
TABLE VI. Comparison of the Results Obtained Using DeepACEv2 and the Proposed Method.
| Method | Training set (chromosomes/images) | Testing set (chromosomes/images) | Acc (%) | F1 score (%) |
|---|---|---|---|---|
| DeepACEv2 | 37819/825 | 12614/275 | 98.84 | 99.42 |
| ChromosomesNet (ours) | 172437/3750 | 57484/1250 | 99.60 | 99.49 |
TABLE VII. Comparison of the Results for Difficult Images When DeepACEv2 and Our Proposed Method Were Used.

| Method | Difficult images (chromosomes/images) | Acc (%) | Recall (%) | F1 score (%) |
|---|---|---|---|---|
| DeepACEv2 | 1232/27 | 95.94 | 97.10 | 97.93 |
| ChromosomesNet (ours) | 5211/123 | 99.51 | 98.90 | 99.22 |
D. Limitation
When applying our model to other clinical datasets, several challenges could arise. These challenges mainly stem from variations in laboratory conditions across different medical institutions, including chromosome cultivation protocols, staining methods and techniques, and microscope imaging parameter settings. Additionally, variations in image characteristics, such as quality differences between institutions, resolution variations, and changes in contrast and brightness, could all affect model performance.
To address these challenges and enhance our model's robustness, we propose two future improvement directions. First, we plan to conduct multi-center cross-hospital validation studies, which will help evaluate model performance across different clinical settings, understand the impact of various laboratory protocols, and contribute to establishing standardization guidelines for image acquisition. Second, we will implement advanced deep learning techniques, such as using domain adaptation methods [37], [38], [39] to handle institutional variations, enabling self-adaptive fine-tuning for new data, and conducting robust feature extraction across different imaging conditions. These optimization strategies will be key focuses in our future research to ensure reliable model performance across different clinical settings. We believe these improvements will significantly enhance the clinical applicability of our model.
E. Future Works
ChromosomesNet can classify and capture chromosomes and finally generate an output karyotype. In clinical settings, inconsistent chromosome orientation in a karyotype can affect an expert's judgment; therefore, all chromosomes should be corrected for orientation, as illustrated in Fig. 11. The main functions of the centromere are to hold duplicated chromatids together and to distribute them evenly among daughter cells during mitosis and meiosis. If the centromere is lost, the chromosome can no longer attach to the spindle fibers and thus cannot be distributed equally to the daughter cells during cell division; such an error results in an abnormal chromosome number. According to the position of the centromere, the shorter arm of a chromosome is called the short arm (p-arm), and the other is the long arm (q-arm). Chromosome orientation is corrected by placing the p-arm on top and the q-arm below. However, identifying the orientation is challenging, particularly for chromosome pairs 13, 14, 15, 21, and 22. Studies on automatic chromosome classification greatly outnumber studies on chromosome orientation correction [40], [41]. Therefore, a complete automatic chromosome recognition system remains to be developed.
Fig. 11.
After the sorting of intercepted chromosomes, in addition to background removal, the chromosomes should be rotated into an upright position.
Moreover, our model's interpretability is primarily reflected in its output structure, which includes bounding box coordinates and confidence scores for chromosome classification that serve as quantitative indicators of decision certainty. Higher confidence scores indicate chromosomes with straightforward characteristics resulting in more reliable model decisions, while lower scores suggest more challenging cases with complex features or unusual morphologies. Acknowledging that AI decisions cannot be guaranteed to be absolutely correct, we suggest a human-AI collaborative approach with a threshold-based system for confidence scores in the future: predictions above the threshold could be accepted with minimal review, while those below would require careful verification by medical experts. This collaborative framework aims to reduce medical professionals' workload while maintaining clinical result reliability. Future work will focus on validating this approach through studies to determine optimal confidence thresholds and evaluate their effectiveness in real clinical settings.
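As a simple illustration of the proposed human–AI triage, the sketch below routes predictions by confidence score; the threshold value and the prediction format are illustrative assumptions that would need clinical validation.

```python
def triage_predictions(predictions, threshold=0.9):
    """predictions: dicts with 'label', 'box', and 'score'; split by confidence."""
    auto_accept, needs_review = [], []
    for pred in predictions:
        (auto_accept if pred["score"] >= threshold else needs_review).append(pred)
    return auto_accept, needs_review

# Example: two confident detections are accepted; one uncertain case is flagged.
preds = [{"label": "A1",  "box": (10, 12, 90, 200),   "score": 0.98},
         {"label": "X",   "box": (300, 40, 360, 150), "score": 0.97},
         {"label": "G22", "box": (200, 210, 240, 260), "score": 0.62}]
accepted, review = triage_predictions(preds)
```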
V. Conclusion
In this study, we developed a deep learning–based chromosome classification and extraction method that replaces the proposal stage of a two-stage detector with a more sophisticated one-stage detector to obtain high accuracy while maintaining a desirable recognition speed. The proposed method comprises three stages. First, a Res2Net + BiFPN backbone network is used to output feature maps. Second, a one-stage strategy is used as the first prediction stage to output prediction frames and classification results. Finally, a two-stage strategy is used to perform a further round of classification on all prediction frames. The results indicated an accuracy of 99.60% on the test data set, which included more than 57,000 chromosomes. The performance of the proposed system was considerably higher than that of other systems reported in the literature. Unlike previous studies, which have reported results only for simple chromosome images, we obtained 99.51% accuracy in the detection of difficult images. Our definition of "difficult" incorporates expert opinion in conjunction with ChromosomesNet identification results. The results indicated that our model is sufficiently stable and practical for clinical use. Future studies are warranted to confirm the clinical applicability of our proposed method by using data from other hospitals for cross-hospital validation.
Conflict of Interest
No potential conflict of interest was reported by any of the authors.
Author Contribution
Data collection: JJ Tseng, FC Lo, MJ Chen. Data preparation: JZ Li, CH Lu. Deep learning analyses: CE Kuo, JZ Li. Manual analyses: CE Kuo, CH Lu. All authors have contributed to the study design, results interpretation, writing manuscript and read and approved the final manuscript.
Funding Statement
This work was supported in part by the Taichung Veterans General Hospital under Grant TCVGH-AI-10903 and Grant TCVGH-AI-11002 and in part by the National Science and Technology Council, Taiwan, under Grant 113-2221-E-005-081-MY3 and Grant 113-2221-E-075A-007.
Contributor Information
Chih-En Kuo, Email: cekuo@nchu.edu.tw.
Chien-Hsing Lu, Email: chlu@vghtc.gov.tw.
References
- [1].Wapner R. J. et al. , “Chromosomal microarray versus karyotyping for prenatal diagnosis,” New England J. Med., vol. 367, no. 23, pp. 2175–2184, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [2].Carlson L. M. and Vora N. L., “Prenatal diagnosis: Screening and diagnostic tools,” Obstet. Gynecol. Clin. North America, vol. 44, no. 2, pp. 245–256, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Theisen A. and Shaffer L. G., “Disorders caused by chromosome abnormalities,” Application Clin. Genet., vol. 3, pp. 159–174, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Litjens G. et al. , “A survey on deep learning in medical image analysis,” Med. Image Anal., vol. 42, pp. 60–88, Dec. 2017, doi: 10.1016/j.media.2017.07.005. [DOI] [PubMed] [Google Scholar]
- [5].Pesapane F., Codari M., and Sardanelli F., “Artificial intelligence in medical imaging: Threat or opportunity? Radiologists again at the forefront of innovation in medicine,” Eur. Radiol. Exp., vol. 2, no. 1, 2018, Art. no. 35. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Jindal S., Gupta G., Yadav M., Sharma M., and Vig L., “Siamese networks for chromosome classification,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2017, pp. 72–81. [Google Scholar]
- [7].Karvelis P. S., Fotiadis D. I., Georgiou I., and Syrrou M., “A watershed based segmentation method for multispectral chromosome images classification,” in Proc. 2006 Int. Conf. IEEE Eng. Med. Biol. Soc., 2006, pp. 3009–3012. [DOI] [PubMed] [Google Scholar]
- [8].Litjens G. et al. , “A survey on deep learning in medical image analysis,” Med. Image Anal., vol. 42, pp. 60–88, 2017. [DOI] [PubMed] [Google Scholar]
- [9].Qin Y. et al. , “Varifocal-Net: A chromosome classification approach using deep convolutional networks,” IEEE Trans. Med. Imag., vol. 38, no. 11, pp. 2569–2581, Nov. 2019. [DOI] [PubMed] [Google Scholar]
- [10].Nimitha N., Arun C., Puvaneswari A., Paninila B., Pavithra V., and Pavithra B., “Literature survey of chromosomes classification and anomaly detection using machine learning algorithms,” in Proc. IOP Conf. Ser.: Mater. Sci. Eng., vol. 402, no. 1, 2018, Art. no. 012194. [Google Scholar]
- [11].Yan W. and Shen S., “An edge detection method for chromosome images,” in Proc. IEEE 2nd Int. Conf. Bioinf. Biomed. Eng., 2008, pp. 2390–2392. [Google Scholar]
- [12].Lin C. et al. , “CIR-Net: Automatic classification of human chromosome based on inception-ResNet architecture,” IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 19, no. 3, pp. 1285–1293, May/Jun. 2022. [DOI] [PubMed] [Google Scholar]
- [13].Ding W., Chang L., Gu C., and Wu K., “Classification of chromosome karyotype based on faster-RCNN with the segmatation and enhancement preprocessing model,” in Proc. 12th Int. Congr. Image Signal Process., Biomed. Eng. Inform., 2019, pp. 1–5. [Google Scholar]
- [14].Krizhevsky A., Sutskever I., and Hinton G. E., “ImageNet classification with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6, pp. 84–90, 2017. [Google Scholar]
- [15].Hu X. et al. , “Classification of metaphase chromosomes using deep convolutional neural network,” J. Comput. Biol., vol. 26, no. 5, pp. 473–484, 2019. [DOI] [PubMed] [Google Scholar]
- [16].Bochkovskiy A., Wang C.-Y., and Liao H.-Y. M., “Yolov4: Optimal speed and accuracy of object detection,” 2020, arXiv:2004.10934.
- [17].Cai Z. and Vasconcelos N., “Cascade R-CNN: Delving into high quality object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 6154–6162. [Google Scholar]
- [18].Kong T., Sun F., Liu H., Jiang Y., Li L., and Shi J., “Foveabox: Beyound anchor-based object detection,” IEEE Trans. Image Process., vol. 29, pp. 7389–7398, 2020. [Google Scholar]
- [19].Law H. and Deng J., “Cornernet: Detecting objects as paired keypoints,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 734–750. [Google Scholar]
- [20].Lin T.-Y., Dollár P., Girshick R., He K., Hariharan B., and Belongie S., “Feature pyramid networks for object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2117–2125. [Google Scholar]
- [21].Redmon J., Divvala S., Girshick R., and Farhadi A., “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 779–788. [Google Scholar]
- [22].Tan M., Pang R., and Le Q. V., “Efficientdet: Scalable and efficient object detection,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 10781–10790. [Google Scholar]
- [23].Xiao L. et al. , “DeepACEv2: Automated chromosome enumeration in metaphase cell images using deep convolutional neural networks,” IEEE Trans. Med. Imag., vol. 39, no. 12, pp. 3920–3932, Dec. 2020. [DOI] [PubMed] [Google Scholar]
- [24].Lu C.-H., Kuo C.-E., and Tseng J.-J., "CIL:54816, Homo sapiens Linnaeus, 1758, epithelial cell," CIL Dataset, doi: 10.7295/W9CIL54816. [DOI]
- [25].Tseng J.-J. et al. , “An open dataset of annotated metaphase cell images for chromosome identification,” Sci. Data, vol. 10, no. 1, 2023, Art. no. 104. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Gao S.-H., Cheng M.-M., Zhao K., Zhang X.-Y., Yang M.-H., and Torr P., “Res2Net: A new multi-scale backbone architecture,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 2, pp. 652–662, Feb. 2021. [DOI] [PubMed] [Google Scholar]
- [27].Neubeck A. and Van Gool L., “Efficient non-maximum suppression,” in Proc. IEEE 18th Int. Conf. Pattern Recognit., 2006, vol. 3, pp. 850–855. [Google Scholar]
- [28].Lin T.-Y., Goyal P., Girshick R., He K., and Dollár P., “Focal loss for dense object detection,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2980–2988. [DOI] [PubMed] [Google Scholar]
- [29].Zhou X., Koltun V., and Krähenbühl P., “Probabilistic two-stage detection,” 2021, arXiv:2103.07461.
- [30].Rezatofighi H., Tsoi N., Gwak J., Sadeghian A., Reid I., and Savarese S., “Generalized intersection over union: A metric and a loss for bounding box regression,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 658–666. [Google Scholar]
- [31].Ren S., He K., Girshick R., and Sun J., “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Proc. 28th Int. Conf. Neural Inf. Process. Syst., 2015, vol. 1, pp. 91–99. [DOI] [PubMed] [Google Scholar]
- [32].Wang C.-Y., Bochkovskiy A., and Liao H.-Y. M., “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 7464–7475. [Google Scholar]
- [33].Liu Z. et al. , “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 10012–10022. [Google Scholar]
- [34].Duan K., Bai S., Xie L., Qi H., Huang Q., and Tian Q., “CenterNet++ for Object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 5, pp. 3509–3521, 2024. [DOI] [PubMed] [Google Scholar]
- [35].He K., Zhang X., Ren S., and Sun J., “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778. [Google Scholar]
- [36].Huang R. et al. , “A clinical dataset and various baselines for chromosome instance segmentation,” IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 19, no. 1, pp. 31–39, Jan./Feb. 2022. [DOI] [PubMed] [Google Scholar]
- [37].Li S., Zhao S., Zhang Y., Hong J., and Chen W., “Source-free unsupervised adaptive segmentation for knee joint MRI,” Biomed. Signal Process. Control, vol. 92, 2024, Art. no. 106028. [Google Scholar]
- [38].Hong J., Zhang Y.-D., and Chen W., “Source-free unsupervised domain adaptation for cross-modality abdominal multi-organ segmentation,” Knowl.-Based Syst., vol. 250, 2022, Art. no. 109155. [Google Scholar]
- [39].Hong J., Yu S. C.-H., and Chen W., “Unsupervised domain adaptation for cross-modality liver segmentation via joint adversarial learning and self-learning,” Appl. Soft Comput., vol. 121, 2022, Art. no. 108729. [Google Scholar]
- [40].Gidaris S., Singh P., and Komodakis N., “Unsupervised representation learning by predicting image rotations,” 2018, arXiv:1803.07728.
- [41].Liu S. et al. , “Deep learning in medical ultrasound analysis: A review,” Engineering, vol. 5, no. 2, pp. 261–275, 2019. [Google Scholar]