This diagnostic study assesses an expert-level detection algorithm for Plasmodium falciparum, a parasite that causes malaria, using a publicly available benchmark image data set.
Key Points
Question
Can deep learning be used to develop an automated malaria detection algorithm?
Findings
In this diagnostic study that used a 1-stage deep learning framework and benchmark data sets, the malaria detection algorithm achieved expert-level performance in detecting Plasmodium falciparum in thin blood smear images. The comparable performance between the algorithm and human experts was confirmed by a clinical validation study at the cell level and the image level.
Meaning
The findings suggest that a clinically validated expert-level malaria detection algorithm could be used to accelerate the development of clinically applicable automated malaria diagnostics.
Abstract
Importance
Decades of effort have been devoted to establishing an automated microscopic diagnosis of malaria, but achieving expert-level performance in real-world clinical settings remains challenging, in part because it requires publicly available annotated data for benchmarking and validation.
Objective
To assess an expert-level malaria detection algorithm using a publicly available benchmark image data set.
Design, Setting, and Participants
In this diagnostic study, clinically validated malaria image data sets, the Taiwan Images for Malaria Eradication (TIME), were created by digitizing thin blood smears acquired from patients with malaria selected from the biobank of the Taiwan Centers for Disease Control from January 1, 2003, to December 31, 2018. These smear images were annotated by 4 clinical laboratory scientists who worked in medical centers in Taiwan and trained for malaria microscopic diagnosis at the national reference laboratory of the Taiwan Centers for Disease Control. With TIME, a convolutional neural network–based object detection algorithm was developed for identification of malaria-infected red blood cells. A diagnostic challenge using another independent data set within TIME was performed to compare the algorithm performance against that of human experts as clinical validation.
Main Outcomes and Measures
Performance on detecting Plasmodium falciparum–infected blood cells was measured by average precision, and performance on detecting P falciparum infection at the image level was measured using sensitivity, specificity, and area under the receiver operating characteristic curve (AUC).
Results
The TIME data sets contained 8145 images of 36 blood smears from patients with suspected malaria (30 P falciparum–positive and 6 P falciparum–negative smears) that had reliable annotations. For clinical validation, the average precision was 0.885 for detecting P falciparum–infected blood cells and 0.838 for ring form. For detecting P falciparum infection on blood smear images, the algorithm had expert-level performance (sensitivity, 0.995; specificity, 0.900; AUC, 0.997 [95% CI, 0.993-0.999]), especially in detecting ring form (sensitivity, 0.968; specificity, 0.960; AUC, 0.995 [95% CI, 0.990-0.998]) compared with experienced microscopists (mean sensitivity, 0.995 [95% CI, 0.993-0.998]; mean specificity, 0.955 [95% CI, 0.885-1.000]).
Conclusions and Relevance
The findings suggest that a clinically validated expert-level malaria detection algorithm can be developed by using reliable data sets.
Introduction
Malaria, a mosquito-borne disease caused by Plasmodium species, is a severe and reemerging global health issue despite years of effort in global malaria control. In 2017, an estimated 219 million cases of malaria and 435 000 malaria-related deaths occurred worldwide.1 Most patients were in the World Health Organization African region (92%) and South-East Asian region (5%). Although Taiwan has been certified malaria free for more than 5 decades, imported cases, mostly from Africa and Southeast Asia and caused by Plasmodium falciparum, still occur every year.2
The criterion standard for malaria diagnosis is microscopic examination.3,4 Thick blood smears are used for screening, whereas thin blood smears are used for confirming the species and measuring parasite density.3 However, conventional microscopic diagnosis is labor intensive and dependent on techniques and experience. This expertise is rare not only in resource-limited countries, where malaria poses a significant burden, but also in countries close to malaria elimination where the microscopists lack experience.4 Both situations prompted efforts to seek more efficient and accurate diagnostic tools. Economical and reliable rapid diagnostic tests (RDTs) can replace thick smears as a screening tool in resource-limited settings. However, RDTs provide insufficient information about species, life-cycle stages, and quantification of parasitemia, which are pivotal for clinical management.3,4 Therefore, researchers from engineering and computer science have performed extensive studies for an automated microscopic examination during the past decade.5 Nevertheless, progress toward a clinically applicable system was slow because of several challenges. First, the variety of staining methods and quality of smear preparation for microscopic blood smear images makes it difficult to devise universal features using traditional approaches. Second, it is hard to compare different algorithms because of different evaluation metrics and the absence of reference benchmark data. The difficulty of acquiring a large number of images with reliable annotations for public reference hinders the development of automated malaria diagnosis.
Traditional approaches to detecting malaria-causing organisms on thin smears involve a multistage approach of image preprocessing, red blood cell (RBC) segmentation, feature engineering, and classification of infected and noninfected RBCs.5 With the advent of deep learning, convolutional neural network (CNN)–based algorithms have achieved expert-level performance in the detection of pathologic characteristics in multiple medical image modalities,6,7,8 and the potential of applying deep learning on both thick and thin smears has been explored.9,10,11 Because thin smears can provide more clinical information, we aimed to assess high-quality image data sets of thin blood smears with expert annotations for public reference, develop a CNN-based algorithm to identify P falciparum infection automatically, and validate its performance in a clinical context.
Methods
For this diagnostic study, our framework began with the acquisition and digitization of thin blood smears, followed by algorithm training and validation at the cell and image levels (Figure 1). The study was approved by the institutional review board of the Taiwan Centers for Disease Control (CDC), which waived informed consent because the clinical samples were acquired from the deidentified biobank of the Taiwan CDC. The study followed the Standards for Reporting of Diagnostic Accuracy (STARD) reporting guideline.
Figure 1. Overall Pipeline of the Malaria Detection Framework.
Data Preparation
Data were retrospectively acquired from ex vivo peripheral blood samples from patients suspected of having P falciparum infection by local physicians and public health workers who alerted the Taiwan CDC from January 1, 2003, to December 31, 2018. Laboratory diagnosis was made by microscopic examination and polymerase chain reaction.12 The details of enrollment of patients and sample collection are described in eMethods 1 in the Supplement.
We used virtual slide microscopes (VS120, Olympus Corp) to scan the blood smears and acquired pictures via digital cameras. The whole slide was sliced into grids of 2 × 2 mm. We then used the 100 × objective scope with oil immersion to scan slides into Olympus virtual slide images (.vsi). Olympus VS-ASW software was used to convert the .vsi files into jpeg files of 2048 × 2048 pixels. All digitized images were annotated by microscopists at the Taiwan CDC using an in-house annotation tool. For each image, 2 levels of annotation were made. At the cell level, the P falciparum–infected RBCs were annotated according to our annotation guideline, which defined the targets and guided the appropriate annotations (eMethods 2 in the Supplement). At the image level, infection status was inferred from cell-level annotations based on whether any bounding box of P falciparum–infected RBCs was identified.
All images in the development set were reviewed and annotated by an experienced microscopist (M.-C.K.) responsible for microscopic diagnosis of malaria at the Taiwan CDC for more than 45 years. Because of the variability among different microscopists,13,14 a reliability test subset was generated by randomly sampling 500 images from blood smears in the development set (19-20 images per smear) and was annotated by 2 other experienced microscopists (S.-F.H. and H.-J.L.), who worked in the parasitology laboratory at the Taiwan CDC for more than 5 years. Interrater reliability was assessed using mean F1 scores (the harmonic mean of precision and recall used as a measure of accuracy and reliability) among pairs of annotators at the cell level,15 and the percentage of agreement and the Fleiss κ were calculated for multiple raters at the image level. For the clinical validation set, all images were annotated by 3 expert microscopists (M.-C.K., S.-F.H., and H.-J.L.) at the Taiwan CDC. Interrater reliability was similarly measured as described above.
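The agreement measures described above can be illustrated with a minimal sketch. It assumes that image-level ratings are binary (0 = no infected cell, 1 = at least 1 infected cell) and that the number of matched boxes between 2 annotators is already known; the function names and the toy ratings are hypothetical and are not taken from the study.

```python
def pairwise_f1(n_boxes_a, n_boxes_b, n_matched):
    """F1 between 2 annotators: harmonic mean of precision and recall,
    which simplifies to 2 * matched / (boxes_a + boxes_b)."""
    return 2 * n_matched / (n_boxes_a + n_boxes_b)


def percent_agreement(ratings):
    """Fraction of images on which all raters assigned the same label."""
    return sum(len(set(row)) == 1 for row in ratings) / len(ratings)


def fleiss_kappa(ratings, categories=(0, 1)):
    """Fleiss kappa for a list of per-image rating tuples (one entry per rater)."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j]: number of raters who assigned image i to category j
    counts = [[list(row).count(c) for c in categories] for row in ratings]
    # Observed agreement per image, averaged over images
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Chance agreement from the overall category proportions
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return float("nan")  # undefined when only 1 category is ever used
    return (p_bar - p_e) / (1 - p_e)


# Toy example: 3 raters, 5 images, 1 disagreement
toy_ratings = [(1, 1, 1), (1, 1, 1), (0, 0, 0), (0, 0, 0), (1, 1, 0)]
print(percent_agreement(toy_ratings))
print(round(fleiss_kappa(toy_ratings), 3))
```

Because κ corrects for chance agreement, it can fall well below the raw percentage of agreement when a single image is rated differently, as the toy example shows.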
Development of the Algorithm
The malaria detection algorithm was developed based on Retinanet, a 1-stage object detection neural network.16 The architecture consisted of a backbone network and 2 subnetworks, as in the original work,16 with a specially crafted loss function termed focal loss. The only modification in our architecture was that 1 fewer convolution layer was adopted for the backbone network. The backbone network was a feature pyramid network built on top of a CNN17 that was responsible for computing convolutional feature maps for input images. ResNet50 pretrained on the ImageNet data set was used for the backbone network.18,19 The first subnetwork is the object class subnet in the architecture of a CNN, which predicts labels given the output of the backbone network. The second subnetwork is the box subnet, which calculates regression on bounding box locations. The network was trained with the development set to identify the location of P falciparum–infected RBCs and to classify their stages.
An Adam optimizer with a learning rate of 1 × 10⁻⁵ and gradient clipping were used for training. The α was set to .25 and the γ to 2 for the focal loss. Our models were trained on a computer with an Intel Xeon E5-2630 v4 2.20-GHz central processing unit, an NVIDIA GeForce GTX 1080 Ti graphics card with 11 GB of memory, and 128 GB of RAM. We referenced our algorithm to the Keras implementation of Retinanet.35
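To illustrate the role of α and γ, the following minimal sketch evaluates the binary focal loss as defined in the focal loss reference,16 FL(p_t) = −α_t (1 − p_t)^γ log(p_t), for a single predicted probability. It is not the training code used in this study; the function name and the clipping constant `eps` are assumptions added for the illustration.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class
    y: ground truth label (1 = positive, 0 = negative)
    """
    p = min(max(p, eps), 1 - eps)              # clip to avoid log(0)
    p_t = p if y == 1 else 1 - p               # probability of the true class
    alpha_t = alpha if y == 1 else 1 - alpha   # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


# A well-classified positive (p = 0.9) contributes far less than a hard one (p = 0.1)
print(focal_loss(0.9, 1))
print(focal_loss(0.1, 1))
```

With γ = 2, the factor (1 − p_t)^γ shrinks the loss of confidently correct predictions, so the many easy background locations in a smear image do not dominate training.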
Clinical Validation
A diagnostic challenge was conducted to compare the performance of the algorithm against clinical laboratory scientists. Four practicing clinical laboratory scientists were recruited from 2 medical centers in Taiwan (Chang-Gung Memorial Hospital and Taipei City Hospital Zhongxing Branch and Zhongxiao Branch) to review the clinical validation set and to annotate P falciparum–infected cells. The scientists were trained for malaria microscopic diagnosis in a national reference laboratory at the Taiwan CDC and had 5 to 10 years of working experience.
Statistical Analysis
Algorithm Evaluation
Algorithm performance was evaluated at the cell and image levels. At the cell level, whether bounding boxes identified by the algorithm matched ground truth bounding boxes was determined by a matching process (eMethods 3 and eFigure 1 in the Supplement). Our evaluation focused on 2 primary end points: malaria detection, defined as the detections of any malaria parasites regardless of life cycle stages, and ring form detection, the most common and characteristic stage in peripheral blood smears of P falciparum.20
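The matching procedure itself is specified in eMethods 3. As a generic illustration only, the sketch below shows one common way to match predicted and ground truth boxes: greedy assignment in descending score order using intersection over union (IoU). The threshold of 0.5, the box format (x1, y1, x2, y2), and all function names are assumptions and may differ from the procedure used in this study.

```python
def iou(a, b):
    """Intersection over union of 2 boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(detections, truths, iou_threshold=0.5):
    """Greedily match detections to ground truth boxes.

    detections: list of (box, score); truths: list of boxes.
    Returns ([(score, is_true_positive), ...], number_of_missed_truths).
    """
    matched = set()
    results = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, truth in enumerate(truths):
            if idx in matched:
                continue  # each ground truth box is matched at most once
            value = iou(box, truth)
            if value > best_iou:
                best_iou, best_idx = value, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            results.append((score, True))
        else:
            results.append((score, False))
    return results, len(truths) - len(matched)


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((0, 0, 9, 9), 0.4)]
print(match_detections(detections, truths))
```

The resulting true-positive and false-positive flags, ordered by score, are the inputs needed to draw a precision-recall curve.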
Metrics conventionally used for object detection in computer vision were applied at the cell level, including precision-recall curves with average precision and free-response receiver operating characteristic (ROC) curves. At the image level, ROC curves were plotted and areas under the ROC curve (AUCs) were computed using the Python package Scikit-learn, version 0.18.1 (Python Software Foundation).21 The 95% CIs for ROC curves were estimated through 1000 iterations of bootstrap analysis on sensitivity and specificity with α = .05. Error rates of microscopic diagnosis were calculated by counting every mistake equally.
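The study computed AUCs with Scikit-learn and bootstrapped sensitivity and specificity. As a dependency-free illustration of the same idea, the sketch below computes the AUC from its rank interpretation and a percentile bootstrap interval for the AUC itself; resampling the AUC rather than sensitivity and specificity, the seed, and the function names are assumptions made for the example.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive image scores above a negative one
    (ties count as one-half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least 1 positive and 1 negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_iter=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # resample contains a single class; AUC is undefined
        aucs.append(roc_auc(sample_labels, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_iter)]
    upper = aucs[int((1 - alpha / 2) * n_iter) - 1]
    return lower, upper


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
print(bootstrap_auc_ci(labels, scores, n_iter=200))
```

The pairwise AUC computation is quadratic in the number of images, which is acceptable for a validation set of 500 images but would call for a rank-based implementation on larger sets.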
Two operating points were selected at the image level to further characterize the algorithm performance. The first operating point, corresponding to the point with the highest Youden index, reflects the optimal point of algorithm performance with high sensitivity. It was chosen because high sensitivity is a prerequisite for a potential screening tool.22,23 The second operating point, the high-specificity operating point, which approximates the mean specificity of practicing microscopists in the clinical validation set, could further characterize our algorithm performance against the 4 practicing microscopists. The 95% CIs for the sensitivity and specificity at the 2 operating points were calculated as exact Clopper-Pearson intervals.24 Two-sided 95% CIs were computed using the Python package StatsModels, version 0.8.0.
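Both steps can be sketched without external packages. The example below selects the threshold with the highest Youden index (J = sensitivity + specificity − 1) and computes an exact Clopper-Pearson interval by bisection on the binomial distribution. The study used StatsModels; the function names and the bisection approach here are assumptions, and the counts 398 of 400 and 90 of 100 are inferred from the reported sensitivity of 0.995 and specificity of 0.900 with 400 positive and 100 negative images.

```python
from math import comb


def youden_operating_point(labels, scores):
    """Return (J, threshold, sensitivity, specificity) maximizing the Youden index.
    An image is called positive when its score is >= threshold."""
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= t)
        tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); adequate for n up to a few hundred."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""

    def solve(f, target, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisection on the success probability
            mid = (lo + hi) / 2
            if (f(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k) = alpha / 2; upper bound: P(X <= k) = alpha / 2
    lower = 0.0 if k == 0 else solve(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, True)
    upper = 1.0 if k == n else solve(
        lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper


print(youden_operating_point([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(clopper_pearson(398, 400))  # sensitivity at the high-sensitivity point
print(clopper_pearson(90, 100))   # specificity at the high-sensitivity point
```

Under these assumptions, the intervals can be compared directly with the values reported for the high-sensitivity operating point in the Results.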
Sensitivity Analysis
Experiments were designed to evaluate the association of different labeling strategies with malaria detection. For the development set, 2 sets of meta-labels were processed and inferred from the raw annotations. For the first set, we pooled the labels of young trophozoite (ring form) and trophozoite into the category trophozoite, together with the remaining labels of schizont and gametocyte; for the second set, all labels regardless of their life-cycle stages were merged into the category malaria infection. These 2 sets of meta-labels were then used to train another 2 malaria detection models with configurations similar to those delineated above. Their performance was measured with the same metrics.
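The 2 meta-label sets amount to simple relabeling of the raw annotations, as the following sketch shows. The annotation format (a dict with `box` and `label` keys), the strategy names, and the decision to leave labels outside the mapping (such as indeterminate) unchanged in the first strategy are assumptions made for illustration.

```python
# First meta-label set: ring form and trophozoite are pooled; other stages are kept
POOLED_TROPHOZOITE = {
    "ring form": "trophozoite",
    "trophozoite": "trophozoite",
    "schizont": "schizont",
    "gametocyte": "gametocyte",
}


def pool_labels(annotations, strategy):
    """Return a relabeled copy of the cell-level annotations.

    strategy: "pooled_trophozoite" (first set) or "malaria_infection" (second set).
    """
    pooled = []
    for ann in annotations:
        if strategy == "pooled_trophozoite":
            # Assumption: labels not in the mapping are passed through unchanged
            label = POOLED_TROPHOZOITE.get(ann["label"], ann["label"])
        elif strategy == "malaria_infection":
            label = "malaria infection"
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        pooled.append({**ann, "label": label})
    return pooled


def image_is_positive(annotations):
    """Image-level status: positive if any infected-cell bounding box exists."""
    return len(annotations) > 0


raw = [
    {"box": (10, 10, 40, 40), "label": "ring form"},
    {"box": (60, 15, 95, 50), "label": "schizont"},
]
print([a["label"] for a in pool_labels(raw, "pooled_trophozoite")])
print([a["label"] for a in pool_labels(raw, "malaria_infection")])
```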
Results
Development and Characteristics of Data Sets
We established Taiwan Images for Malaria Eradication (TIME) data sets, which included 2 data sets: the development set for training and the clinical validation set for evaluating performance. Images in the development set were scanned from 26 blood smear slides (22 P falciparum–positive and 4 P falciparum–negative smears). Images in the clinical validation set were scanned from 10 blood smears (8 P falciparum–positive and 2 P falciparum–negative smears) (Figure 2). The development set included 6845 images from positive slides and 800 from negative slides that were randomly sampled from more than 812 000 raw scanned images. The clinical validation set included 400 images from positive slides and 100 from negative slides.
Figure 2. Workflow of Data Set Establishment.
Of the 7645 images in the development set, 4402 images (57.6%) were identified with 21 220 P falciparum–infected cells annotated, among which the 2 most common life-cycle stages were young trophozoite (ring form; 80.1%) and trophozoite (10.9%), followed by gametocyte (1.3%) and schizont (0.2%). Of the 500 images in the clinical validation set, 400 (80.0%) were identified with 3061 P falciparum–infected cells. The most common life-cycle stages were ring form (2909 [95.0%]) and trophozoite (117 [3.8%]) (Table). Both data sets had considerable variation in parasite density and the level of touching cells (eFigure 2 in the Supplement). For P falciparum–positive smears, the median number of parasite-infected cells per image was 2 (range, 1-68) in the development set and 4 (range, 1-60) in the clinical validation set.
Table. Characteristics of the Plasmodium falciparum–Infected Blood Smear Image Data Sets^a
Characteristic | Development Set | Clinical Validation Set |
---|---|---|
Blood smear slides | 26 | 10 |
Annotators | 1-3^b | 3 |
Images | 7645 (100) | 500 (100) |
With P falciparum–infected cells | 4402 (57.6) | 400 (80.0) |
Without P falciparum–infected cells | 3243 (42.4) | 100 (20.0) |
Bounding boxes | 21 220 (100) | 3061 (100) |
Ring form | 16 992 (80.1) | 2909 (95.0) |
Trophozoite | 2313 (10.9) | 117 (3.8) |
Schizont | 35 (0.2) | 1 (0) |
Gametocyte | 267 (1.3) | 0 |
Indeterminate | 1613 (7.6) | 34 (1.1) |
No. of bounding boxes per image, median (range) | 2 (1-68) | 4 (1-60) |
^a Data are presented as number (percentage) of slides unless otherwise indicated.
^b One expert annotated all images, and 500 images selected from the whole set were also annotated by 2 additional experts.
Interrater Reliability Test
At the cell level, the mean F1 score of the reliability test subset among pairs of experts was 0.924 (95% CI, 0.901-0.947), and the clinical validation set had an F1 score of 0.954 (95% CI, 0.938-0.970). At the image level, the percentage of agreement was 97.5% for the reliability test subset and 100% for the clinical validation set. The Fleiss κ among the 3 experts was 0.959 in the reliability test subset and 1.000 in the clinical validation set (eFigure 3 and eFigure 4 in the Supplement).
Clinical Validation
At the cell level, our algorithm achieved an average precision of 0.885 in detecting P falciparum–infected RBCs and 0.838 in detecting ring form P falciparum–infected RBCs (Figure 3). At the image level, our algorithm achieved an AUC of 0.997 (95% CI, 0.993-0.999) for malaria detection, which was comparable to experts’ performance (Figure 4A). At the high-sensitivity operating point, the sensitivity of our algorithm was 0.995 (95% CI, 0.982-0.999) and the specificity was 0.900 (95% CI, 0.824-0.951). At the high-specificity operating point, the sensitivity was 0.968 (95% CI, 0.945-0.983) and the specificity was 0.960 (95% CI, 0.901-0.989). For the error rate in identifying P falciparum infection, our algorithm achieved an error rate of 2.4% (95% CI, 1.3%-4.2%), showing no statistically significant difference compared with the mean error rate of microscopists (1.3%; 95% CI, 0.0%-3.5%).
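The reported error rate can be related to the sensitivity and specificity by simple counting, with every mistake weighted equally. The sketch below assumes, as an inference rather than a statement from the study, that the error rate was taken at the high-sensitivity operating point on the 400 positive and 100 negative images of the clinical validation set; the function name is hypothetical.

```python
def error_rate(sensitivity, specificity, n_positive, n_negative):
    """Error rate with false negatives and false positives counted equally."""
    false_negatives = round(n_positive * (1 - sensitivity))
    false_positives = round(n_negative * (1 - specificity))
    rate = (false_negatives + false_positives) / (n_positive + n_negative)
    return rate, false_negatives, false_positives


# High-sensitivity operating point for malaria detection
rate, fn, fp = error_rate(0.995, 0.900, n_positive=400, n_negative=100)
print(fn, fp, rate)
```

Under this assumption, the counts implied by the sensitivity and specificity give an error rate equal to the reported 2.4%.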
Figure 3. Performance of the Malaria Detection Algorithm at the Cell Level With Precision-Recall Curve and Free-Response Receiver Operating Characteristic (FROC) Curve.
Figure 4. Performance of the Malaria Detection Algorithm at the Image Level With Receiver Operating Characteristic Curve and Error Rate.
The shaded area indicates the 95% CIs. AUC indicates area under the curve.
For ring form detection at the image level, our algorithm had an AUC of 0.995 (95% CI, 0.990-0.998) (Figure 4B). At the high-sensitivity operating point, the sensitivity of our algorithm was 0.987 (95% CI, 0.971-0.996) and the specificity was 0.883 (95% CI, 0.805-0.938). At the high-specificity operating point, the sensitivity was 0.955 (95% CI, 0.929-0.973) and the specificity was 0.971 (95% CI, 0.917-0.994). This performance was comparable to experts’ performance (mean sensitivity, 0.995; 95% CI, 0.993-0.998; mean specificity, 0.955; 95% CI, 0.885-1.000). The error rate was 3.4% (95% CI, 1.6%-5.4%), which was comparable to that of microscopists (3.8%; 95% CI, 0.0%-8.9%) (eFigure 5 and eFigure 6 in the Supplement).
Sensitivity Analysis
On the clinical validation set, the model trained with meta-labels that pooled ring form and trophozoites as 1 category had an AUC of 0.987 (95% CI, 0.977-0.994) for malaria detection, whereas the model trained with meta-labels that pooled all parasites together as 1 category had an AUC of 0.994 (95% CI, 0.989-0.998) for malaria detection. The results indicated that the performance of the models trained with different labeling strategies was comparable to that of the original version.
Discussion
In this study, we established benchmark image data sets of thin blood smears for malaria microscopic diagnosis. To our knowledge, this is the first attempt to provide publicly available image data sets of microscopic blood smears with reliable annotation. Our data sets provide more than 8000 images with more than 24 000 annotated P falciparum–infected RBCs, encompassing a wide range of variations encountered in clinical settings. A CNN-based object-detection algorithm was developed with promising sensitivity and specificity for malaria detection and demonstrated performance similar to that of human microscopic experts (AUC of 0.997 and error rate of 2.4% for malaria detection and AUC of 0.995 and error rate of 3.4% for ring form detection).
From data set generation to algorithm development, our framework was designed with clinical applicability in mind. Previously released data sets typically consisted of images of individually segmented RBCs, designed for the 2-stage approach in which researchers segmented individual RBCs from microscopic images and then developed algorithms to classify them.11,25,26,27 However, touching cells on thin smears makes accurate RBC segmentation challenging. We designed our data sets for the development of 1-stage detection algorithms by providing images directly acquired from slides without additional preprocessing or segmentation. In this manner, we bypassed the segmentation problem and minimized artifacts resulting from human-designed preprocessing, ensuring the highest fidelity of the images with respect to the clinical reality. Retinanet was adopted as our 1-stage detector for its better performance, simpler structure, and previous application to other clinical contexts.28,29 Compared with the 2-stage approach, our algorithm demonstrated encouraging performance in dealing with touching blood cells. Furthermore, because the input images required no preprocessing, we minimized the computation cost both in hardware and processing time, ensuring rapid smear-to-diagnosis turnaround time.
For image annotation, previous data sets were usually annotated by only 1 expert,27,30,31 and the annotations might be susceptible to errors and biases because of the varied training protocol and experience of the individual annotator.13,14 To overcome this challenge, we recruited multiple microscopists and standardized the annotation process. In addition, because the reliability test showed a moderate to high level of agreement among our experts’ annotations at the cell and image levels, we are confident of the reliability of the annotations in our data sets. Furthermore, to our knowledge, this study was the first attempt to validate the algorithm performance against practicing microscopists in a clinical context. The expert-level detection algorithm demonstrated the potential to automate the microscopic examination, providing malaria diagnosis once a blood smear was made and photographed. Beyond streamlining the diagnosis workflow, our algorithm presents an opportunity to preserve and commoditize the increasingly scarce expertise in microscopic diagnosis, especially for countries close to malaria elimination but still dealing with imported cases related to international travel.
Given that malaria burden is greatest in countries with limited resources, we envision deploying a mobile application of our algorithm for unskilled workers with minimum equipment requirements.25,32 Currently, countries where malaria is endemic rely on RDTs to screen for malaria.33 Our algorithm achieved a 99.5% sensitivity and a 90.0% specificity at high-sensitivity operating points, whereas standard RDTs achieved sensitivities of 80% to 95% and specificities of 85% to 99%.32,34 A possible explanation of our algorithm’s superior performance might be that a P falciparum–infected cell in thin smears could provide more details for a deep learning algorithm to identify compared with thick smears. Compared with RDTs, which lack information regarding parasite density and life-cycle stages, our framework not only provided comparable malaria detection performance but also may affect clinical management through automated parasite density estimation, providing pivotal information on disease stratification and treatment response monitoring.34 Manual counting of parasites is labor intensive and impractical in resource-constrained settings. With the use of our algorithm, it might be feasible to derive quantitative estimates of parasite density using methods similar to those described in the World Health Organization guideline3 but without human intervention. In this manner, practitioners can rapidly stratify patients by severity, optimize therapies, and monitor a patient’s therapeutic response.31,34
Limitations
This study has limitations. First, false-negative detection occurred in several challenging situations, for example, when the cytoplasm and nuclei of parasites were obscure or deformed. False-positive detections happened when stained platelets or impurities on top of RBCs mimicked the parasites. To perform better in these challenging scenarios, algorithm training with high-quality images focused on those situations would be helpful. Second, the algorithm was developed to identify only P falciparum without other species (eg, Plasmodium vivax or Plasmodium ovale). Nonetheless, the algorithm is expected to be expanded without difficulty to identify the ring form of other species with tuning because Plasmodium species are morphologically similar. Third, our data sets were established from a limited number of patients, which might affect the algorithm’s generalizability. However, the variability in our data sets that resulted from different countries of importation, clinical settings, and staining methods might prevent overfitting and help with generalizability. Our clinical validation that mimicked a clinical scenario also showed promising performance for the patients not included in the development set and the potential for real-world application. Nevertheless, how to link negative malaria detection at the cell and image levels with the exclusion of a malaria diagnosis would be another issue for clinical decision making. To further validate and implement the algorithm in real-world settings, work on the clinical workflow design and pilot field validation in countries where malaria is endemic will be required.
Conclusions
In this study, we built a publicly available benchmark image data set of malaria thin blood smears with reliable annotations (TIME) and demonstrated the potential to develop a deep learning–based malaria detection algorithm with expert-level performance. Both the data sets and algorithm may help accelerate the development of automated microscopic diagnosis and a decision support system in resource-limited countries with heavy malaria burden.
eMethods 1. Participants and Clinical Sample Collection
eMethods 2. Annotation Guideline
eMethods 3. Bounding Box Matching for Algorithm Evaluation
eFigure 1. The F1 Score in Different IoU Thresholds
eFigure 2. The Histogram of Number of Malaria-Infected Cells per Image
eFigure 3. Comparison of Annotations in the Reliability Test Subset
eFigure 4. Comparison of Annotations in the Clinical Validation Set
eFigure 5. Comparison of Detection Results by Algorithm and Individual Microscopists With Reference Standard at Cell Level in the Clinical Validation Set
eFigure 6. Comparison of Detection Results by Algorithm and Individual Microscopists With Reference Standard at Image Level in the Clinical Validation Set
References
- 1. World Health Organization. World Malaria Report 2018. Geneva, Switzerland: World Health Organization; 2018.
- 2. Wang C-M, Hu SC, Hung W-S, et al. The absence of endemic malaria transmission in Taiwan from 2002 to 2010: the implications of sustained malaria elimination in Taiwan. Travel Med Infect Dis. 2012;10(5-6):240-246. doi: 10.1016/j.tmaid.2012.10.005
- 3. World Health Organization. Bench Aids for the Diagnosis of Malaria Infections. Geneva, Switzerland: World Health Organization; 2000.
- 4. Tangpukdee N, Duangdee C, Wilairatana P, Krudsood S. Malaria diagnosis: a brief review. Korean J Parasitol. 2009;47(2):93-102. doi: 10.3347/kjp.2009.47.2.93
- 5. Poostchi M, Silamut K, Maude RJ, Jaeger S, Thoma G. Image analysis and machine learning for detecting malaria. Transl Res. 2018;194:36-55. doi: 10.1016/j.trsl.2017.12.004
- 6. Tiulpin A, Thevenot J, Rahtu E, Lehenkari P, Saarakkala S. Automatic knee osteoarthritis diagnosis from plain radiographs: a deep learning-based approach. Sci Rep. 2018;8(1):1727. doi: 10.1038/s41598-018-20132-7
- 7. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402-2410. doi: 10.1001/jama.2016.17216
- 8. De Fauw J, Ledsam JR, Romera-Paredes B, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018;24(9):1342-1350. doi: 10.1038/s41591-018-0107-6
- 9. Yang F, Poostchi M, Yu H, et al. Deep learning for smartphone-based malaria parasite detection in thick blood smears. IEEE J Biomed Health Inform. 2019;(September):1-1. doi: 10.1109/JBHI.2019.2939121
- 10. Dong Y, Jiang Z, Shen H, et al. Evaluations of deep convolutional neural networks for automatic identification of malaria infected cells. In: Proceedings of the 2017 IEEE EMBS International Conference on Biomedical & Health Informatics. Piscataway, NJ: Institute of Electrical and Electronics Engineers; 2017:101-104. doi: 10.1109/BHI.2017.7897215
- 11. Hung J, Goodman A, Lopes S, et al. Applying faster R-CNN for object detection on malaria images. Cornell University Library. https://arxiv.org/abs/1804.09548. Published March 11, 2019. Accessed December 12, 2019.
- 12. Singh B, Bobogare A, Cox-Singh J, Snounou G, Abdullah MS, Rahman HA. A genus- and species-specific nested polymerase chain reaction malaria detection assay for epidemiologic studies. Am J Trop Med Hyg. 1999;60(4):687-692. doi: 10.4269/ajtmh.1999.60.687
- 13. O’Meara WP, McKenzie FE, Magill AJ, et al. Sources of variability in determining malaria parasite density by microscopy. Am J Trop Med Hyg. 2005;73(3):593-598. doi: 10.4269/ajtmh.2005.73.593
- 14. O’Meara WP, Barcus M, Wongsrichanalai C, et al. Reader technique as a source of variability in determining malaria parasite density by microscopy. Malar J. 2006;5:118. doi: 10.1186/1475-2875-5-118
- 15. Hripcsak G, Rothschild AS. Agreement, the f-measure, and reliability in information retrieval. J Am Med Inform Assoc. 2005;12(3):296-298. doi: 10.1197/jamia.M1733
- 16. Lin T-Y, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. Cornell University Library. https://arxiv.org/abs/1708.02002. Published February 7, 2019. Accessed December 12, 2019.
- 17. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. https://arxiv.org/abs/1612.03144. Published April 19, 2019. Accessed December 12, 2019.
- 18. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: Institute of Electrical and Electronics Engineers; 2016:770-778. doi: 10.1109/CVPR.2016.90
- 19. Deng J, Dong W, Socher R, Li L-J, Li K, Li F-F. ImageNet: a large-scale hierarchical image database. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway, NJ: Institute of Electrical and Electronics Engineers; 2009:248-255. doi: 10.1109/CVPR.2009.5206848
- 20. Silamut K, White NJ. Relation of the stage of parasite development in the peripheral blood to prognosis in severe falciparum malaria. Trans R Soc Trop Med Hyg. 1993;87(4):436-443. doi: 10.1016/0035-9203(93)90028-O
- 21. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825-2830.
- 22. Youden WJ. Index for rating diagnostic tests. Cancer. 1950;3(1):32-35.
- 23. Fardy JM. Evaluation of diagnostic tests. Methods Mol Biol. 2009;473:127-136. doi: 10.1007/978-1-59745-385-1_7
- 24. Clopper CJ, Pearson ES. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika. 1934;26(4):404. doi: 10.1093/biomet/26.4.404
- 25. Gopakumar GP, Swetha M, Sai Siva G, Sai Subrahmanyam GRK. Convolutional neural network-based malaria diagnosis from focus stack of blood smear images acquired using custom-built slide scanner. J Biophotonics. 2018;11(3):e201700003. doi: 10.1002/jbio.201700003
- 26. Dong Y, Jiang Z, Shen H, Pan WD. Classification accuracies of malaria infected cells using deep convolutional neural networks based on decompressed images. In: Proceedings of SoutheastCon 2017. Piscataway, NJ: Institute of Electrical and Electronics Engineers; 2017:1-6. doi: 10.1109/SECON.2017.7925268
- 27. Rajaraman S, Antani SK, Poostchi M, et al. Pre-trained convolutional neural networks as feature extractors toward improved malaria parasite detection in thin blood smear images. PeerJ. 2018;6(1):e4568. doi: 10.7717/peerj.4568
- 28. Zlocha M, Dou Q, Glocker B. Improving RetinaNet for CT lesion detection with dense masks from weak RECIST labels. Cornell University Library. https://arxiv.org/abs/1906.02283. Published June 5, 2019. Accessed December 12, 2019.
- 29. Jung H, Kim B, Lee I, et al. Detection of masses in mammograms using a one-stage object detector based on a deep convolutional neural network. PLoS One. 2018;13(9):e0203355. doi: 10.1371/journal.pone.0203355
- 30. Liang Z, Powell A, Ersoy I, et al. CNN-based image analysis for malaria diagnosis. In: Proceedings of the 2016 IEEE International Conference on Bioinformatics and Biomedicine. Piscataway, NJ: Institute of Electrical and Electronics Engineers; 2016:493-496. doi: 10.1109/BIBM.2016.7822567
- 31. Rehman A, Abbas N, Saba T, Mehmood Z, Mahmood T, Ahmed KT. Microscopic malaria parasitemia diagnosis and grading on benchmark datasets. Microsc Res Tech. 2018;81(9):1042-1058. doi: 10.1002/jemt.23071
- 32. Momčilović S, Cantacessi C, Arsić-Arsenijević V, Otranto D, Tasić-Otašević S. Rapid diagnosis of parasitic diseases: current scenario and future needs. Clin Microbiol Infect. 2018;25(3):290-309. doi: 10.1016/j.cmi.2018.04.028
- 33. Boyce MR, O’Meara WP. Use of malaria RDTs in various health contexts across sub-Saharan Africa: a systematic review. BMC Public Health. 2017;17(1):470. doi: 10.1186/s12889-017-4398-1
- 34. World Health Organization. Guidelines for the Treatment of Malaria. 3rd ed. Geneva, Switzerland: World Health Organization; 2015.
- 35. Gaiser H, de Vries M, Williamson A, et al. fizyr/keras-retinanet 0.2. https://github.com/fizyr/keras-retinanet. Accessed December 12, 2019.
Supplementary Materials
eMethods 1. Participants and Clinical Sample Collection
eMethods 2. Annotation Guideline
eMethods 3. Bounding Box Matching for Algorithm Evaluation
eFigure 1. The F1 Score in Different IoU Thresholds
eFigure 2. The Histogram of Number of Malaria-Infected Cells per Image
eFigure 3. Comparison of Annotations in the Reliability Test Subset
eFigure 4. Comparison of Annotations in the Clinical Validation Set
eFigure 5. Comparison of Detection Results by Algorithm and Individual Microscopists With Reference Standard at Cell Level in the Clinical Validation Set
eFigure 6. Comparison of Detection Results by Algorithm and Individual Microscopists With Reference Standard at Image Level in the Clinical Validation Set