Automated machine learning (autoML) allows clinicians and researchers to access deep learning (DL) for medical image analysis without the computational expertise and expensive hardware otherwise required.1 Many autoML platforms have been developed in industry and academia, with variable strengths and limitations.2 Our previous work shows that autoML holds promise in clinical research, but comparative studies pitting autoML platforms against one another are lacking.3 Here, we compared Amazon Rekognition, the highest performing autoML platform for image analysis according to our systematic review,3 with H2O.ai Driverless AI, an emerging platform with promising results outside medicine.
We trialled the autoML platforms in four clinical contexts, using publicly available datasets: classifying pneumonia and normal chest X-ray (CXR) images; Alzheimer's disease, mild cognitive impairment, and normal brain magnetic resonance imaging (MRI) scans; glaucoma and healthy fundus photographs; and malignant and benign pigmented lesions in dermoscopic photographs. Training dataset sizes were 4,684, 5,120, 563, and 2,637 images, respectively; external validation dataset sizes were 1,172, 1,280, 142, and 660 images, respectively. Our hardware had no bearing on the results, as both platforms conducted analysis and model-building in the cloud using provider computational resources. Platforms were compared in terms of their technical features (Table 1), and performance in each clinical task was gauged with the F1-score (the harmonic mean of precision and recall) on an external validation dataset.
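For clarity, the F1-score can be computed directly from a model's predictions on the external validation set. A minimal sketch in Python follows; the labels are purely illustrative (not our data), but any binary task, such as pneumonia versus normal CXR, uses the same arithmetic:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Purely illustrative binary labels for an external validation set
# (1 = disease, 0 = normal); these values are hypothetical.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-9

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
# precision=0.750 recall=0.750 F1=0.750
```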
Table 1.
Comparison of platforms' technical features
| Category | Feature | H2O.ai Driverless AI | Amazon Rekognition |
| --- | --- | --- | --- |
| Cost | | Unclear (free R/Python libraries) | $4/hr inference; $1/hr training (free trial) |
| Accessibility | Code requirement | None (optional) | None (optional) |
| | Computing locus | Local/cloud | Cloud |
| | Data format | Structured/unstructured | Unstructured |
| Technical features | Feature extraction/selection | Yes | Yes |
| | Model selection/training | Yes | Yes |
| | Hyperparameter optimisation | Yes | Yes |
| | Evaluation | Yes | Yes |
| | Explainability analysis | Yes | No |
| Portability | Exportability | Yes | No |
| | Interpretability | Yes | No |
Although autoML is already being adopted as a tool in clinical research, validation is essential to demonstrate that approaches are accurate, reliable and fair. Most studies cited as evidence of validation fail to compare autoML platforms with alternative modalities, and Rekognition and Driverless AI had not previously been trialled on the same clinical task.3 Here, platform performances were similar: Driverless AI outperformed Rekognition on CXR (F1 = 0.985 vs 0.976) and brain MRI (F1 = 0.991 vs 0.982); Rekognition outperformed Driverless AI on dermoscopic photographs (F1 = 0.915 vs 0.910); and the platforms exhibited equivalent discriminative ability on fundus photographs (F1 = 1.000 for both). This performance compares well with bespoke computational models and clinical experts.4,5 Driverless AI facilitated model export for local deployment without requiring image upload to the cloud, whereas Rekognition did not. Driverless AI also provided explainability analysis, which illustrated the salient features driving the DL algorithms' decisions (Fig 1). While algorithms tended to focus on clinically relevant regions of images in each modality, the dermoscopic photograph classifier often made decisions based on the peri-lesional region.
Fig 1.
Driverless AI explainability analysis, illustrating the salient features driving the DL algorithms' decisions.
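To make the portability contrast concrete: Rekognition Custom Labels serves predictions only through its cloud API, so each image must be uploaded at inference time, whereas Driverless AI can export a scoring pipeline (e.g. a MOJO) that runs entirely on local hardware. A minimal boto3 sketch of the cloud-only route follows; the project version ARN and file name are hypothetical placeholders, not values from our study.

```python
import boto3

# Hypothetical ARN of a trained Rekognition Custom Labels model;
# a real ARN is generated when the model version is trained.
PROJECT_VERSION_ARN = (
    "arn:aws:rekognition:eu-west-1:123456789012:"
    "project/cxr-classifier/version/v1/1600000000000"
)

client = boto3.client("rekognition")

with open("example_cxr.png", "rb") as f:  # hypothetical local image
    response = client.detect_custom_labels(
        ProjectVersionArn=PROJECT_VERSION_ARN,
        Image={"Bytes": f.read()},  # image bytes leave the local machine
        MinConfidence=0,            # return all labels with confidences
    )

# Each custom label carries a name and a confidence score (0-100)
for label in response["CustomLabels"]:
    print(label["Name"], label["Confidence"])
```

For sensitive data that cannot leave a hospital network, an exported model avoids this upload step altogether.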
The performance of diagnostic autoML models is promising, and autoML has great potential to democratise DL.6 Limitations of autoML include concerns regarding ‘black box’ algorithms, data security with cloud-based models, and lack of customisability. To be applied clinically, autoML must facilitate model export for use with sensitive data, comparison with conventional models, and explainability analysis. Alternative uses of autoML include education, allowing novices to gain ‘hands-on’ experience sooner, and pilot studies, allowing clinicians to trial DL themselves before approaching computational researchers and applying for research grants with convincing preliminary results. Clinicians and researchers should consider their requirements, their capabilities, and the technical features of platforms when deciding how to apply autoML.
References
- 1. Waring J, Lindvall C, Umeton R. Automated machine learning: review of the state-of-the-art and opportunities for healthcare. Artif Intell Med 2020;104:101822.
- 2. Korot E, et al. Code-free deep learning for multi-modality medical image classification. Nat Mach Intell 2021;3:288–98.
- 3. Thirunavukarasu AJ, et al. Clinical applications of automated machine learning: a systematic review. [unpublished].
- 4. Aggarwal R, et al. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. npj Digit Med 2021;4:65.
- 5. Nagendran M, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020;368:m689.
- 6. Korot E, et al. Clinician-driven artificial intelligence in ophthalmology: resources enabling democratization. Curr Opin Ophthalmol 2021;32:445–51.