We are pleased to respond to the concerns regarding our use of a randomized controlled trial (RCT) design to assess the efficacy of CC-Cruiser in clinical practice [1].
We appreciate that Qian Zhou and colleagues regarded the paper as “one of the first published RCTs comparing the diagnostic efficacy of artificial intelligence (AI) against experts”. However, we would like to clarify the aim of our paper: to explore the real-world performance, including the utility and acceptability, of AI diagnosis using unfiltered clinical data at the current stage. After significance testing, we acknowledged that the AI was inferior to doctors. Qian Zhou and colleagues raised two main concerns: 1) a non-inferiority design would be more appropriate to confirm the efficacy of AI; and 2) a single-arm diagnostic accuracy trial design could avoid the trial effect in the senior consultant group. We respond to each point in turn.
First, we agree that a non-inferiority design can effectively assess AI performance in clinical practice in some cases. Nevertheless, a design is appropriate only if it is applicable to the specific study. Our study is not a phase 3 clinical trial, as Qian Zhou and colleagues assumed, for which a superiority, equivalence, or non-inferiority design is needed. We reported that the AI was inferior to doctors but, given its acceptable diagnostic accuracy, sensitivity, and specificity in the clinical setting, can assist doctors as a screening tool rather than make the diagnosis itself. Moreover, a non-inferiority design with an acceptable margin of 5% in diagnostic accuracy would be hard to set in our study, because the expected difference between the two groups was at least 5% (rather than at most 5%, as a non-inferiority design would require) and because previous studies offer no comparable reference for the non-inferiority margin [2]. Therefore, a non-inferiority design is not suitable for our study.
Second, the experts who provided the gold-standard diagnosis while masked to group assignment, as mentioned in Zhou and colleagues' letter, were not the testing clinicians in the senior consultant group in our study.
Indeed, the trial effect on the testing clinicians may not be avoidable in this RCT. However, the center effect in this multicenter trial, which was confirmed in our analysis, may also influence the accuracy results. In addition, a single-arm design cannot assess the clinical differences between the diagnostic settings of medical AI and traditional ophthalmic clinics in the real world [3]. We were also interested in using a two-arm trial to compare the mean time to receive a diagnosis and the level of patient satisfaction, which are important metrics of the diagnostic efficacy of AI tools.
In conclusion, a diagnostic RCT is the more suitable design for this trial and can be regarded as the final frontier for evaluating the clinical difference between AI diagnostic procedures using CC-Cruiser and traditional eye clinics [4]. We also believe that guidelines on appropriate study designs for AI applications are needed.
CRediT authorship contribution statement
Ruiyang Li: Writing - review & editing. Lanqin Zhao: Writing - review & editing. Dongyuan Yun: Writing - review & editing. Haotian Lin: Writing - review & editing.
Declaration of Competing Interest
The authors declare no competing financial interests.
Acknowledgments
We thank Miss Qian Zhou and colleagues for their comments on our study and on the improvement of AI applications.
References
1. Lin H., Li R., Liu Z. Diagnostic efficacy and therapeutic decision-making capacity of an artificial intelligence platform for childhood cataracts in eye clinics: a multicentre randomized controlled trial. EClinicalMedicine. 2019;9:52–59. doi: 10.1016/j.eclinm.2019.03.001.
2. Hou Y., Wu X.Y., Li K. Issues on the selection of non-inferiority margin in clinical trials. Chin Med J (Engl). 2009;122(4):466–470.
3. Sambucini V. Comparison of single-arm vs. randomized phase II clinical trials: a Bayesian approach. J Biopharm Stat. 2015;25(3):474–489. doi: 10.1080/10543406.2014.920856.
4. Rodger M., Ramsay T., Fergusson D. Diagnostic randomized controlled trials: the final frontier. Trials. 2012;13(1):137. doi: 10.1186/1745-6215-13-137.