Wang and Avillach [1] developed a convolutional neural network (CNN)–based diagnostic classifier for autism spectrum disorder (ASD). After preprocessing the genomics data from the Simons Simplex Collection (SSC) [2], common variants that may be protective or pathogenic for autism were extracted based on a χ² test. The authors then designed a CNN-based diagnostic classifier for ASD with a reported accuracy of 88% and an area under the receiver operating characteristic curve of 0.955. The predictor in Wang and Avillach [1] is currently considered the exemplar in the field, giving much more accurate predictions for autism than other studies [3].
However, after inspecting the code and repeating the analyses, we contend that the method used is flawed and leads to an approximately 30% overestimation of predictive ability.
Wang and Avillach [1] did not provide a GitHub link to the code that was used in their paper. However, code can be found in Dr Wang’s GitHub repository [4] that matches the results and figures in the manuscript.
An error occurred in the data split for training and test sets. The methods state “...the SSC samples were partitioned into two sets based on random sampling of individuals into a training set (80%) and a hold-out test set (20%). There was no overlap of individuals across the two partitions” [1]. However, the code uses different indexing methods for the test and training sets (Multimedia Appendix 1), so the two sets are not guaranteed to be disjoint. Because there appears to be no random seed in the code for Wang and Avillach [1], we cannot reproduce the exact overlap that occurred in the original analysis. However, simulations (N=100) using the code found that, on average, 80% (SD 1%) of the test dataset was also represented in the training dataset.
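The size of this overlap is what one would expect if the test individuals were drawn independently of the training individuals. The following is a minimal sketch of that situation, not the code from [4] or [5]: the function names, the sample size of 5000, and the seed are hypothetical choices made only for illustration.

```python
import numpy as np

def overlap_fraction(n_samples, rng, train_frac=0.8, test_frac=0.2):
    """Fraction of test individuals that also appear in the training set
    when the two index sets are drawn independently of each other."""
    n_train = int(round(train_frac * n_samples))
    n_test = int(round(test_frac * n_samples))
    train_idx = rng.choice(n_samples, size=n_train, replace=False)
    # Independent second draw: nothing prevents reuse of training individuals.
    test_idx = rng.choice(n_samples, size=n_test, replace=False)
    return np.isin(test_idx, train_idx).mean()

def simulate_overlap(n_samples=5000, n_runs=100, seed=0):
    """Repeat the flawed split n_runs times; return mean and SD of the overlap."""
    rng = np.random.default_rng(seed)
    fractions = np.array([overlap_fraction(n_samples, rng) for _ in range(n_runs)])
    return fractions.mean(), fractions.std(ddof=1)

# n_samples is an assumed value, not the actual SSC sample count.
mean_overlap, sd_overlap = simulate_overlap()
print(f"mean overlap: {mean_overlap:.2f} (SD {sd_overlap:.3f})")
```

Under these assumptions, each test individual has roughly an 80% chance of also being in the training set, so a classifier evaluated on such a test set is largely being scored on data it has already seen.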
We corrected this error in the code [5] and generated new models using the 100 features that were identified in Wang and Avillach [1] and the genomics data from the SSC [2]. Simulations (N=100) with these models yielded an area under the receiver operating characteristic curve of 0.61 (SD 0.02) and an accuracy of 60% (SD 2%; Multimedia Appendix 1). These values are 0.34 and 28 percentage points lower, respectively, than the metrics reported in Wang and Avillach [1].
The accuracy of the CNN-based diagnostic classifier for ASD presented in Wang and Avillach [1] is therefore overestimated by approximately 28 percentage points. We contend that Wang and Avillach [1] should be retracted according to the Committee on Publication Ethics (COPE) guidelines.
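For illustration, one common way to obtain a partition with no shared individuals is to shuffle the indices once and slice the result. The sketch below assumes this approach; it is not the corrected code in [5], and the function name, sample size, and seed are hypothetical.

```python
import numpy as np

def disjoint_split(n_samples, test_frac=0.2, seed=0):
    """Shuffle all indices once, then slice, so that no individual
    can appear in both the training and the test set."""
    rng = np.random.default_rng(seed)  # fixed seed makes the split reproducible
    shuffled = rng.permutation(n_samples)
    n_test = int(round(test_frac * n_samples))
    test_idx = shuffled[:n_test]
    train_idx = shuffled[n_test:]
    return train_idx, test_idx

# n_samples is an assumed value, not the actual SSC sample count.
train_idx, test_idx = disjoint_split(n_samples=5000)

# Sanity checks: no shared individuals, and every individual is used once.
assert np.intersect1d(train_idx, test_idx).size == 0
assert train_idx.size + test_idx.size == 5000
```

Checking the intersection explicitly, as in the final lines, is a cheap safeguard against this kind of leakage, and a fixed seed lets others reproduce the same partition.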
Supplementary material
Acknowledgments
We acknowledge Simons Simplex Collection project number 15286.1.1.
Abbreviations
- ASD: autism spectrum disorder
- CNN: convolutional neural network
- COPE: Committee on Publication Ethics
- SSC: Simons Simplex Collection
Footnotes
Data Availability: The code used for Wang and Avillach [1] is available on GitHub [4]. A clone of that code and the corrected code are also available on GitHub [5]. The Simons Simplex Collection is accessible at [6].
Editorial Notice: The corresponding author of “Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning” did not submit a reply to this letter.
Conflicts of Interest: None declared.
References
- 1. Wang H, Avillach P. Diagnostic classification and prognostic prediction using common genetic variants in autism spectrum disorder: genotype-based deep learning. JMIR Med Inform. 2021 Apr 7;9(4):e24754. Retracted in: JMIR Med Inform. 2025;13:e76833. doi: 10.2196/76833.
- 2. Fischbach GD, Lord C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron. 2010 Oct 21;68(2):192–195. doi: 10.1016/j.neuron.2010.10.006.
- 3. Alowais SA, Alghamdi SS, Alsuhebany N, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023 Sep 22;23(1):689. doi: 10.1186/s12909-023-04698-z.
- 4. Wang H. Hms-dbmi/haishuai. GitHub. 2020. https://github.com/hms-dbmi/Haishuai. Accessed 12-09-2024.
- 5. Miller C. Catriona-miller/sfari_paper_clone. GitHub. 2024. https://github.com/Catriona-Miller/SFARI_paper_clone. Accessed 12-09-2024.
- 6. Simons Simplex Collection. Simons Foundation Autism Research Initiative. https://www.sfari.org/resource/simons-simplex-collection/. Accessed 24-04-2025.
