Code Error in “Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning”

Catriona Miller; Theo Portlock; Denis M Nyaga; Greg D Gamble; Justin M O'Sullivan

doi:10.2196/66556

letter

JMIR Med Inform. 2025 May 6;13:e66556. doi: 10.2196/66556

Editor: JMIR Editorial Office

PMCID: PMC12138136 PMID: 40327366

Wang and Avillach [1] developed a convolutional neural network (CNN)–based diagnostic classifier for autism spectrum disorder (ASD). After preprocessing the genomics data from the Simons Simplex Collection (SSC) [2], they used a χ² test to extract common variants that may be protective or pathogenic for autism. The resulting classifier was reported to have an accuracy of 88% and an area under the receiver operating characteristic curve of 0.955. The predictor in Wang and Avillach [1] is currently considered the exemplar in the field, giving much more accurate predictions for autism than other studies [3].

However, having inspected the code and repeated the analyses, we contend that the method used is flawed and leads to an approximately 30% overestimation of predictive ability.

Wang and Avillach [1] did not provide a GitHub link to the code that was used in their paper. However, code can be found in Dr Wang’s GitHub repository [4] that matches the results and figures in the manuscript.

The error lies in how the data were split into training and test sets. The methods state “...the SSC samples were partitioned into two sets based on random sampling of individuals into a training set (80%) and a hold-out test set (20%). There was no overlap of individuals across the two partitions” [1]. However, the code uses different indexing methods for the test and training sets (Multimedia Appendix 1), so the two partitions are not guaranteed to be disjoint. Because there appears to be no random seed in the code for Wang and Avillach [1], we cannot reproduce the exact overlap that occurred in the published analysis. However, simulations (N=100) using the code found that, on average, 80% (SD 1%) of the test dataset was also represented in the training dataset.
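The following is a minimal sketch of how this kind of leakage can arise. It does not reproduce the code in [4]. It assumes, purely for illustration, that the training and test indices are drawn independently of each other, and it uses a hypothetical cohort size of 5000. The function names `leaky_split`, `disjoint_split`, and `overlap_fraction` are our own.

```python
import random
import statistics


def leaky_split(n_samples, rng, train_frac=0.8):
    """Hypothetical flawed split: train and test are sampled independently."""
    indices = list(range(n_samples))
    n_train = int(train_frac * n_samples)
    train = set(rng.sample(indices, n_train))
    # Bug being illustrated: the test set is drawn from ALL indices,
    # not from the indices left over after the training draw.
    test = set(rng.sample(indices, n_samples - n_train))
    return train, test


def disjoint_split(n_samples, rng, train_frac=0.8):
    """Shuffle once, then slice, so no index can land in both partitions."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(train_frac * n_samples)
    return set(indices[:n_train]), set(indices[n_train:])


def overlap_fraction(train, test):
    """Fraction of test indices that also appear in the training set."""
    return len(train & test) / len(test)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n_samples = 5000        # hypothetical cohort size, not taken from the SSC

leaky = [overlap_fraction(*leaky_split(n_samples, rng)) for _ in range(100)]
clean = [overlap_fraction(*disjoint_split(n_samples, rng)) for _ in range(100)]

print(f"leaky split: mean overlap {statistics.mean(leaky):.3f}, "
      f"SD {statistics.stdev(leaky):.3f}")
print(f"disjoint split: mean overlap {statistics.mean(clean):.3f}")
```

Under this assumption, each test individual has roughly an 80% chance of also having been drawn into the training set, which is one plausible way to arrive at an overlap of the size reported above. Checking that the intersection of the two index sets is empty before training is a cheap guard against this class of error.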

We corrected this error in the code [5] and trained new models using the 100 features that were identified in Wang and Avillach [1] and the genomics data from the SSC [2]. Simulations (N=100) with these models gave an area under the receiver operating characteristic curve of 0.61 (SD 0.02) and an accuracy of 60% (SD 2%; Multimedia Appendix 1). These values are 0.34 and 28 percentage points lower, respectively, than the metrics reported in Wang and Avillach [1].
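As a check on the arithmetic, the two gaps can be recomputed from the rounded means quoted in the text. The Δ symbols are notation introduced here for illustration and do not appear in the source.

```latex
\begin{align*}
\Delta_{\mathrm{AUROC}} &= 0.955 - 0.61 = 0.345 \\
\Delta_{\mathrm{accuracy}} &= 88\% - 60\% = 28 \text{ percentage points}
\end{align*}
```

The recomputed AUROC gap agrees with the stated 0.34 to within rounding of the published means, and the accuracy gap is an absolute difference rather than a relative change.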

The accuracy of the CNN-based diagnostic classifier for ASD presented in Wang and Avillach [1] is therefore overestimated by approximately 28 percentage points. We contend that Wang and Avillach [1] should be retracted according to the Committee on Publication Ethics (COPE) guidelines.

Supplementary material

Multimedia Appendix 1. Walkthrough of the steps that were taken and the results of our attempt to reproduce the work of Wang and Avillach (2021).

medinform-v13-e66556-s001.pptx (408 KB, pptx)

DOI: 10.2196/66556

Acknowledgments

We acknowledge Simons Simplex Collection project number 15286.1.1.

Abbreviations

ASD: autism spectrum disorder
CNN: convolutional neural network
COPE: Committee on Publication Ethics
SSC: Simons Simplex Collection

Footnotes

Data Availability: The code used for Wang and Avillach [1] is available on GitHub [4]. The clone of the code and corrected code are also available on GitHub [5]. Simons Simplex Collection is accessible at [6].

Editorial Notice: The corresponding author of “Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning” did not submit a reply to this letter.

Conflicts of Interest: None declared.

References

1. Wang H, Avillach P. Diagnostic classification and prognostic prediction using common genetic variants in autism spectrum disorder: genotype-based deep learning. JMIR Med Inform. 2021 Apr 7;9(4):e24754. doi: 10.2196/76833. Retracted in: JMIR Med Inform. 2025;13:e76833.
2. Fischbach GD, Lord C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron. 2010 Oct 21;68(2):192–195. doi: 10.1016/j.neuron.2010.10.006.
3. Alowais SA, Alghamdi SS, Alsuhebany N, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023 Sep 22;23(1):689. doi: 10.1186/s12909-023-04698-z.
4. Wang H. Hms-dbmi/haishuai. GitHub. 2020. https://github.com/hms-dbmi/Haishuai. Accessed 12-09-2024.
5. Miller C. Catriona-miller/sfari_paper_clone. GitHub. 2024. https://github.com/Catriona-Miller/SFARI_paper_clone. Accessed 12-09-2024.
6. Simons Simplex Collection. Simons Foundation Autism Research Initiative. https://www.sfari.org/resource/simons-simplex-collection/. Accessed 24-04-2025.
