Health Information Science and Systems. 2021 Apr 6;9(1):17. doi: 10.1007/s13755-021-00145-9

Detecting autism spectrum disorder using machine learning techniques

An experimental analysis on toddler, child, adolescent and adult datasets

Md Delowar Hossain 1, Muhammad Ashad Kabir 1, Adnan Anwar 2, Md Zahidul Islam 1
PMCID: PMC8024224  PMID: 33898020

Abstract

Autism Spectrum Disorder (ASD), a neurodevelopmental disorder, is often accompanied by sensory issues such as over-sensitivity or under-sensitivity to sounds, smells, or touch. Although its causes are mainly genetic in nature, early detection and treatment can help to improve the condition. In recent years, machine learning based intelligent diagnosis has evolved to complement traditional clinical methods, which can be time consuming and expensive. The focus of this paper is to find the most significant traits and automate the diagnosis process using available classification techniques for improved diagnosis. We have analyzed ASD datasets of toddlers, children, adolescents and adults. We have evaluated state-of-the-art classification and feature selection techniques to determine the best performing classifier and feature set, respectively, for these four ASD datasets. Our experimental results show that the multilayer perceptron (MLP) classifier outperforms all other benchmark classification techniques and achieves 100% accuracy with a minimal number of attributes for the toddler, child, adolescent and adult datasets. We also identify that the ‘relief F’ feature selection technique works best on all four ASD datasets for ranking the most significant attributes.

Keywords: Autism spectrum disorder, machine learning, feature selection, classification, ASD detection

Introduction

Autism spectrum disorder (ASD) is a neurological developmental disorder. It affects how people communicate and interact with others, as well as how they behave and learn [1]. The symptoms and signs appear when a child is very young. It is a lifelong condition and cannot be completely cured. A study found that 33% of children with difficulties other than ASD have some ASD symptoms while not meeting the full classification criteria [2].

ASD has a significant economic impact, both due to the increase in the number of ASD cases worldwide and due to the time and costs involved in diagnosing a patient. Early detection of ASD can help both patients and healthcare service providers by enabling the proper therapy and/or medication to be prescribed, thereby reducing the long-term costs associated with delayed diagnosis. On the other hand, the traditional clinical methods, such as the Autism Diagnostic Interview Revised (ADI-R) and the Autism Diagnostic Observation Schedule Revised (ADOS-R), are time consuming and cumbersome [3, 4]. Children who are too young or who have delayed speech can be scored on only roughly 25% of the total ADI-R items, because the verbal sections cannot be answered accurately for such patients. Besides, an interview with a caregiver conducted by a trained examiner takes 90 to 150 minutes, which is cumbersome and often results in missing data. The detection of ASD by ADOS-R, in turn, depends on the scoring of the answers provided, and one of the major disadvantages of this approach is its tendency to over-classify children who have other clinical disorders [5]. Healthcare professionals are therefore in urgent need of a time-efficient, easy and accessible ASD screening method that can accurately detect whether a patient with certain measured characteristics has ASD and can inform individuals whether they should pursue a formal clinical diagnosis. Presently, the available datasets are few and are associated with clinical diagnosis that is mostly genetic in nature, e.g., AGRE [6], the National Database of Autism Research (NDAR) [7] and the Boston Autism Consortium (AC) [8].

Nowadays, machine learning is being applied to detect various diseases, including depression [9] and ASD [7, 10, 11]. The primary objectives of applying machine learning techniques are to improve diagnostic accuracy and to reduce the diagnosis time of a case in order to provide quicker access to health care services. Since diagnosing a case involves arriving at the right class (ASD, No-ASD) based on the input case features, this process can be framed as a classification task in machine learning. In this paper, we apply various classification techniques to obtain improved accuracy in detecting ASD cases for all four datasets. The main contributions of this paper can be summarised as follows:

  1. We analyze the attributes of Toddler, Child, Adolescent and Adult ASD datasets, and identify associations between the demographic information and ASD cases.

  2. We explore benchmark feature selection methods and identify the one that performs best across all four ASD datasets for selecting the set of most significant features to achieve the highest classification accuracy. Our analysis shows that appropriate feature selection can significantly improve ASD classification performance.

  3. We compare state-of-the-art classification techniques and identify the best performing classifier for all four ASD datasets.

The rest of the paper is organized as follows. “Related work” section reviews related work. In “Methodology” section, we present our methodology. Description of the datasets, preprocessing and exploratory analysis are presented in “Data preprocessing and analysis” section. The performance comparison of benchmark classification techniques is presented in “Applying classification techniques” section. “Feature engineering” section reports feature engineering results. After a thorough comparison of our approach with existing approaches in “Classification results comparison” section, we conclude the paper in “Conclusion” section.

Related work

A number of studies have adopted machine learning techniques to improve the diagnosis process of ASD [7, 10, 11]. The primary motivation for using machine learning models for ASD is to reduce the early detection time, which enables quicker access to health care services, and to improve diagnostic accuracy [12]. We can categorize ASD detection studies into two groups—video clip-based studies and AQ-based studies.

Video clip-based study

Parikh et al. [13] hypothesized that the use of machine learning analysis on home videos can speed up the diagnosis time without compromising accuracy. The authors analyzed item-level records from two standard diagnostic instruments to construct machine learning classifiers optimized for sparsity, interpretability and accuracy. They considered eight machine learning models, applied to 162 two-minute home videos of children with and without ASD. Besides, a mobile web portal was created for video raters to assess 30 behavioral features (e.g., eye contact, social smile, etc.), which are used by the eight machine learning models for detecting ASD. The results show that 94% accuracy is achieved for each case using cross-validation testing and subsequent independent validation from previous work. However, this method is also time consuming, since the video needs to be recorded and then assessed by raters against the 30 questions. In contrast, we adopt a method that only uses a mobile app [14], in which users can easily select the appropriate answers to the ten ASD questions. Besides, improved analysis based on a reduced number of attributes can significantly improve the performance of ASD detection.

Zunino et al. [15] analyzed video gestures for detecting ASD. The authors devised an experimental setup by recording videos of patients and healthy children performing the simple gesture of grasping a bottle. By processing only the video clips depicting the grasping action with a recurrent deep neural network, they were able to classify ASD and non-ASD cases with good accuracy. In that work, the authors followed a common procedure where each video is split into 15-frame clips and passed through the entire model, which outputs, for each frame, a binary vector containing the probability of ASD and No-ASD. Each clip is considered independent during training. The test accuracy for each subject is computed by averaging the predicted probabilities over all the frames of a video. However, the model decreases in effectiveness beyond a threshold of 0.9. The results support the hypothesis that feature tagging of home videos combined with machine learning classification of autism can yield accurate outcomes in short time frames.

AQ-based study

The Autistic-spectrum Quotient (AQ) is a screening method developed by Baron-Cohen et al. [16]. Later, Allison et al. [10] proposed the AQ-10 (adult) and AQ-10 (child), shortened versions of the original AQ. This questionnaire-based approach is reported to increase the efficiency of ASD screening.

There are two versions of the available ASD datasets: version-1 (v1) [17], which has 20 attributes, and version-2 (v2) [18], which has 23 attributes and more records than version-1. A version-1 dataset for toddlers was not available. Almost all of the previous research was based on the version-1 ASD datasets. The autism questions (AQ1 to AQ10) remain the same in both versions.

Basu [19] analyzed the version-1 adult ASD dataset using supervised machine learning techniques such as decision tree, random forest, support vector machines (SVM), k-nearest neighbor (kNN), Naive Bayes, logistic regression, linear discriminant analysis (LDA) and multilayer perceptron (MLP). The author concludes that the SVM classifier scores the highest accuracy among all benchmark techniques on the v1 adult ASD dataset. Later, McNamara et al. [20] also classified the same dataset by applying decision tree and random forest classifiers with improved data pre-processing, in which the least significant attributes and the records with missing values were removed before applying the classifiers. The comparison between these two classifiers shows that random forest yields higher accuracy on the version-1 adult ASD dataset. Hossain and Kabir [21] conducted research on the v1 child ASD dataset following the same methodology and applied 27 benchmark classifiers. They found that the sequential minimal optimization (SMO) classifier performs best in detecting child ASD cases. Besides this, they also identified the dominant features for detecting child autism. Raj and Masood [22], Baranwal and Vanitha [23] and Erkan and Thanh [24] used the v1 adult, adolescent and child ASD datasets and applied machine learning techniques to detect ASD. They used all 20 attributes in classification.

To the best of our knowledge, only Thabtah and Peebles [25] used three of the v2 ASD datasets (Child, Adolescent and Adult), applying a rule-based machine learning technique for classification. In contrast, we have used all four v2 ASD datasets (Toddler, Child, Adolescent and Adult) and applied 27 benchmark classification techniques to identify the best classifier. To enhance classification accuracy, we further apply a feature engineering approach to identify a minimal set of significant features/attributes and use them in classification (instead of using all attributes).

Methodology

Figure 1 depicts an overview of our methodology for detecting ASD cases, briefly described below.

Fig. 1.

Fig. 1

Methodology for detecting autism

  • Step 1: Data preprocessing and analysis. In this step, we perform detailed data preprocessing before moving to analysis and classification. The ASD datasets [18] contain a few records with missing values. There are also some attributes that represent meta-information (e.g., used app before, who completed the test) and are not related to ASD. Thus, the datasets need to be cleaned/preprocessed before applying classification. The “Data preprocessing and analysis” section presents details of our data preprocessing and analysis, including the dataset description, data cleansing/preprocessing, and statistical analysis of associations between attributes.

  • Step 2: Apply benchmark classification techniques. In this step, we apply various classification techniques. In this study, we have applied 27 classification techniques to all four ASD datasets and evaluated their performance using accuracy and F-measure through 10-fold cross validation. Finally, we select the top eight classifiers for further evaluation. The “Applying classification techniques” section reports the results.

  • Step 3: Feature engineering. Classification accuracy may degrade if all attributes in a dataset are used (for instance, see the Table 8 results for the adolescent dataset). Furthermore, fewer attributes reduce the resources (time and memory) required for training a model. Thus, in this step, we rank attributes/features to identify the optimal set of most significant attributes that yields the highest accuracy. We apply five benchmark feature ranking techniques to compare and identify the technique that ranks attributes consistently across all four ASD datasets and can be used to identify the optimal set of most significant attributes. Finally, we identify the classifier among the top eight (selected in the previous step) that scores the highest accuracy on the optimal set of attributes. The “Feature engineering” section describes this step in detail along with our findings.

  • Step 4: Classification results comparison. An ASD-positive result from the classification indicates that the patient needs to undergo further medical diagnosis and necessary treatment. Therefore, classification accuracy is important to reduce false positives and improve the overall outcome. The “Classification results comparison” section presents a comparative analysis of our work against state-of-the-art results. The results show that the multilayer perceptron (MLP) classifier with ‘relief F’ feature engineering achieves 100% accuracy using the top ten attributes for all four ASD datasets, and outperforms state-of-the-art results.

Table 8.

Classifiers’ accuracy (%) with increasing number of attributes according to their ranks (from Relief F) for adolescent ASD dataset

Attribute set Size MLP SMO LR SL LB ICO RAB LMT
{A6} 1 79.44 79.44 79.44 79.44 79.44 79.44 79.44 79.44
{A6, A3} 2 81.85 81.85 81.85 81.85 81.85 81.85 81.86 81.86
{A6, A3, A5} 3 83.06 80.65 80.65 80.65 80.65 81.05 82.26 80.65
{A6,...,A4} 4 83.87 85.08 84.27 85.89 84.27 84.68 84.68 85.89
{A6,...,A7} 5 84.68 87.50 87.90 87.50 87.50 87.50 87.5 87.5
{A6,...,A9} 6 85.89 85.89 87.10 89.11 87.50 88.31 90.33 89.12
{A6,...,A10} 7 88.71 89.11 92.34 91.53 91.13 91.13 91.13 92.34
{A6,...,A2} 8 91.94 91.13 91.53 90.32 91.53 90.32 89.52 89.92
{A6,...,A1} 9 92.74 89.92 93.55 90.32 91.13 90.32 89.52 90.33
{A6,...,A8} 10 100 99.60 100 97.98 97.58 98.39 97.99 97.99
{A6,...,Ethn} 11 100 99.19 98.39 99.19 95.56 95.97 97.99 99.2
{A6,...,Res} 12 99.19 97.98 94.35 96.37 97.18 97.18 97.18 96.38
{A6,...,Age} 13 99.19 97.98 95.16 96.77 97.18 97.18 97.18 96.78
{A6,...,FASD} 14 99.19 97.98 95.56 96.37 97.18 97.18 97.18 96.38
{A6,...,Sex} 15 98.79 97.58 95.56 95.56 97.18 97.18 97.18 95.57
{A6,...,Jand} 16 98.79 97.18 94.76 95.97 97.18 97.18 97.18 95.97

Data preprocessing and analysis

Dataset description

The ASD datasets we used (version 2) [18] mainly comprise 23 attributes (except the toddler dataset, which has 18 attributes). The attribute descriptions are given in Table 1. All datasets have ten binary attributes representing the answers to the screening questions (A1 to A10) as well as categorical variables such as gender, ethnicity, jaundice, family_ASD, residence and ASD class. These datasets also have two numeric variables: age and screening score/results. We found that five attributes were absent in the toddler ASD dataset: who completed the test (user), why taken the screening, used_app_before, country of residence and language spoken.

Table 1.

ASD dataset description

Attribute Type Description
Age Number Age in months/years
Gender String Male or Female
Ethnicity String List of common ethnicities in text format
Born with jaundice Boolean (yes or no) Whether the case was born with jaundice
Family member with PDD Boolean (yes or no) Whether any immediate family member has a PDD
Who is completing the test (User) String Parent, self, caregiver, medical staff, clinician, etc.
Why taken the screening Meta The person can write a short reason for completing the test
Used_App_Before Boolean (yes or no) This answer is binary
Language spoken String The user's native language
Country of residence String List of countries in text format
Used the screening app before Boolean (yes or no) Whether the user has used a screening app
Screening Method Type Integer (0, 1, 2, 3) The type of screening method chosen based on age category (0 = Toddler, 1 = Child, 2 = Adolescent, 3 = Adult)
Question 1 (A1) Binary (0, 1) S/he often notices small sounds when others do not (Child, Adolescent); S/he notices patterns in things all the time (Adult); Does your child look at you when you call his/her name? (Toddler)
Question 2 (A2) Binary (0, 1) S/he usually concentrates more on the whole picture, rather than the small details (Child, Adolescent, Adult); How easy is it for you to get eye contact with your child? (Toddler)
Question 3 (A3) Binary (0, 1) In a social group, s/he can easily keep track of several different people's conversations (Child, Adolescent); I find it easy to do more than one thing at once (Adult); Does your child point to indicate that s/he wants something? (e.g. a toy that is out of reach) (Toddler)
Question 4 (A4) Binary (0, 1) S/he finds it easy to go back and forth between different activities (Child, Adolescent); If there is an interruption, s/he can switch back to what s/he was doing very quickly (Adult); Does your child point to share interest with you? (e.g. pointing at an interesting sight) (Toddler)
Question 5 (A5) Binary (0, 1) S/he doesn't know how to keep a conversation going with his/her peers (Child, Adolescent); I find it easy to read between the lines when someone is talking to me (Adult); Does your child pretend? (e.g. care for dolls, talk on a toy phone) (Toddler)
Question 6 (A6) Binary (0, 1) S/he is good at social chit-chat (Child, Adolescent); I know how to tell if someone listening to me is getting bored (Adult); Does your child follow where you're looking? (Toddler)
Question 7 (A7) Binary (0, 1) When s/he is read a story, s/he finds it difficult to work out the characters' intentions or feelings (Child); When s/he was younger, s/he used to enjoy playing games involving pretending with other children (Adolescent); When I'm reading a story, I find it difficult to work out the characters' intentions (Adult); If you or someone else in the family is visibly upset, does your child show signs of wanting to comfort them? (e.g. stroking hair, hugging them) (Toddler)
Question 8 (A8) Binary (0, 1) When s/he was in preschool, s/he used to enjoy playing games involving pretending with other children (Child); S/he finds it difficult to imagine what it would be like to be someone else (Adolescent); I like to collect information about categories of things (e.g. types of car, types of bird, types of train, types of plant, etc.) (Adult); Would you describe your child's first words as: (Toddler)
Question 9 (A9) Binary (0, 1) S/he finds it easy to work out what someone is thinking or feeling just by looking at their face (Child); S/he finds social situations easy (Adolescent); I find it easy to work out what someone is thinking or feeling just by looking at their face (Adult); Does your child use simple gestures? (e.g. wave goodbye) (Toddler)
Question 10 (A10) Binary (0, 1) S/he finds it hard to make new friends (Child, Adolescent); I find it difficult to work out people's intentions (Adult); Does your child stare at nothing with no apparent purpose? (Toddler)
Screening Score Integer (0 to 10) The final score obtained based on the scoring algorithm of the screening method used. This is computed in an automated manner
Class Binary (0, 1) Subject was diagnosed with ASD or not: 1 - ASD, 0 - Not ASD

Child and adolescent datasets have the same screening questions (A1 to A10), whereas the toddler and adult datasets have some unique questions. We have listed all the questions of the ASD datasets in sequence for child, adolescent, adult and toddler in Table 1 (please see the Description column for details). The class value is assigned during data collection based on the answers to the AQ-10 (A1 to A10) questions. For the child, adolescent and adult datasets, the class value “Yes” is assigned when the final AQ-10 score is 7 or higher, indicating that the individual does have ASD; otherwise it is assigned “No”. For the toddler dataset the cut-off is lower: a total score of 4 or higher indicates that the subject has ASD.
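Under these cut-offs, the class assignment can be sketched as a small helper. This is a hypothetical illustration (the function name and signature are ours, not part of the dataset tooling); the cut-offs follow the dataset description in this section.

```python
def aq10_class(total_score, age_group):
    """Assign the ASD class from the AQ-10 total score.

    Hypothetical helper: cut-offs follow the dataset description
    (score >= 7 for child/adolescent/adult, >= 4 for toddler).
    """
    cutoff = 4 if age_group == "toddler" else 7
    return "Yes" if total_score >= cutoff else "No"
```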

Table 2 reports the number of ASD and non-ASD cases in the datasets. Figure 2a shows the gender-wise class distribution of the four ASD datasets considered. Here, we observe that the child and adolescent datasets are balanced, but the toddler and adult datasets are not, considering the total number of ASD cases and/or the gender distribution.

Table 2.

Number of ASD and non-ASD cases in the datasets (version 2)

Dataset ASD Non-ASD
Toddler 728 326
Child 257 252
Adolescent 127 122
Adult 358 760

Fig. 2.

Fig. 2

Analysis of toddler, child, adolescent and adult ASD datasets

Data preprocessing

In order to simplify our model and to improve classification accuracy, we clean up the datasets by removing the instances with missing values. Afterwards, we pre-process the datasets by removing the attributes that are meta-information and are not associated with ASD, listed below:

  • Case

  • Used App Before

  • User (who completed the screening)

  • Language

  • Why taken the screening

  • Age Description

  • Screening Type

  • Score

We observed that a ‘score’ value of 7 or higher is classified as ASD_Class = YES for the child, adolescent and adult ASD datasets, and a score of 4 or higher is classified as ASD_Class = YES for the toddler dataset. Including this attribute in classification would therefore mean that the classification algorithm already has the outcome of the target variable, so this attribute is removed during analysis. Finally, we select 16 attributes for the child, adolescent and adult datasets and 15 attributes for the toddler dataset (the residence attribute was absent).
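The cleanup described above (dropping records with missing values, then removing meta-information attributes) can be sketched in pandas. The column names below are hypothetical placeholders; the actual CSV headers in the downloaded datasets may differ.

```python
import pandas as pd

# Hypothetical meta-information column names; adjust to the actual headers.
META_COLS = ["Case_No", "Used_App_Before", "User", "Language",
             "Why_taken_the_screening", "Age_Desc", "Screening_Type", "Score"]

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop records with missing values, then drop meta-information columns."""
    df = df.dropna()
    return df.drop(columns=[c for c in META_COLS if c in df.columns])
```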

In the next subsections, we investigate the impact/association of jaundice, family ASD and ethnicity on ASD cases for all four ASD datasets.

Association between jaundice at birth and ASD cases

Figure 2b shows the distribution of ASD cases by jaundice at birth for all four datasets. Our association analysis results in Table 3 show that there is no significant association between patients’ jaundice at birth and ASD for children and adolescents in particular. The toddler and adult datasets show a significant association (p<0.05), but the strength of the association is very weak/negligible (V<0.1).

Table 3.

Measuring association using chi-square test

Categorical variables Test value p-value Cramer’s V Interpretation of association
Jaundice at birth Toddler 5.781* 0.016 0.074 Negligible (0<V≤0.1)
Child 0.001 0.981 0.001 Not significant (p>α)
Adolescent 0.574 0.449 0.048 Not significant (p>α)
Adult 7.561* 0.006 0.082 Negligible (0<V≤0.1)
Family ASD Toddler 0.192 0.661 0.014 Not significant (p>α)
Child 0.113 0.736 0.015 Not significant (p>α)
Adolescent 0.031 0.860 0.011 Not significant (p>α)
Adult 25.947* 0.000 0.152 Low association (0.1<V≤0.3)
Ethnicity Toddler 43.571* 0.000 0.203 Low association (0.1<V≤0.3)
Child 17.841* 0.022 0.187 Low association (0.1<V≤0.3)
Adolescent 29.174* 0.000 0.343 Moderate association (0.3<V≤0.5)
Adult 121.831* 0.000 0.330 Moderate association (0.3<V≤0.5)

*Significant at 5% level, i.e., α=0.05
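The association analysis in Table 3 combines a chi-square test of independence with Cramer's V as an effect-size measure. A minimal sketch using SciPy follows; the contingency tables below are illustrative only, not taken from the ASD datasets.

```python
import numpy as np
from scipy.stats import chi2_contingency

def association(table):
    """Chi-square test of independence plus Cramer's V effect size."""
    chi2, p, dof, _ = chi2_contingency(table)
    n = table.sum()
    v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
    return chi2, p, v

# Illustrative 2x2 table: rows = jaundice yes/no, columns = ASD yes/no.
chi2, p, v = association(np.array([[90, 10], [10, 90]]))
```

A strongly associated table yields a small p-value and a large V, while a near-independent table yields V close to zero, matching the interpretation bands used in Table 3.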

Association between family ASD and ASD cases

Figure 2c shows the distribution of ASD by family ASD cases for all four datasets. This figure and the association analysis results in Table 3 show that there is no significant association between family ASD cases and children’s ASD cases for the toddler, child and adolescent datasets. For the adult dataset we obtained a significant p-value (p<0.05), but the resulting Cramer’s V value (V<0.3) indicates that the strength of the association is low.

Association between ethnicity and ASD cases

Figure 3 depicts the distribution of ASD cases by ethnicity for all four datasets. This figure and the association analysis results in Table 3 show that there is a significant association between ethnicity and ASD cases for all four datasets (p<0.05). The strength of the association was found to be moderate for adolescents and adults, whereas it was low for toddlers and children.

Fig. 3.

Fig. 3

Ethnicity in ASD datasets

Applying classification techniques

We apply 27 benchmark machine learning classification techniques to determine the technique(s) that yield the highest accuracy. No single technique is universally best across different datasets and all types of classification problems. After introducing the evaluation metrics used in “Evaluation matrix” section, we present our comparison results in “Comparison of classification techniques” section.

Evaluation matrix

For a given dataset and a predictive model, every data point will fall into one of the four categories below.

  • True Positive (TP): The individual having ASD and is correctly predicted as having ASD.

  • True Negative (TN): The individual not having ASD and was correctly predicted as not having ASD.

  • False Positive (FP): The individual not having ASD, is incorrectly predicted as having ASD.

  • False Negative (FN): The individual having ASD, is incorrectly predicted as not having ASD.

Those categories are used to compute the following evaluation metrics:

Accuracy: the measure of correct predictions made by the classifier. Accuracy is the number of correct predictions divided by the total number of predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN). (1)

Precision: it measures the accuracy of positive predictions. It is the ratio of true positives to all predicted positives:

Precision = TP / (TP + FP). (2)

Recall/Sensitivity: this is also called the true positive rate. It is the proportion of genuinely positive samples that are correctly identified as positive:

Recall = TP / (TP + FN). (3)

F-Measure: the F-score (or F-measure) considers both the precision and the recall of the test to compute the score. The traditional or balanced F-score (F1 score) is the harmonic mean of the precision and recall:

F-Measure (F1) = 2 × (Precision × Recall) / (Precision + Recall). (4)
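These four metrics follow directly from the confusion counts; a small self-contained sketch (illustrative labels only, with 1 = ASD and 0 = not ASD):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = ASD, 0 = not ASD)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def evaluation_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 per Eqs. (1)-(4)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```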

Comparison of classification techniques

In classification, we consider all 15 attributes for the toddler dataset and all 16 attributes for the child, adolescent and adult datasets. Table 4 presents the classification results (F-measure and accuracy) of 27 benchmark classification techniques on the four ASD datasets. The results show that 8 of the 27 classifiers, namely SMO, logistic regression (LR), simple logistic (SL), LogitBoost (LB), iterative classifier optimizer (ICO), real AdaBoost (RAB), LMT and multilayer perceptron (MLP), demonstrate competitive performance, reaching or approaching 100% accuracy for at least one of the ASD datasets.

Table 4.

Comparison of classification techniques

Classifier Toddler Child Adolescent Adult
F1 Acc. (%) F1 Acc. (%) F1 Acc. (%) F1 Acc. (%)
OneR 0.81 80.551 0.79 78.98 0.76 76.63 0.83 83.18
PART 0.95 94.59 0.90 90.37 0.87 87.10 0.94 94.45
JRip (RIPPER) 0.94 93.45 0.87 87.43 0.85 84.68 0.94 94.28
Ridor 0.92 92.40 0.88 87.82 0.85 85.08 0.93 92.94
NNge 0.93 92.60 0.82 82.32 0.81 81.05 0.88 88.55
Naive Bayes 0.96 95.54 0.93 92.93 0.91 91.13 0.94 94.10
LibSVM 0.97 96.96 0.49 49.31 0.93 92.74 0.96 96.42
Multilayer Perceptron (MLP) 1.0 100 1.0 100 0.98 98.79 1.0 100
Logistic regression (LR) 0.97 99.62 0.99 99.21 0.95 94.76 0.97 97.32
Simple logistic (SL) 1.0 100 0.99 99.80 0.96 95.97 0.99 99.82
SMO 1.0 100 1.0 100 0.97 97.18 1.0 100
IBK 0.93 92.70 0.88 88.016 0.89 88.71 0.92 92.13
KStar 0.95 94.50 0.84 84.09 0.88 88.31 0.93 92.58
LWL 0.85 84.54 0.79 78.98 0.79 79.03 0.79 78.35
Bagging 0.93 93.17 0.80 80.16 0.76 76.21 0.87 87.30
Iterative classifier optimizer (ICO) 1.0 100 0.99 99.41 0.97 97.18 0.99 99.91
LogitBoost (LB) 1.0 100 0.99 99.41 0.97 97.18 0.99 99.91
Multi class classifier 0.99 99.62 0.99 99.21 0.95 94.76 0.97 97.32
Real Adaboost (RAB) 1.0 100 0.99 99.80 0.97 97.18 0.99 99.82
Hoeffding Tree 0.95 95.25 0.91 91.16 0.90 90.73 0.93 92.84
J48 (C4.5) 0.90 90.32 0.89 88.99 0.81 81.85 0.93 93.29
LMT 1.0 100 0.99 99.80 0.96 95.96 0.99 99.82
NBTree 0.96 95.54 0.93 93.12 0.87 86.69 0.94 94.19
Random Forest 0.95 95.45 0.88 87.63 0.88 87.5 0.92 92.13
Random Tree 0.91 91.08 0.79 78.59 0.79 79.84 0.86 85.87
Simple CART 0.91 90.89 0.81 81.34 0.81 81.05 0.90 89.62
SysFor 0.93 92.79 0.87 86.84 0.86 85.89 0.93 93.38

Acc. accuracy

In the next section, we apply feature engineering techniques to further enhance classification accuracy by selecting the minimal set of most significant attributes and identifying the classifier that performs best among the above eight classifiers for all four ASD datasets.
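As a rough sketch of the evaluation protocol, 10-fold cross-validated accuracy can be computed with scikit-learn. Note this is an illustration under assumptions: synthetic data stands in for the ASD datasets, and the scikit-learn estimators below are stand-ins for the benchmark (Weka-style) classifiers used in the study, e.g. a linear SVC in place of SMO.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic placeholder for an ASD dataset (16 attributes, binary class).
X, y = make_classification(n_samples=300, n_features=16, random_state=0)

classifiers = {
    "MLP": MLPClassifier(max_iter=2000, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "SMO (linear SVM stand-in)": SVC(kernel="linear"),
}

# Mean accuracy over 10-fold cross validation for each classifier.
results = {name: cross_val_score(clf, X, y, cv=10, scoring="accuracy").mean()
           for name, clf in classifiers.items()}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```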

Feature engineering

We have applied and compared five prominent feature selection methods, namely Information Gain, Chi-square, Pearson Correlation, One-R and Relief F, on the four ASD datasets and listed their attribute rankings side by side (see Table 5).

Table 5.

Comparison of feature selection methods

Rank Information gain Chi squared Correlation One R Relief F
Toddler Child Adolescent Adult Toddler Child Adolescent Adult Toddler Child Adolescent Adult Toddler Child Adolescent Adult Toddler Child Adolescent Adult
1 A9 A4 A6 A6 A9 A4 A6 A6 A9 A4 A6 A6 A7 A4 A6 A6 A9 A4 A6 A5
2 A5 Res Res A5 A6 A6 A3 A9 A6 A6 A3 A9 A6 A9 A3 A9 A5 A8 A3 A6
3 A6 A6 A3 A9 A5 A9 Res A5 A5 A9 A4 A5 A5 A8 A4 A5 A2 A9 A5 A9
4 A7 A9 A4 Res A7 Res A4 A4 A7 Res A5 A4 A9 A6 A5 A4 A6 A10 A4 A3
5 A4 A8 A5 A4 A4 A8 A5 Res A4 A8 A9 A3 A1 A5 A9 A3 A7 A1 A7 A4
6 A1 A5 A9 A3 A1 A5 A9 A3 A1 A5 A10 A10 A4 A10 A10 A7 A4 A5 A9 A10
7 A2 A3 A10 A10 A2 A3 A10 A10 A2 A3 A7 A7 Age A3 A7 Res A1 A6 A10 A7
8 A8 A10 A7 A7 A8 A10 A7 A7 A8 A10 A2 A2 A3 A1 A2 A2 A8 A3 A2 A1
9 A3 A1 Ethn Ethn A3 A1 Ethn Ethn A3 A1 A1 A1 FASD A7 A1 A8 A3 A7 A1 A8
10 Ethn A7 A2 A1 Ethn A7 A2 A2 A10 A7 A8 A8 Jaun A2 A8 A1 A10 A2 A8 A2
11 A10 A2 A1 A2 A10 A2 A1 A1 Sex A2 Ethn Ethn A10 Res Res A10 Ethn Res Ethn Res
12 Age Ethn A8 A8 Age Ethn A8 A8 Jand Ethn Res FASD A8 Ethn Ethn Jand Jand Age Res Ethn
13 Sex Sex Jand FASD Sex Sex Jand FASD Ethn Sex Age Res Sex Age Age Sex Age Sex Age Jand
14 Jand FASD Sex Jand Jand FASD Sex Jand Age FASD Jand Jand A2 Sex Jand Ethn FASD Jand FASD Age
15 FASD Jand FASD Sex FASD Jand FASD Sex FASD Jand Sex Age Ethn Jand Sex FASD Sex FASD Sex Sex
16 Age Age Age Age Age Age Age FASD Sex FASD FASD Age Ethn Jand FASD

‘Res’ means Residence, ‘Ethn’ means Ethnicity, ‘FASD’ means Family ASD, ‘Jand’ means Jaundice
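Relief F in this study refers to the standard implementation available in common toolkits; purely to illustrate the underlying idea, a simplified NumPy sketch for binary attributes is given below (not the exact algorithm used in the experiments). An attribute's weight rises when it differs on nearest misses (opposite class) and agrees on nearest hits (same class), so informative attributes rank above demographic noise.

```python
import numpy as np

def relief_f(X, y, n_neighbors=10):
    """Simplified ReliefF sketch for binary/categorical attributes.

    diff(attribute, a, b) is 0/1; higher weight = more significant attribute.
    """
    n, d = X.shape
    w = np.zeros(d)
    for i in range(n):
        diffs = (X != X[i]).astype(float)   # per-attribute 0/1 differences
        dist = diffs.sum(axis=1)
        dist[i] = np.inf                    # exclude the instance itself
        same = y == y[i]
        hits = np.argsort(np.where(same, dist, np.inf))[:n_neighbors]
        misses = np.argsort(np.where(~same, dist, np.inf))[:n_neighbors]
        w += diffs[misses].mean(axis=0) - diffs[hits].mean(axis=0)
    return w / n
```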

Analyzing feature selection techniques and feature ranking

In this section, we explore the effectiveness of feature selection techniques on the ASD datasets. The attributes corresponding to the answers to questions A1 to A10 are the main deciding factors for ASD cases, while the answers to the demographic questions have little to no effect in identifying ASD cases. Comparing the five feature selection methods, we find that the Relief F attribute selection method performs best: it ranks the A1 to A10 attributes ahead of the demographic attributes for all four ASD datasets.

We count the total number of occurrences of each attribute value in the four ASD datasets and compare the effects of the attributes in detecting ASD cases. In Fig. 4a–d, the first column in each group represents a score of ‘1’ for the attribute, with ASD “yes” cases in the lower portion and “no” cases in the upper portion of the column; the second column represents a score of ‘0’, again with “yes” in the lower portion and “no” in the upper portion. We compare whether each attribute has a better ratio for detecting ASD “yes” and “no” cases in the first and second columns, respectively. Ranking the attributes this way is consistent with the Relief F ranking for all four datasets. We therefore use the Relief F feature selection method to identify the minimal set of most significant features that provides the best accuracy.

Fig. 4.

Fig. 4

AQ-10 questions responses for ASD datasets

Identifying most significant features

We analyze classifier performance with an increasing number of attributes (added in ranked order) and compare the accuracy of the eight selected classifiers. Accuracy increases as attributes are added and, for most classifiers, peaks when the number of attributes reaches 10, as shown in Tables 6, 7, 8 and 9. Beyond that point, accuracy remains mostly constant for the toddler, child and adult datasets, but drops slightly for the adolescent dataset as further attributes are added. The bolded results indicate each classifier's highest accuracy. We can therefore argue that attributes A1 to A10 are the most significant attributes/features for accurately detecting ASD. Although both MLP and logistic regression (LR) achieve 100% accuracy with the top ten attributes (the minimal attribute set), MLP's accuracy remains at 100% for the toddler, child and adult datasets as further attributes are added, whereas LR's performance drops beyond the minimal attribute point; on the adolescent dataset, LR's drop is larger than MLP's. We therefore argue that MLP outperforms all eight classifiers, including LR.
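The incremental evaluation above can be sketched as follows. Here a simple sum-threshold rule (loosely mirroring AQ-10 scoring) stands in for the paper's eight classifiers, so that the accuracy-versus-attribute-count curve can be traced without any ML library; the function name, threshold, and toy values are our own illustrative assumptions:

```python
def incremental_accuracy(X, y, ranking, threshold_frac=0.5):
    """Accuracy of a simple sum-threshold rule evaluated on the top-k ranked
    features, for k = 1..len(ranking). Illustrates how accuracy changes as
    attributes are added in ranked order; a stand-in for real classifiers."""
    results = []
    for k in range(1, len(ranking) + 1):
        feats = ranking[:k]
        # Predict 'ASD' (1) when at least half of the selected items score 1.
        preds = [1 if sum(row[f] for f in feats) >= threshold_frac * k else 0
                 for row in X]
        acc = sum(p == t for p, t in zip(preds, y)) / len(y)
        results.append((k, acc))
    return results
```

Plotting the returned (k, accuracy) pairs for each classifier reproduces the pattern in Tables 6–9: accuracy climbs as ranked attributes are added and saturates once the full discriminative set is in.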

Table 6.

Classifiers’ accuracy (%) with increasing number of attributes according to their ranks (from Relief F) for toddler ASD dataset

Attribute set Size MLP SMO LR SL LB ICO RAB LMT
{A9} 1 76.28 76.28 76.28 76.28 76.28 76.28 76.29 76.29
{A9, A5} 2 85.96 85.96 85.96 85.96 85.96 85.96 85.96 85.96
{A9, A5, A2} 3 88.99 88.99 88.99 88.99 88.99 88.99 89.0 89.0
{A9,...,A6} 4 88.43 88.43 89.18 89.18 88.99 89.37 89.38 89.19
{A9,...,A7} 5 91.94 92.32 92.32 92.32 92.32 92.32 92.32 92.32
{A9,...,A4} 6 91.08 91.56 91.08 91.46 91.56 91.65 91.75 91.18
{A9,...,A1} 7 93.17 94.50 94.50 94.50 94.31 94.31 94.03 94.5
{A9,...,A8} 8 94.78 96.39 95.83 96.02 96.11 96.39 96.12 96.02
{A9,...,A3} 9 96.02 96.02 95.64 96.02 96.02 96.02 95.93 96.02
{A9,...,A10} 10 100 100 100 100 100 100 100 100
{A9,...,Ethn} 11 100 100 99.72 100 100 100 100 100
{A9,...,Jand} 12 100 100 99.62 100 100 100 100 100
{A9,...,Age} 13 100 100 99.62 100 100 100 100 100
{A9,...,FASD} 14 100 100 99.53 100 100 100 100 100
{A9,...,Sex} 15 100 100 99.62 100 100 100 100 100

Table 7.

Classifiers’ accuracy (%) with increasing number of attributes according to their ranks (from Relief F) for child ASD dataset

Attribute set Size MLP SMO LR SL LB ICO RAB LMT
{A4} 1 78.98 78.98 78.98 78.98 78.98 78.98 78.98 78.98
{A4, A8} 2 78.98 78.98 78.98 78.98 78.98 78.98 78.98 78.98
{A4, A8, A9} 3 80.55 78.78 81.14 81.14 81.14 81.14 80.56 81.14
{A4,...,A10} 4 81.73 82.51 83.50 83.50 83.50 83.50 83.5 82.72
{A4,...,A1} 5 83.69 87.43 85.66 86.44 86.05 85.46 85.66 85.47
{A4,...,A5} 6 91.75 90.37 88.80 89.78 89.19 89.00 89.79 92.15
{A4,...,A6} 7 92.73 90.96 91.16 91.16 90.77 90.57 91.31 91.56
{A4,...,A3} 8 93.52 91.75 93.52 93.52 93.52 92.93 93.13 93.52
{A4,...,A7} 9 95.68 93.71 94.50 93.91 94.50 94.30 94.9 93.91
{A4,...,A2} 10 100 100 100 99.80 99.41 99.41 99.81 99.81
{A4,...,Res} 11 100 100 97.25 99.80 99.41 99.41 99.81 99.81
{A4,...,Age} 12 100 100 97.25 99.80 99.41 99.41 99.81 99.81
{A4,...,Sex} 13 100 100 97.25 99.80 99.41 99.41 99.81 99.81
{A4,...,Jand} 14 100 100 97.45 99.80 99.41 99.41 99.81 99.81
{A4,...,FASD} 15 100 100 97.25 99.80 99.41 99.41 99.81 99.81
{A4,...,Ethn} 16 100 100 98.82 99.80 99.41 99.41 99.81 99.81

Table 9.

Classifiers’ accuracy (%) with increasing number of attributes according to their ranks (from Relief F) for adult ASD dataset

Attribute set Size MLP SMO LR SL LB ICO RAB LMT
{A5} 1 74.87 75.94 75.94 75.94 75.94 75.94 75.94 75.94
{A5, A6} 2 85.24 85.24 85.24 85.24 85.24 85.24 85.25 85.25
{A5, A6, A9} 3 87.30 86.49 87.75 87.66 87.75 87.66 87.39 87.66
{A5,...,A3} 4 90.70 90.70 90.70 90.70 90.70 90.70 90.7 90.7
{A5,...,A4} 5 89.98 90.34 90.16 89.62 90.16 90.16 90.26 89.54
{A5,...,A10} 6 91.14 91.68 91.32 90.61 90.25 90.43 91.42 90.61
{A5,...,A7} 7 92.84 94.01 94.01 94.01 94.10 94.01 94.01 94.01
{A5,...,A1} 8 94.36 95.80 95.44 95.62 94.72 94.81 95.62 95.62
{A5,...,A8} 9 95.35 95.71 96.15 96.06 95.89 95.71 95.89 96.07
{A5,...,A2} 10 100 100 100 99.82 99.91 99.91 99.83 99.83
{A5,...,Res} 11 100 100 98.39 99.82 99.91 99.91 99.83 99.83
{A5,...,Ethn} 12 100 100 97.85 99.82 99.91 99.91 99.83 99.83
{A5,...,Jand} 13 100 100 98.30 99.82 99.91 99.91 99.83 99.83
{A5,...,Age} 14 100 100 97.76 99.82 99.91 99.91 99.83 99.83
{A5,...,Sex} 15 100 100 97.41 99.82 99.91 99.91 99.83 99.83
{A5,...,FASD} 16 100 100 97.32 99.82 99.91 99.91 99.83 99.83

Classification results comparison

In this section, we compare the prediction accuracy of this paper with state-of-the-art research. Most previous research is based on the version-1 ASD dataset [19, 20, 22–25]. Among these, only Baranwal and Vanitha [23] considered feature reduction while keeping accuracy at its maximum.

To the best of our knowledge, Thabtah et al. [1] worked on the version-2 adolescent and adult datasets only, applying a logistic regression classifier. The authors used chi-squared and information-gain feature ranking techniques to identify the most significant attributes and achieved 99% accuracy on the adolescent dataset and 97.58% on the adult dataset. In contrast, we have applied five feature ranking methods, including chi-squared and information gain, and found that Relief F outperforms both, contributing to 100% classification accuracy. Moreover, we have systematically selected the best classifier (from 27 benchmark classifiers), identified the best-performing feature ranking method (from five prominent methods), and determined the minimal set of attributes needed to achieve the best accuracy. Furthermore, we have used all four version-2 ASD datasets.
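For reference, the chi-squared ranking used by Thabtah et al. scores each binary attribute by its 2×2 contingency with the class label. The following is a hedged, dependency-free sketch (our own helper, not their code) using the standard closed form for a 2×2 table:

```python
def chi2_score(xcol, y):
    """Chi-squared statistic for a binary feature against a binary class,
    computed from the 2x2 contingency table via
    N * (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    a = sum(1 for xi, yi in zip(xcol, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(xcol, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(xcol, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(xcol, y) if xi == 0 and yi == 0)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom
```

Ranking attributes by `chi2_score` in descending order gives the chi-squared feature ordering; an attribute perfectly aligned with the class gets the maximum score N, while an independent attribute scores 0.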

Since our research is based on the version-2 dataset, we can only directly compare against work that used the same dataset, such as Thabtah et al. [1]. Nevertheless, we include the version-1 related works in our comparison in Table 10 to provide an overview of the performance achieved in detecting ASD. The results presented in Table 10 show that this paper outperforms state-of-the-art research on ASD detection (irrespective of the dataset version) and is the first work to use the toddler ASD dataset.

Table 10.

Performance comparison between different ASD detection works

Paper Dataset Feature reduction Best classifier Toddler Child Adolescent Adult
Recall Accuracy Recall Accuracy Recall Accuracy Recall Accuracy
[19] v1 SVM 1 100
[20] v1 RF 0.93 91.74
[25] v1 RML 0.91 91 0.87 87.50 0.94 94.50
[22] v1 CNN 0.967 98.30 0.93 96.85 0.99 99.53
[23] v1 ANN, LR, SVM 1 96.77 1 80.0 0.98 98.90
[24] v1 RF 1 100 1 100 1 100
[1] v2 LR 0.99 99.91 0.97 97.58
This paper v2 MLP 1* 100* 1* 100* 1* 100* 1* 100*

*Result from top ten attributes

Conclusion

In this study, we have analyzed the ASD datasets of toddlers, children, adolescents and adults. We applied the five most popular feature selection methods to derive fewer features from the ASD datasets while maintaining competitive performance, and found that the Relief F feature selection method outperforms the others. In our experimental setup, we increased the number of attributes gradually and then applied different classification techniques, finding that MLP outperforms all other classifiers under our methodology.

The main limitation of this research is the small size of the datasets. In future work, we aim to collect larger datasets and to work with deep learning methods that integrate feature assessment and classification for improved performance. We would also like to analyse brain signals (e.g., EEG) and relate them to the AQ-based study in order to develop a more robust ASD detection algorithm.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Md Delowar Hossain, Email: mailtodelowar@gmail.com.

Muhammad Ashad Kabir, Email: akabir@csu.edu.au.

Adnan Anwar, Email: adnan.anwar@deakin.edu.au.

Md Zahidul Islam, Email: zislam@csu.edu.au.

References

  • 1.Thabtah F, Abdelhamid N, Peebles D. A machine learning autism classification based on logistic regression analysis. Health Inf Sci Syst. 2019;7(1):1. doi: 10.1007/s13755-019-0073-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Wiggins LD, Reynolds A, Rice CE, Moody EJ, Bernal P, Blaskey L, Rosenberg SA, Lee LC, Levy SE. Using standardized diagnostic instruments to classify children with autism in the study to explore early development. J Autism Dev Disord. 2015;45(5):1271. doi: 10.1007/s10803-014-2287-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Lord C, Rutter M, Le Couteur A. Autism Diagnostic Interview-Revised: a revised version of a diagnostic interview for caregivers of individuals with possible pervasive developmental disorders. J Autism Dev Disord. 1994;24(5):659. doi: 10.1007/BF02172145. [DOI] [PubMed] [Google Scholar]
  • 4.Le Couteur AL, Gottesman I, Bolton P, Simonoff E, Yuzda E, Rutter M, Bailey A. Autism as a strongly genetic disorder evidence from a British twin Study. Psychol Med. 1995;25(1):63. doi: 10.1017/S0033291700028099. [DOI] [PubMed] [Google Scholar]
  • 5.Chawla NV. Data mining and knowledge discovery handbook. Boston: Springer; 2009. pp. 875–86. [Google Scholar]
  • 6.Geschwind D, Sowinski J, Lord C. Letters to the Editor 463. Am J. 2001.
  • 7.Pratap A, Kanimozhiselvi CS, Vijayakumar R, Pramod KV. Soft computing models for the predictive grading of childhood autism—a comparative study. Tech Report 2014; p. 4.
  • 8.Fischbach G, Lord C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron. 2010;68(2):192–5. doi: 10.1016/j.neuron.2010.10.006. [DOI] [PubMed] [Google Scholar]
  • 9.Islam MR, Kabir MA, Ahmed A, Kamal ARM, Wang H, Ulhaq A. Depression detection from social network data using machine learning techniques. Health Inf Sci Syst. 2018;6(1):8. doi: 10.1007/s13755-018-0046-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Allison C, Auyeung B, Baron-Cohen S. Toward brief “Red Flags” for autism screening: the short autism spectrum quotient and the short quantitative checklist in 1,000 cases and 3,000 controls. Tech. rep. 2012. [DOI] [PubMed]
  • 11.Duda M, Ma R, Haber N, Wall DP. Use of machine learning for behavioral distinction of autism and ADHD. Transl Psychiatry. 2016;6(2):e732. doi: 10.1038/tp.2015.221. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Thabtah F. Autism spectrum disorder screening: machine learning adaptation and DSM-5 fulfillment, dl.acm.org Part F129311, 1 2017.
  • 13.Parikh MN, Li H, He L. Enhancing diagnosis of autism with optimized machine learning models and personal characteristic data. Front Comput Neurosci. 2019;13:9. doi: 10.3389/fncom.2019.00009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Thabtah F. ASD Tests. A mobile app for ASD screening. https://www.asdtests.com/. Accessed 10 Jan 2020.
  • 15.Zunino A, Morerio P, Cavallo A, Ansuini C, Podda J, Battaglia F, Veneselli E, Becchio C, Murino V. In: 24th International conference on pattern recognition (ICPR), August, vol. 2018. IEEE; 2018. pp. 3421–6.
  • 16.Baron-Cohen S, Wheelwright S, Skinner R, Martin J, Clubley E. The autism-spectrum quotient (AQ): evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. J Autism Dev Disord. 2001;31(1):5. doi: 10.1023/A:1005653411471. [DOI] [PubMed] [Google Scholar]
  • 17.Thabtah F. ASD Dataset—UCI machine learning repository, 2017. https://archive.ics.uci.edu/ml. Accessed 13 Jan 2020.
  • 18.Thabtah F. ASD Dataset. https://fadifayez.com/autism-datasets/. Accessed 13 Jan 2020.
  • 19.Basu K. Autism detection in adults dataset. http://shorturl.at/ekyTW.
  • 20.McNamara B, Lora C, Yang D, Flores F, Daly P. Machine learning classification of adults with autism spectrum disorder. http://shorturl.at/hEFJY. Accessed 25 June 2020.
  • 21.Hossain MD, Kabir MA. In: Proceedings of the 17th world congress on medical and health informatics (MedInfo 2019), vol. 264. IOS; 2019. p. 1447–8.
  • 22.Raj S, Masood S. Analysis and detection of autism spectrum disorder using machine learning techniques. Procedia Comput Sci. 2020;167:994. doi: 10.1016/j.procs.2020.03.399. [DOI] [Google Scholar]
  • 23.Baranwal A, Vanitha M. Autistic spectrum disorder screening: prediction with machine learning models. In: 2020 International conference on emerging trends in information technology and engineering (ic-ETITE), pp. 1–7 (2020).
  • 24.Erkan U, Thanh D. Autism spectrum disorder detection with machine learning methods. Curr Psychiatry Rev. 2019;15:297. doi: 10.2174/2666082215666191111121115. [DOI] [Google Scholar]
  • 25.Thabtah F, Peebles D. A new machine learning model based on induction of rules for autism detection. Health Inform J. 2020;26(1):264. doi: 10.1177/1460458218824711. [DOI] [PubMed] [Google Scholar]
