Abstract
The early identification of Autism Spectrum Disorder (ASD) remains a critical challenge in neurodevelopmental research, with current diagnostic processes often delayed by subjective assessments and limited clinical resources. This paper presents a memory-efficient Neural Architecture Search (NAS) framework that autonomously identifies optimal neural network structures for ASD classification. Unlike existing genetic algorithm-based NAS approaches requiring over 16GB of GPU memory, our framework achieves a 76% memory reduction while maintaining superior performance. Our approach presents three key innovations: (1) a novel search space integrating simple, residual, and bottleneck operations, with complexity growing combinatorially in the number of layers L; (2) a memory-efficient genetic algorithm that decreases GPU memory consumption by 76% relative to current methodologies while preserving search efficacy; and (3) an adaptive fitness function that balances model performance with computational complexity. Through comprehensive experiments on a large-scale dataset (N = 2,587,704 samples; 1,262,856 training, 1,324,848 testing), our methodology attained a classification accuracy of 95.23% (95% CI: 94.89–95.57) and an area under the Receiver Operating Characteristic (ROC) curve of 0.986, markedly surpassing existing state-of-the-art techniques (traditional CNN: 92.3%, ResNet-based: 94.1%, LSTM: 93.7%). The framework achieves this performance with 2.8M parameters and a 15ms processing time per sample, demonstrating practical viability for clinical deployment in resource-constrained settings where current diagnostic procedures extend 4–5 years after symptom onset.
Keywords: Autism spectrum disorder, Neural architecture search, Genetic algorithm, Mixed operations, ADOS examination
Subject terms: Computational biology and bioinformatics, Engineering, Mathematics and computing, Neuroscience
Introduction
Autism Spectrum Disorder (ASD) is a complex neurodevelopmental condition characterized by persistent challenges in social communication and interaction, accompanied by restricted or repetitive behavioral patterns1. With ASD now affecting approximately one in 36 children globally, the need for accurate and timely diagnostic tools has become increasingly critical2,3.
Traditional diagnostic approaches rely heavily on behavioral observations and clinical assessments, which present significant limitations in terms of both accuracy and accessibility4. The reliance on subjective behavioral assessments introduces substantial inter-rater variability, while the requirement for specialized training creates accessibility barriers, particularly in resource-limited settings5. Moreover, current diagnostic procedures often extend to 4-5 years after symptom onset, resulting in delayed interventions that may impact developmental outcomes6.
Recent advances in artificial intelligence, particularly deep learning architectures, have shown promise in medical diagnosis, achieving classification accuracies between 70% and 90%7,8. However, current approaches face significant limitations in their application to ASD diagnosis. Manual architecture design often fails to capture ASD-specific patterns optimally, and substantial computational requirements hinder its clinical implementation9. Furthermore, these approaches demonstrate limited generalizability across diverse patient populations, compromising their practical utility10.
Novel contributions compared to existing GA-based NAS
While GA-based NAS has shown promise in medical applications11–13, existing approaches face three critical limitations that our framework addresses:
Memory Efficiency: Park et al.11 and Klos et al.12 require over 16GB and 12–18GB of GPU memory, respectively. Our framework achieves 2.1GB (a 76% reduction) through gradient accumulation and memory-aware fitness evaluation.
Search Space Diversity: Existing GA-NAS uses homogeneous operations (a single type per architecture). Our mixed-operations search space combines simple, residual, and bottleneck operations within the same architecture.
Clinical Deployment: Prior work11–13 optimizes for accuracy only. Our fitness function balances accuracy, memory footprint, and inference time—critical for clinical settings.
Table 1 quantifies these improvements.
Table 1.
Quantitative comparison with GA-based NAS approaches.
| Method | Memory (GB) | Operations | Search time | Clinical focus |
|---|---|---|---|---|
| Park11 | >16 | Single type | 3–5 days | No |
| Klos12 | 12-18 | Single type | 2–4 days | No |
| Baldeon13 | 8-12 | Dual type | 1–2 days | No |
| Ours | 2.1 (76% ↓) | Mixed (3 types) | 8–12 hours | Yes |
Memory reduction calculated relative to Park et al.11.
Recent technological advances have significantly enhanced ASD screening capabilities through artificial intelligence and Internet of Things applications14. Computer vision-based assessments have shown particular promise in analyzing interactions, emotions, and human poses in children with autism15,16. Neural Architecture Search (NAS) has emerged as a promising solution to these limitations, demonstrating remarkable success in computer vision and medical imaging applications13. However, its application to neurodevelopmental disorders remains significantly underexplored, primarily due to two critical challenges: excessive computational requirements, with standard implementations demanding more than 16GB of GPU memory12, and limited biological interpretability of the resulting architectures11.
This study introduces a novel approach to address these limitations through several key innovations. We present a memory-efficient genetic algorithm (GA)-driven NAS framework specifically optimized for ASD classification. Our approach incorporates a mixed-operations search space that integrates simple convolutions, residual blocks, and bottleneck transformations17. Through advanced memory optimization strategies, we achieved a 76% reduction in GPU memory consumption, enabling large-scale analysis of behavioral assessment data (N = 2,587,704 samples).
Our research objectives encompass the development and validation of this GA-driven framework, with an emphasis on evaluating the effectiveness of mixed-operations in capturing ASD-specific patterns. We established comprehensive performance benchmarks against existing approaches and validated clinical viability through rigorous efficiency metrics and scalability assessments. The framework’s performance is evaluated through extensive experiments on a large-scale dataset comprising pose estimation features from standardized (the Autism Diagnostic Observation Schedule—ADOS) examinations.
The remainder of this paper is organized as follows. Section 2 presents a comprehensive review of current ASD classification methodologies, with an emphasis on deep learning approaches. Section 3 provides a detailed description of our genetic NAS framework, including the search space design and optimization strategies. Section 4 outlines the experimental methodology and evaluation metrics. Section 5 presents the results and performance analysis. Section 6 discusses the implications of our findings for clinical applications and automated diagnosis. Finally, Section 7 concludes the paper with a summary of the contributions and directions for future research.
Literature review
The early detection and diagnosis of Autism Spectrum Disorder (ASD) remain significant challenges in clinical practice1,18. Recent years have witnessed substantial advancements in computational approaches for ASD diagnosis, particularly in the application of machine learning and artificial intelligence techniques.
Deep learning applications in ASD detection
Facial image analysis has emerged as a promising approach for ASD screening. Lu and Perkowski5 demonstrated the effectiveness of deep learning in this domain, achieving 95% classification accuracy using VGG16 transfer learning while emphasizing the critical role of ethnoracial factors. Building on this foundation, Alkahtani et al.7 implemented MobileNetV2 for facial landmark analysis, achieving 92% accuracy and improved computational efficiency.

Significant progress has been made in multi-modal approaches. Natraj et al.9 developed a comprehensive video-audio neural network ensemble that achieved 82.5% accuracy in screening young children, with a particularly strong performance in capturing diverse ASD manifestations. This study was complemented by Feng and Xu8, who achieved remarkable results using deep learning on fMRI data, reporting 99.39% accuracy, 98.80% recall, and 99.85% precision.

Recent research has demonstrated diverse computational approaches for ASD detection. While some studies have focused on behavioral pattern analysis through video processing16,19, others have explored multi-modal approaches incorporating various data sources20. Data-centric methodologies have shown particular promise, with selective feature analysis improving prediction accuracy21 and grid-tuned ensemble models demonstrating robust classification performance22. Additionally, optimization-based approaches have enhanced the reliability of ASD detection systems23,24.

Recent multi-modal frameworks have demonstrated enhanced diagnostic capabilities. Khan and Katarya25 proposed MCBERT, a multi-modal framework integrating multiple data sources for ASD diagnosis. Their work on white shark optimization with Bi-LSTM26 showed improved classification through hybrid optimization strategies. Adaptive feature fusion techniques27 using Bat-PSO-LSTM frameworks have demonstrated the effectiveness of nature-inspired optimization in ASD detection.
Machine learning surveys28 highlight current trends emphasizing multi-modal data integration. Neuroimaging-based approaches using Xception architectures29 and empirical evaluations of various techniques30 have established benchmarks for neuroimaging-based diagnosis. Self-supervised learning approaches using facial images31 represent emerging directions in automated ASD classification, achieving competitive results with reduced labeled data requirements.
Neural architecture search paradigms
Neural Architecture Search has emerged as a transformative approach with three primary paradigms: reinforcement learning-based methods, differentiable approaches (e.g., DARTS), and evolutionary strategies. Reinforcement learning methods formulate architecture search as sequential decision-making but require substantial computational resources. Differentiable NAS (DARTS) enables gradient-based optimization through continuous relaxation but faces memory consumption growing linearly with candidate operations. Evolutionary approaches offer natural parallelization and better exploration of discrete spaces but typically require large populations for stable convergence. Our approach builds upon evolutionary NAS while addressing memory limitations through population-based memory management and domain-aware fitness evaluation for medical applications.
Neural architecture optimization
The optimization of neural architectures has become increasingly crucial for improving ASD classification performance. Park et al.11 introduced an innovative approach using chromosome non-disjunction in genetic algorithms, demonstrating improved accuracy without expert intervention but requiring over 16GB of GPU memory. Klos et al.12 extended this work with a scalable genetic algorithm-based neural architecture search deployed in a Kubernetes cluster, achieving 99.75% accuracy on benchmark datasets, although its computational requirements remained prohibitive for resource-constrained environments. Baldeon Calisto and Lai-Yuen13 proposed EMONAS, an efficient multi-objective neural architecture search framework that significantly reduced computational requirements while maintaining high performance. Their work particularly emphasized the importance of balancing model accuracy with architectural efficiency.
Feature engineering and selection
Advanced feature selection methods have proven crucial for improving diagnostic accuracy. Yousef et al.18 developed a hybrid GA-KNN approach that effectively reduced the feature set from 21 to 9 dimensions while improving classification accuracy to 97.5%. This work was complemented by Narayanan et al.6, who achieved exceptional results through sophisticated fMRI feature extraction, reporting 99.39% accuracy. Guruvammal32 introduced an optimized Long Short-Term Memory approach incorporating modified data normalization and statistical feature extraction, demonstrating improved performance in young children. Similarly, Uddin et al.2 developed an integrated framework combining statistical analysis with machine learning, emphasizing the importance of feature selection in clinical applications.
Clinical implementation
The translation of computational methods to clinical practice represents a critical research direction. He and Liu33 developed a hybrid deep learning architecture for analyzing home videos, achieving 84% accuracy in real-world settings. This practical approach was further advanced by Shahamiri et al.4, who demonstrated the potential for integrating deep neural networks into existing screening protocols. Recent work by Chern and Al-Hababi10 introduced a novel hybrid Unet-RBF and CNN-RBF algorithm, achieving 94.79% accuracy and establishing new benchmarks for clinical implementation. Their work particularly emphasized the importance of architectural innovations in improving diagnostic reliability.
Research challenges and future directions
Despite significant advances, several challenges remain. Singh et al.3 highlighted the need for age-specific approaches, demonstrating varying accuracy rates across different age groups. Luo et al.17 emphasized the importance of addressing data heterogeneity, reporting a 4.56% performance improvement. The integration of multiple diagnostic modalities remains an active area of research. Gautam et al.34 demonstrated the potential of YOLOv8 in ASD screening, achieving 89.64% accuracy, while Tan35 reported 99.55% accuracy using a DNN-based approach for children aged 0–10 years. These advances and challenges inform our current research, which addresses the need for computationally efficient and clinically reliable ASD diagnosis through innovative neural architecture search (NAS) techniques. Our work particularly focuses on addressing the limitations identified in existing approaches while maintaining high diagnostic accuracy.
Limitations of existing approaches
Despite substantial progress, several critical limitations prevent clinical translation:
Computational Requirements: Existing deep learning approaches face prohibitive computational demands. ResNet-based methods8 require 23.5M parameters and substantial GPU memory. GA-based NAS methods11,12 demand 12–18GB of memory, limiting deployment on standard clinical workstations (typically 4–8GB). Our approach addresses this through memory-aware architecture search, reducing requirements to 2.1GB (a 76% reduction).
Operation Diversity: Current NAS implementations explore homogeneous operation spaces. Park et al.11 restricted search to standard convolutions, limiting architectural expressiveness. Our mixed-operations search space balances diversity with searchability.
Domain-Specific Constraints: Existing fitness functions optimize solely for accuracy11,12, neglecting clinical deployment constraints. Memory footprint, inference latency, and interpretability remain unaddressed. Our multi-objective fitness explicitly balances accuracy (95.23%), efficiency (15ms inference), and memory (2.1GB).
Validation Scale: Existing studies evaluate on datasets with hundreds to low thousands of samples5,7,10, limiting generalizability assessment. Our evaluation uses 2,587,704 samples—approximately two orders of magnitude larger—providing robust evidence of generalization capability.
Materials and methods
System architecture
This study presents a comprehensive framework for automated Autism Spectrum Disorder (ASD) classification through Neural Architecture Search (NAS). The proposed system integrates three principal components: a data processing pipeline, a hierarchical search space, and an evolutionary optimization strategy (Fig. 1). Our framework systematically discovers optimal neural architectures while maintaining computational efficiency and classification performance through an iterative refinement process.
Fig. 1.
Neural architecture search framework overview. The system comprises three main components: (A) Data Pipeline for processing raw ASD/TD (Typically Developing) data, (B) Search Space Visualization defining the architectural domain with mixed operations (simple, residual, bottleneck) across four layers, and (C) Genetic Algorithm Optimization managing the evolutionary search process (20 architectures, 10 generations). The framework concludes with Training and Optimization phase evaluating accuracy, precision/recall, F1-score, and ROC-AUC.
The data pipeline implements a systematic preprocessing workflow that transforms raw behavioral data into meaningful feature representations. This component ensures consistent data preparation through three sequential stages: feature extraction, normalization, and strategic data partitioning, maintaining the integrity of essential behavioral markers while standardizing the input format for subsequent architectural exploration.
Within the search space visualization component, we define a hierarchical architecture domain comprising three fundamental layers. The first layer incorporates operation modules, including simple transformations, residual connections, and bottleneck operations, each designed to capture distinct levels of behavioral feature complexity. The second layer implements a progressive dimensionality reduction strategy, with layer sizes decreasing from 512 to 64 units through intermediate stages of 256 and 128 units. This configuration allows flexible operation selection while maintaining controlled dropout rates ranging from 0.1 to 0.5. The third layer integrates multiple activation functions, including ReLU, LeakyReLU, ELU, and GELU, with learnable operation weights to optimize feature transformation.
For architectural optimization, we employ a genetic algorithm that orchestrates the evolutionary search process. The algorithm maintains a population of 20 candidate architectures across 10 generations, with each architecture encoded as a comprehensive chromosome representing operations, dropout rates, and activation functions. The selection process utilizes tournament selection with a size of 3 and implements controlled mutation rates of 0.1 to balance exploration and exploitation. The fitness evaluation incorporates multiple performance metrics, including classification accuracy, computational efficiency, and memory utilization.
The framework operates through continuous interaction between these components, culminating in a rigorous Training and Optimization phase. This final phase evaluates candidate architectures using comprehensive performance metrics, including accuracy, precision-recall characteristics, F1 scores, and ROC-AUC measurements. The integration of these components enables efficient exploration of the architectural search space while maintaining strict performance standards for clinical applicability.
Dataset and preprocessing
Ethical considerations
This study utilized data collected under the approval of the Ethics Committee of the Faculty of Medicine at the University of Geneva, Switzerland. The original data collection was conducted between September 25th, 2012, and September 10th, 2023, adhering to the guidelines and regulations established by the University of Geneva. Written informed consent was obtained from parents and/or legal guardians for all participating children. Participants diagnosed with Autism Spectrum Disorder (ASD) met the clinical criteria outlined in DSM-536, with diagnoses supported by gold standard diagnostic assessments. Typically developing (TD) children were screened to ensure the absence of neurological or psychiatric conditions and no history of ASD in first-degree relatives.
Our research protocol for the analysis of this dataset was designed to maintain participant privacy and data security. Data privacy considerations are paramount in ASD research, with recent work focusing on secure data handling and sanitization techniques37. Our approach implements similar privacy-preserving measures while maintaining diagnostic accuracy. All video and audio recordings were de-identified before processing, and pose estimation features were extracted using automated processes that preserve participant anonymity. Data handling and storage procedures complied with relevant data protection regulations, with access restricted to authorized research personnel. Our approach builds upon established video-based pose estimation techniques for ASD prediction19, incorporating advanced action pattern mining and multi-channel fusion methods20 for enhanced feature extraction.
Data acquisition and cohort characteristics
Dataset Selection Rationale. This study employs pose estimation features from standardized ADOS examinations for three methodological reasons. First, ADOS represents the clinical gold standard for ASD diagnosis36, providing validated ground truth labels critical for supervised learning. Second, pose estimation captures motor and behavioral patterns (joint angles, movement dynamics, gaze coordination) that constitute objective, quantifiable markers of ASD symptomatology, unlike subjective clinical observations. Third, video-based assessment enables naturalistic behavior capture during structured social interactions, addressing ecological validity concerns in ASD screening research19.
The dataset scale (N=1,262,856 samples from 160 participants) exceeds typical ASD neuroimaging studies by approximately two orders of magnitude, enabling robust deep learning model training while maintaining clinical validity through expert-certified diagnoses. The age range (3–11 years, mean 6.7±2.1) targets the critical early intervention window identified in developmental research.
The study utilized a comprehensive multi-modal dataset collected from 160 age-matched participants undergoing standardized Autism Diagnostic Observation Schedule (ADOS) assessments. The cohort comprised 80 individuals diagnosed with ASD and 80 typically developing (TD) controls, with a balanced demographic distribution (76 male, 84 female) and an age range of 3–11 years (mean 6.7±2.1 years). The detailed participant distribution across training and testing splits is presented in Table 2. Clinical diagnoses were established by ADOS-2 certified experts following standardized protocols. Data collection encompassed 10-minute video recordings of structured social interactions, synchronized audio recordings of verbal exchanges, and standardized ADOS examination protocols.
Table 2.
Participant distribution across dataset splits.
| Split | ASD | TD | Total |
|---|---|---|---|
| Training | 40 | 40 | 80 |
| Testing | 40 | 40 | 80 |
| Total | 80 | 80 | 160 |
The dataset demonstrates exceptional balance across diagnostic categories, as illustrated in Fig. 2. The distribution comprises 627,672 samples (49.7%) from individuals with ASD and 635,184 samples (50.3%) from typically developing (TD) controls, ensuring unbiased model training and evaluation.
Fig. 2.
Class distribution of the dataset showing balanced representation between ASD (627,672 samples) and TD (635,184 samples) categories. The near-equal distribution (49.7% ASD, 50.3% TD) ensures unbiased model training and evaluation.
Preprocessing pipeline
The preprocessing pipeline consists of four sequential stages designed to transform raw behavioral data into structured feature representations. The first stage implements temporal segmentation, where 10-minute ADOS examination clips undergo standardization with consistent frame rate processing and temporal alignment of multi-modal data streams. This ensures uniformity in temporal resolution across all samples while preserving behavioral dynamics.
The second stage employs MediaPipe-based pose estimation to extract 33 anatomically significant body landmark coordinates. This process generates real-time tracking of kinematic parameters and computes spatial-temporal features representing movement patterns. The extraction process maintains temporal coherence while capturing fine-grained motor behaviors characteristic of ASD manifestations.
Feature engineering constitutes the third stage, where we generate 2048-dimensional feature vectors incorporating multiple behavioral markers. These include kinematic parameters such as joint angles and movement smoothness metrics, gaze coordination measurements, and cross-modality temporal correlations. The high dimensionality of the feature space ensures comprehensive capture of behavioral patterns while maintaining computational tractability.
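As an illustration of the kinematic markers described above, the angle at a joint can be computed from three tracked body points. This is a minimal sketch under our own simplifications (2D coordinates, function name ours), not the paper's implementation:

```python
import math

def joint_angle(a, b, c):
    """Angle (degrees) at joint b formed by points a-b-c, each an (x, y) pair."""
    v1 = (a[0] - b[0], a[1] - b[1])   # vector from joint to first point
    v2 = (c[0] - b[0], c[1] - b[1])   # vector from joint to second point
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    # clamp to [-1, 1] to guard against floating-point drift
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_t))
```

Applied per frame across the 33 tracked landmarks, such angles (and their first differences over time) yield the kind of kinematic feature series the pipeline aggregates.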
The final stage implements data standardization through z-score normalization of numerical features, stratified shuffling for training-test independence, and batch normalization preparation. This process ensures statistical consistency across the dataset while preserving the relative relationships between behavioral features.
$$z = \frac{x - \mu}{\sigma} \tag{1}$$

where $x$ represents the raw feature value, $\mu$ denotes the mean, and $\sigma$ represents the standard deviation of the feature distribution.
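A sketch of the z-score normalization of Eq. (1), applied per feature column (written in pure Python for clarity; a production pipeline would vectorize this):

```python
import math

def zscore(values):
    """Z-score normalize one feature column: z = (x - mu) / sigma, per Eq. (1)."""
    mu = sum(values) / len(values)
    var = sum((x - mu) ** 2 for x in values) / len(values)  # population variance
    sigma = math.sqrt(var)
    return [(x - mu) / sigma for x in values]
```

After this transform each feature has zero mean and unit standard deviation, which keeps the downstream layers' inputs on a comparable scale.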
Quality control protocol
Our quality control framework implements comprehensive validation measures across multiple dimensions. Processing speed monitoring reveals consistent throughput rates ranging from 79 to 308 iterations per second for the training set, as detailed in Table 3, with corresponding resource utilization optimization. Feature quality metrics maintain strict threshold criteria, including intra-class correlation coefficients exceeding 0.85 for kinematic features and temporal segmentation variance below 5%.
Table 3.
Dataset quality metrics and processing performance.
| Metric | Training set | Testing set |
|---|---|---|
| Processing rate (ASD) | 79.12 it/s | 32.67 it/s |
| Processing rate (TD) | 308.92 it/s | 33.00 it/s |
| Feature quality (ICC) | >0.85 | >0.85 |
| Temporal variance | <5% | <5% |
Sample distribution analysis ensures balanced allocation across files, maintaining 130–131 samples per file with consistent batch composition. The framework implements strategic shuffling procedures to guarantee statistical independence while preserving demographic balance across splits.
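The stratified shuffling described above can be sketched as a per-class shuffle-then-split; this simplified version (names ours) preserves class balance but omits the participant-level constraints the full pipeline enforces:

```python
import random

def stratified_split(samples, labels, test_frac=0.5, seed=42):
    """Shuffle within each class, then split so class proportions are preserved."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    train, test = [], []
    for items in by_class.values():
        rng.shuffle(items)                      # independence via per-class shuffle
        cut = int(len(items) * (1 - test_frac)) # same fraction from every class
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test
```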
Data availability and reproducibility
To facilitate reproducibility while maintaining participant privacy, we provide comprehensive access to de-identified pose estimation features through Zenodo (DOI: 10.5281/zenodo.12652659) and implementation code at https://mega.nz/file/CRUD0YiS#fl62EyWul5TlfB9dj8a7jBL7ON12Qji0eizpjOwdWGI. The final preprocessed dataset represents a significant advancement in ASD screening research, encompassing 2,587,704 total samples (1,262,856 training, 1,324,848 testing), approximately two orders of magnitude larger than existing ASD screening datasets.
The implementation code includes neural architecture specifications, preprocessing pipelines, and complete documentation of processing protocols and quality control metrics. This comprehensive data sharing approach ensures reproducibility while maintaining strict privacy standards for clinical data.
Search space design
The search space for our neural architecture encompasses multiple operational layers with distinct configurations optimized for ASD feature processing. Each layer implements a specific set of transformations designed to capture hierarchical representations of behavioral patterns while maintaining computational efficiency.
Layer configuration
The architecture implements a progressive dimensionality reduction strategy through four primary layers, as detailed in Table 4. Each layer maintains flexibility in operation selection while adhering to strict dimensional constraints for efficient feature processing.
Table 4.
Neural architecture layer specifications.
| Layer | Input dimension | Output dimension | Operations |
|---|---|---|---|
| 1 | 2048 | 512 | {$O_s$, $O_r$, $O_b$} |
| 2 | 512 | 256 | {$O_s$, $O_r$, $O_b$} |
| 3 | 256 | 128 | {$O_s$, $O_r$, $O_b$} |
| 4 | 128 | 64 | {$O_s$, $O_r$, $O_b$} |

$O_s$: simple operation, $O_r$: residual operation, $O_b$: bottleneck operation. Dropout rates range [0.1, 0.5] with 0.1 step size.
Operation definitions
The framework implements three fundamental operations designed to capture different levels of feature abstraction, as illustrated in Fig. 3. The simple operation applies a basic transformation defined by:
$$O_s(x) = \mathrm{Act}(\mathrm{Linear}(x)) \tag{2}$$

where $\mathrm{Linear}(x) = Wx + b$, with $W$ representing the weight matrix and $b$ the bias vector.
Fig. 3.
Core operation architectures illustrating the structural components of each transformation type. (A) Simple operation implementing direct feature transformation. (B) Residual operation incorporating skip connections. (C) Bottleneck operation employing dimensionality reduction and expansion.
The residual operation incorporates skip connections to facilitate gradient flow:
$$O_r(x) = x + \mathrm{Act}(\mathrm{Linear}(x)) \tag{3}$$
The bottleneck operation implements dimensionality reduction through:
$$O_b(x) = \mathrm{Linear}_{\uparrow}\big(\mathrm{Act}(\mathrm{Linear}_{\downarrow}(x))\big) \tag{4}$$

where $\mathrm{Linear}_{\downarrow}$ reduces the feature dimensionality and $\mathrm{Linear}_{\uparrow}$ restores it.
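The three operations of Eqs. (2)–(4) can be sketched in plain Python on small vectors; ReLU stands in here for the searchable activation, and the weight matrices (normally learned) are passed in explicitly:

```python
def linear(x, W, b):
    """Linear(x) = Wx + b for a list-based vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def relu(v):
    return [max(0.0, u) for u in v]

def op_simple(x, W, b):
    """Eq. (2): activation applied to a single linear transform."""
    return relu(linear(x, W, b))

def op_residual(x, W, b):
    """Eq. (3): skip connection added to the transformed features
    (requires matching input/output dimensions)."""
    return [xi + hi for xi, hi in zip(x, relu(linear(x, W, b)))]

def op_bottleneck(x, W_down, b_down, W_up, b_up):
    """Eq. (4): reduce dimensionality, transform, then expand back."""
    return linear(relu(linear(x, W_down, b_down)), W_up, b_up)
```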
Activation functions
The framework incorporates four distinct activation functions, each providing specific non-linear transformation characteristics:
$$\mathrm{ReLU}(x) = \max(0, x), \quad \mathrm{LeakyReLU}(x) = \max(\alpha x, x), \quad \mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha(e^{x} - 1), & x \le 0 \end{cases}, \quad \mathrm{GELU}(x) = x\,\Phi(x) \tag{5}$$

where $\Phi$ represents the cumulative distribution function of the standard normal distribution.
Weight configuration
The final output computation for each layer incorporates weighted combinations of operations:
$$y_{\ell}(x) = \sum_{i \in \{s, r, b\}} w_i \, O_i(x) \tag{6}$$

where weights are normalized through softmax:

$$w_i = \frac{\exp(\alpha_i)}{\sum_{j} \exp(\alpha_j)} \tag{7}$$

Operation weights are initialized uniformly in [0, 1] and normalized to ensure $\sum_i w_i = 1$ for all operations within each layer, as detailed in Table 5.
Table 5.
Operation weight distribution parameters.
| Operation type | Weight range | Initialization |
|---|---|---|
| Simple | [0, 1] | Uniform(0, 1) |
| Residual | [0, 1] | Uniform(0, 1) |
| Bottleneck | [0, 1] | Uniform(0, 1) |
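Equations (6)–(7) amount to a softmax-weighted blend of the candidate operation outputs; a minimal sketch (function names ours):

```python
import math

def softmax(alphas):
    """Eq. (7): normalize learnable logits into operation weights summing to 1."""
    m = max(alphas)                          # subtract max for numerical stability
    exps = [math.exp(a - m) for a in alphas]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_output(op_outputs, alphas):
    """Eq. (6): weighted sum over the outputs of the candidate operations."""
    w = softmax(alphas)
    dim = len(op_outputs[0])
    return [sum(w[k] * op_outputs[k][j] for k in range(len(op_outputs)))
            for j in range(dim)]
```

Because the weights are differentiable, the relative importance of simple, residual, and bottleneck paths can be tuned during training rather than fixed in advance.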
Neural architecture search implementation
Our Neural Architecture Search framework employs an evolutionary approach to discover optimal architectures for ASD classification. The overall search process, outlined in Algorithm 1, integrates multiple components designed to balance exploration of the architectural space with computational efficiency.
Algorithm 1.
Neural architecture search with genetic algorithm for ASD classification.
Computational Complexity: Algorithm 1 has time complexity $O(G \cdot P \cdot E)$, where $G$ is the number of generations, $P$ the population size, and $E$ the evaluation cost per architecture (the dominant term). Space complexity is $O(P \cdot M)$, where $M$ is the model size. Search space size: $(3 \times 5 \times 4)^L = 60^L$ per-layer combinations of operations, dropout rates, and activation functions for $L = 4$ layers, yielding approximately $1.3 \times 10^{7}$ possible architectures.
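Under the configuration stated in the search-space design (3 operation types, 5 dropout rates in [0.1, 0.5] with 0.1 steps, 4 activation functions, 4 layers), the search-space size can be checked directly:

```python
def search_space_size(ops=3, dropouts=5, activations=4, layers=4):
    """Independent per-layer choices multiplied across layers: (3*5*4)^L."""
    return (ops * dropouts * activations) ** layers
```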
Evolutionary strategy
The Neural Architecture Search process employs an evolutionary strategy optimized for architectural exploration while maintaining computational efficiency. The algorithm operates on a population of neural architectures through successive generations, utilizing a fitness function that balances classification performance with computational complexity. The evolutionary process is defined by:
$$F(a) = \mathrm{Acc}_{\mathrm{val}}(a) - \lambda \, C(a) \tag{8}$$

where $\mathrm{Acc}_{\mathrm{val}}(a)$ represents the validation accuracy, $C(a)$ denotes the complexity penalty, and $\lambda$ serves as the trade-off parameter. The population evolution proceeds through $G = 10$ generations with population size $P = 20$, implementing strategic sampling with architectural constraints to ensure feasible solutions.
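A sketch of the accuracy-versus-complexity trade-off in Eq. (8), paired with the tournament selection (size 3) used in the search; the $\lambda$ value here is illustrative, not the paper's tuned setting:

```python
import random

def fitness(val_acc, complexity, lam=0.1):
    """Eq. (8): F(a) = Acc_val(a) - lambda * C(a). lam is illustrative."""
    return val_acc - lam * complexity

def tournament_select(population, fitnesses, k=3, rng=random):
    """Pick k random contenders and return the fittest (tournament size 3)."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]
```

A lower $\lambda$ favors raw accuracy; a higher $\lambda$ pushes the search toward lighter architectures, which is how the framework trades performance against deployment cost.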
Architectural encoding
Each architecture is encoded through a three-part chromosome structure that comprehensively represents the network configuration. The operations encoding O defines the computational transformations:
$$O = [o_1, o_2, o_3, o_4], \quad o_i \in \{0, 1, 2\} \tag{9}$$

where values 0, 1, and 2 correspond to simple, residual, and bottleneck operations, respectively. The dropout rate vector $D$ specifies regularization parameters:

$$D = [d_1, d_2, d_3, d_4], \quad d_i \in \{0.1, 0.2, 0.3, 0.4, 0.5\} \tag{10}$$

The activation function configuration $A$ determines non-linear transformations:

$$A = [a_1, a_2, a_3, a_4], \quad a_i \in \{0, 1, 2, 3\} \tag{11}$$

where the integer values map to ReLU, LeakyReLU, ELU, and GELU functions, respectively.
Encoding Example: A sample architecture might be encoded as:
Operations: $O = [0, 1, 1, 1]$ (simple, residual, residual, residual)
Dropouts: $D = [0.1, 0.4, 0.2, 0.4]$
Activations: $A = [2, 3, 3, 0]$ (ELU, GELU, GELU, ReLU)
This encoding fully specifies the architecture and enables exact reproduction. The discovered best architecture in our experiments used this exact configuration (see Table 9).
Table 9.
Training performance metrics and architectural configuration.
| Metric | Value | Configuration |
|---|---|---|
| Training accuracy | 97.8% | Layer sizes: [512, 256, 128, 64] |
| Validation accuracy | 97.3% | Operation sequence: [0, 1, 1, 1] |
| Training loss | 0.008 | Dropout rates: [0.1, 0.4, 0.2, 0.4] |
| Validation loss | 0.042 | Activation pattern: [2, 3, 3, 0] |
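As an illustration of the three-part chromosome, the sketch below (illustrative Python, not the paper's code; helper names such as `decode` are our own) encodes the Table 9 architecture and maps it back to readable form:

```python
# Illustrative chromosome encoding for the three-part architecture genome.
# Operation codes: 0=simple, 1=residual, 2=bottleneck
# Activation codes: 0=ReLU, 1=LeakyReLU, 2=ELU, 3=GELU
OP_NAMES = {0: "simple", 1: "residual", 2: "bottleneck"}
ACT_NAMES = {0: "ReLU", 1: "LeakyReLU", 2: "ELU", 3: "GELU"}

def decode(chromosome):
    """Map an (operations, dropouts, activations) genome to readable form."""
    ops, drops, acts = chromosome
    return {
        "operations": [OP_NAMES[o] for o in ops],
        "dropouts": list(drops),
        "activations": [ACT_NAMES[a] for a in acts],
    }

# Best architecture reported in Table 9
best = ([0, 1, 1, 1], [0.1, 0.4, 0.2, 0.4], [2, 3, 3, 0])
print(decode(best)["operations"])  # ['simple', 'residual', 'residual', 'residual']
```

Because the genome is just three fixed-length lists, it can be crossed over and mutated component-wise, which is what the genetic operators below exploit.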
The search space is carefully constructed to encompass operations of varying complexity, from simple linear transformations to sophisticated residual and bottleneck operations. This hierarchical organization allows the framework to discover architectures that effectively capture both low-level behavioral features and high-level diagnostic patterns.
The population initialization process, detailed in Algorithm 2, establishes a diverse set of candidate architectures while ensuring architectural validity. Each architecture is represented as a composite structure encoding operation choices, dropout rates, and activation functions for each layer.
Algorithm 2.
Architecture population initialization.
The initialization process implements strategic sampling to ensure architectural diversity while maintaining valid configurations. This approach helps prevent premature convergence and enables thorough exploration of the search space. The validation step ensures that each generated architecture satisfies both structural constraints and memory limitations.
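A minimal sketch of this initialization step, assuming a 4-layer genome, dropout rates discretized in [0.1, 0.5], and duplicate rejection as the diversity constraint (these specifics are illustrative; Algorithm 2 may impose additional validity checks):

```python
import random

OPS, ACTS = (0, 1, 2), (0, 1, 2, 3)

def random_architecture(num_layers=4, rng=random):
    """Sample one chromosome: operations, dropout rates, activations."""
    return (
        [rng.choice(OPS) for _ in range(num_layers)],
        [round(rng.uniform(0.1, 0.5), 1) for _ in range(num_layers)],
        [rng.choice(ACTS) for _ in range(num_layers)],
    )

def init_population(size=20, num_layers=4, seed=42):
    """Strategic sampling: reject duplicates to keep the population diverse."""
    rng = random.Random(seed)
    population, seen = [], set()
    while len(population) < size:
        arch = random_architecture(num_layers, rng)
        key = (tuple(arch[0]), tuple(arch[1]), tuple(arch[2]))
        if key not in seen:          # diversity constraint
            seen.add(key)
            population.append(arch)
    return population

pop = init_population()
print(len(pop))  # 20
```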
The evolutionary process, presented in Algorithm 3, implements an adaptive genetic algorithm specifically optimized for neural architecture search. This algorithm incorporates several key innovations to address the unique challenges of ASD classification:
Algorithm 3.
Genetic evolution process.
Genetic operators
The evolution process incorporates three critical mechanisms: (1) adaptive mutation rates that respond to population diversity, preventing premature convergence; (2) elitism preservation, which ensures monotonic improvement in the best-found architectures; and (3) tournament selection, which maintains selection pressure while preserving population diversity. The complete evolutionary workflow is illustrated in Fig. 4, demonstrating the iterative refinement of architectural configurations across generations. The workflow proceeds as follows: (1) initialize a population of 20 diverse architectures (Algorithm 2); (2) evaluate the fitness of each architecture using quick training (Algorithm 4); (3) select parents via tournament selection (Algorithm 6); (4) generate offspring through crossover and mutation (Algorithm 5); (5) repeat for 10 generations (Algorithm 3); (6) return the best architecture found. This pipeline requires 8-12 GPU hours on an NVIDIA A100.
Fig. 4.
Evolutionary process illustration showing the progression from initial population through selection, crossover, and mutation operations to final architecture selection. The process demonstrates the iterative refinement of architectural configurations across generations.
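The six-step workflow can be sketched as a generic genetic-algorithm skeleton. The toy instantiation below maximizes a simple gene-sum fitness and is purely illustrative of the evaluate-select-recombine-mutate cycle; the real search plugs in architecture evaluation (Algorithm 4) as the fitness function:

```python
import random

def evolve(init_population, fitness, crossover, mutate, select,
           generations=10, seed=0):
    """GA skeleton: evaluate -> elitism -> selection -> crossover -> mutation."""
    rng = random.Random(seed)
    population = init_population(rng)
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]                      # elitism: carry over top 2
        while len(next_gen) < len(population):
            parent1 = select(ranked, rng)
            parent2 = select(ranked, rng)
            next_gen.append(mutate(crossover(parent1, parent2, rng), rng))
        population = next_gen
        best = max(population + [best], key=fitness)
    return best

# Toy instantiation: maximize the gene sum of a 4-gene chromosome over {0,1,2}.
best = evolve(
    init_population=lambda r: [[r.randint(0, 2) for _ in range(4)]
                               for _ in range(20)],
    fitness=sum,
    crossover=lambda a, b, r: (lambda p: a[:p] + b[p:])(r.randint(1, 3)),
    mutate=lambda c, r: [r.randint(0, 2) if r.random() < 0.25 else g for g in c],
    select=lambda ranked, r: ranked[min(r.sample(range(len(ranked)), 3))],
)
```

Elitism (keeping the top two ranked individuals each generation) is what guarantees the monotonic improvement mentioned above.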
The fitness evaluation procedure, detailed in Algorithm 4, implements a novel quick evaluation strategy that enables efficient exploration of the architectural search space while maintaining reliable performance estimates.
Algorithm 4.
Architecture evaluation.
This multi-objective evaluation ensures that discovered architectures are not only accurate but also practical for clinical deployment. The complexity penalization term $\lambda$ is empirically set to 0.1, providing an effective balance between model capacity and computational efficiency.
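A sketch of this multi-objective fitness, assuming an Eq. (16)-style normalized penalty with illustrative normalizers and equal weights α = β = 0.5 (the paper fixes only λ = 0.1):

```python
def complexity_penalty(params, flops, params_max=25e6, flops_max=12e9,
                       alpha=0.5, beta=0.5):
    """Weighted, normalized parameter and FLOP costs.
    The normalizers and alpha/beta weights here are illustrative choices."""
    return alpha * params / params_max + beta * flops / flops_max

def fitness(val_accuracy, params, flops, lam=0.1):
    """Fitness = validation accuracy minus lambda * complexity penalty."""
    return val_accuracy - lam * complexity_penalty(params, flops)

# Discovered architecture (Table 11): 97.3% accuracy, 2.8M params, 4.2G FLOPs
score = fitness(0.973, 2.8e6, 4.2e9)
```

Under these assumptions the discovered architecture scores higher than an equally accurate but heavier ResNet-sized model, which is exactly the pressure the penalty is meant to exert.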
The genetic operations, defined in Algorithm 5, implement specialized crossover and mutation operators designed for neural architecture evolution.
Algorithm 5.
Genetic operations
The genetic operations implement single-point crossover for architecture recombination and component-wise mutation with adaptive rates. The mutation operation preserves architectural constraints while enabling exploration of the search space. The evaluation algorithm incorporates both accuracy and efficiency metrics through the complexity penalization term $\lambda$, which can be tuned to balance performance and computational requirements.
The evolutionary process implements three primary genetic operators. Parent selection utilizes tournament selection with size $k = 3$, where selection probability is computed as:
$$P(r) = p \, (1 - p)^{r-1} \tag{12}$$
with rank $r$ in the tournament and selection pressure $p = 0.7$.
The crossover operation implements single-point crossover with distinct crossover points for each chromosome component:
$$O_{\text{child}} = [o_1^{(1)}, \ldots, o_c^{(1)}, o_{c+1}^{(2)}, \ldots, o_L^{(2)}] \tag{13}$$
with analogous recombination of $D$ and $A$ at independently sampled points, where the crossover probability $p_c = 0.8$ ensures sufficient exploration of the architectural space.
Mutation operations are applied with component-specific rates to maintain population diversity:
$$x_i' = \begin{cases} \text{resample}(x_i) & \text{with probability } p_m \\ x_i & \text{otherwise} \end{cases} \tag{14}$$
where the component-specific mutation rate $p_m \in [0.2, 0.3]$ (Table 6).
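The two operators can be sketched as follows, with per-component crossover points and component-specific mutation rates in the stated 0.2-0.3 range (the exact per-component rates and the dropout resampling range are illustrative assumptions):

```python
import random

def single_point_crossover(parent1, parent2, rng, p_c=0.8):
    """Independent single-point crossover per chromosome component."""
    if rng.random() > p_c:
        return parent1
    child = []
    for comp1, comp2 in zip(parent1, parent2):
        point = rng.randint(1, len(comp1) - 1)
        child.append(comp1[:point] + comp2[point:])
    return tuple(child)

def mutate(arch, rng, rates=(0.3, 0.2, 0.3)):
    """Component-wise mutation; per-component rates shown are illustrative
    values within the paper's 0.2-0.3 range."""
    ops, drops, acts = arch
    ops = [rng.choice([0, 1, 2]) if rng.random() < rates[0] else o for o in ops]
    drops = [round(rng.uniform(0.1, 0.5), 1) if rng.random() < rates[1] else d
             for d in drops]
    acts = [rng.choice([0, 1, 2, 3]) if rng.random() < rates[2] else a
            for a in acts]
    return (ops, drops, acts)

rng = random.Random(7)
p1 = ([0, 1, 1, 1], [0.1, 0.4, 0.2, 0.4], [2, 3, 3, 0])
p2 = ([2, 0, 2, 0], [0.3, 0.3, 0.3, 0.3], [1, 1, 0, 2])
child = mutate(single_point_crossover(p1, p2, rng), rng)
```

Both operators only resample values from the original alphabets, so every offspring remains a valid chromosome by construction.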
Algorithm 6.
Tournament selection.
Selection Pressure: Tournament selection, defined in Algorithm 6 with tournament size $k = 3$, provides selection probability $P(r) = p \, (1 - p)^{r-1}$, where $r$ is the rank within the tournament and $p = 0.7$ is the selection pressure parameter. This balances exploration (diversity) with exploitation (convergence).
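A sketch of rank-based tournament selection under these parameters (k = 3, p = 0.7); the fallback to the last-ranked contestant when no probabilistic draw succeeds is an implementation choice of ours, not specified by Algorithm 6:

```python
import random

def tournament_select(scored_population, rng, k=3, p=0.7):
    """Sample k (architecture, fitness) pairs, rank them, then pick the
    rank-r contestant with probability p * (1 - p)**(r - 1)."""
    contestants = sorted(rng.sample(scored_population, k),
                         key=lambda sf: sf[1], reverse=True)
    for arch, _fitness in contestants:
        if rng.random() < p:
            return arch
    return contestants[-1][0]   # fallback: last-ranked contestant

# Probability the winner is the rank-r contestant for p = 0.7:
probs = [0.7 * (1 - 0.7) ** (r - 1) for r in (1, 2, 3)]
print([round(x, 3) for x in probs])  # [0.7, 0.21, 0.063]
```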
Performance evaluation
The fitness evaluation incorporates multiple objectives to ensure both computational efficiency and classification performance; the complete genetic algorithm parameters and specifications are provided in Table 6. The classification performance metric is computed as:
$$\text{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$
The complexity penalty considers both parameter count and computational requirements:
$$C(a) = \alpha \, \frac{\text{Params}(a)}{\text{Params}_{\max}} + \beta \, \frac{\text{FLOPs}(a)}{\text{FLOPs}_{\max}} \tag{16}$$
where $\alpha$ and $\beta$ balance parameter and computational complexity.
Table 6.
Genetic algorithm parameters and specifications.
| Parameter | Value | Description |
|---|---|---|
| Population size | 20 | Architectures per generation |
| Generations | 10 | Evolution cycles |
| Tournament size | 3 | Selection candidates |
| Crossover rate | 0.8 | Crossover probability |
| Mutation rate | 0.2-0.3 | Component-specific probabilities |
| Selection pressure | 0.7 | Tournament parameter |
Memory efficiency is monitored through:
$$M_{\text{eff}} = \frac{M_{\text{peak}}}{M_{\text{available}}} \tag{17}$$
The final fitness score incorporates these metrics:
$$F(a) = \text{Acc}_{\text{val}}(a) - \lambda \, C(a) - \gamma \, M_{\text{eff}}(a) \tag{18}$$
Training protocol and implementation
Hyperparameter selection and justification
Training hyperparameters were selected through systematic pilot studies on a held-out validation subset (10% of training data) prior to final model training. The learning rate ($\eta = 10^{-3}$) was determined via grid search, selecting the value that achieved stable convergence within 40 epochs without oscillation. Weight decay (0.01) was chosen to balance model complexity and generalization, with validation loss monitored across the values [0.001, 0.01, 0.1].
The batch size (32) represents the maximum stable size given GPU memory constraints (20GB) while maintaining gradient estimation quality through gradient accumulation (4 steps). Label smoothing ($\epsilon = 0.1$) follows established practice in medical classification to prevent overconfident predictions. The early stopping patience (10 epochs) was empirically determined as the point where validation accuracy plateaus consistently across multiple training runs.
For the genetic algorithm component, population size (N=20) and generation count (G=10) balance search thoroughness with computational budget, following established practice in evolutionary NAS11,12. Tournament size (k=3) maintains selection pressure while preserving population diversity. Mutation rates (0.2-0.3 component-specific) were empirically tuned to enable effective architectural exploration while maintaining search stability.
Optimization strategy
The training process implements an adaptive optimization scheme designed to ensure robust convergence while maintaining computational efficiency. The loss function employs binary cross-entropy with label smoothing to enhance generalization:
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ \tilde{y}_i \log \hat{y}_i + (1 - \tilde{y}_i) \log (1 - \hat{y}_i) \right], \qquad \tilde{y}_i = (1 - \epsilon)\, y_i + \frac{\epsilon}{2} \tag{19}$$
where $\epsilon = 0.1$ represents the label smoothing factor. The optimization utilizes the AdamW algorithm with the following configuration:
$$\theta_{t+1} = \theta_t - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda_{\text{wd}}\, \theta_t \right) \tag{20}$$
The learning rate schedule implements cosine annealing with warm restarts:
$$\eta_t = \eta_{\min} + \frac{1}{2} \left( \eta_{\max} - \eta_{\min} \right) \left( 1 + \cos\!\left( \frac{T_{\text{cur}}}{T_0}\, \pi \right) \right) \tag{21}$$
where $T_0$ epochs defines the cycle length and $\eta_{\min}$ establishes the minimum learning rate.
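Equation (21) can be computed directly; the η_min and T_0 values below are placeholders, since the paper leaves them symbolic:

```python
import math

def cosine_annealing_warm_restarts(epoch, eta_max=1e-3, eta_min=1e-6, T_0=10):
    """Cosine-annealed learning rate that restarts every T_0 epochs.
    eta_min and T_0 are illustrative placeholder values."""
    t_cur = epoch % T_0
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / T_0))

lrs = [cosine_annealing_warm_restarts(e) for e in range(21)]
```

In PyTorch (which the paper uses), this schedule is available as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.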
Memory management
The implementation incorporates strategic memory management protocols to optimize resource utilization. Batch processing employs a structured approach with batch size 32 and gradient accumulation steps of 4, maintaining a memory threshold at 80% of GPU capacity. The peak memory consumption is monitored through:
$$M_{\text{peak}} = M_b + M_g + M_a \tag{22}$$
where $M_b$, $M_g$, and $M_a$ represent batch, gradient, and activation memory respectively, subject to the constraint:
$$M_{\text{peak}} \leq 0.8 \cdot M_{\text{GPU}} \tag{23}$$
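Equations (22)-(23) amount to a simple budget check; the component sizes below are illustrative, not measured values from the paper:

```python
def peak_memory_gb(batch_gb, grad_gb, act_gb):
    """Peak memory as the sum of batch, gradient, and activation memory."""
    return batch_gb + grad_gb + act_gb

def within_budget(peak_gb, gpu_gb=20.0, threshold=0.8):
    """Enforce the 80%-of-capacity memory threshold on a 20GB GPU."""
    return peak_gb <= threshold * gpu_gb

# Illustrative component sizes (GB), not measurements from the paper
peak = peak_memory_gb(0.4, 0.9, 0.8)
print(within_budget(peak))  # True
```

Architectures whose estimated peak memory violates the constraint are rejected during population initialization, which is how the search stays within the memory budget.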
Training process
The training procedure implements a systematic protocol for weight initialization and normalization (Fig. 5). Network parameters are initialized using variance-scaled (He) initialization:
$$W \sim \mathcal{N}\!\left(0, \frac{2}{n_{\text{in}}}\right) \tag{24}$$
Batch normalization statistics are computed as:
$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad \mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2 \tag{25}$$
Gradient clipping is enforced to ensure training stability:
$$g \leftarrow g \cdot \min\!\left(1, \frac{\tau}{\lVert g \rVert_2}\right), \qquad \tau = 1.0 \tag{26}$$
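Equation (26) in code form, using the Table 7 maximum gradient norm of 1.0:

```python
import math

def clip_gradient(grad, max_norm=1.0):
    """Rescale the gradient vector when its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return grad

clipped = clip_gradient([3.0, 4.0])   # norm 5.0 -> rescaled to norm 1.0
```

In PyTorch this is `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)`.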
Fig. 5.
Training and optimization process illustrating the interaction between batch processing, memory management, and performance monitoring components. The system demonstrates dynamic resource allocation and optimization feedback loops.
Performance evaluation
The evaluation framework incorporates comprehensive metrics for model assessment. Classification performance is measured through accuracy, precision, recall, and F1-score:
$$\text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad F_1 = \frac{2\,\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \quad \text{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{27}$$
The Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) is computed as:
$$\text{AUC} = \int_0^1 \text{TPR} \; d(\text{FPR}) \tag{28}$$
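Equations (27)-(28) can be sketched as follows, with the AUC approximated by trapezoidal integration over ROC points (function names are ours):

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve via trapezoidal integration."""
    return sum((fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2
               for i in range(len(fpr) - 1))

perfect = auc_trapezoid([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])   # 1.0
chance = auc_trapezoid([0.0, 1.0], [0.0, 1.0])              # 0.5
```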
Efficiency metrics evaluate the relationship between performance and resource utilization (the training protocol parameters and specifications are shown in Table 7):
$$E_{\text{param}} = \frac{\text{Acc}}{\text{Params}}, \qquad E_{\text{comp}} = \frac{\text{Acc}}{\text{FLOPs}} \tag{29}$$
Table 7.
Training protocol parameters and specifications.
| Parameter | Value | Description |
|---|---|---|
| Batch size | 32 | Samples per forward pass |
| Learning rate | 1e-3 | Initial learning rate |
| Weight decay | 0.01 | L2 regularization factor |
| Epochs | 100 | Maximum training epochs |
| Early stopping | 10 | Patience epochs |
| Label smoothing | 0.1 | Regularization factor |
| Gradient clip | 1.0 | Maximum gradient norm |
Implementation details
The framework is implemented using PyTorch 1.9.0 with CUDA 11.3 support, deployed on NVIDIA A100 (20GB) GPUs with 32 CPU cores and 32GB RAM. To ensure reproducibility, random seeds are explicitly set via torch.manual_seed(42) and torch.cuda.manual_seed_all(42).
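A runnable version of the seeding protocol: the torch calls mirror the paper's stated seeds, but are guarded here so the sketch also runs where PyTorch is absent:

```python
import random

def set_seeds(seed=42):
    """Fix RNG seeds for reproducibility. The paper sets torch.manual_seed(42)
    and torch.cuda.manual_seed_all(42); the guarded imports keep this sketch
    runnable without numpy/torch installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # numpy/torch not available in this environment

set_seeds(42)
first = random.random()
set_seeds(42)
assert random.random() == first   # identical draw after reseeding
```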
Experimental validation
Ablation study design
The experimental validation framework implements systematic ablation studies to quantify component contributions and architectural decisions. Performance impact is measured through controlled component removal:
$$\Delta P_c = P_{\text{full}} - P_{\text{full} \setminus c} \tag{30}$$
The ablation analysis examines three primary architectural aspects. First, operation type configurations are evaluated through isolated testing of simple operations, residual operations, and the proposed mixed operations approach. Second, the search strategy effectiveness is assessed by comparing random search baselines, basic genetic algorithms, and our proposed memory-aware genetic algorithm. Third, memory optimization techniques are evaluated through comparisons of standard implementation, gradient accumulation, and memory-aware fitness approaches. Recent studies have demonstrated the effectiveness of various optimization approaches in ASD detection, including chaotic optimization24, grid search optimization38, and genetic algorithms39.
Comparative analysis framework
The comparative analysis protocol establishes benchmarks against existing methodologies through a standardized evaluation framework. Baseline comparisons include traditional neural networks (Multi-layer Perceptron, Deep Neural Network) and state-of-the-art architectures (ResNet variants, DenseNet). Performance improvements are quantified through relative metrics:
$$RI = \frac{P_{\text{proposed}} - P_{\text{baseline}}}{P_{\text{baseline}}} \times 100\%, \qquad EG = \frac{R_{\text{baseline}}}{R_{\text{proposed}}} \tag{31}$$
where RI represents relative improvement and EG denotes efficiency gain.
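Applying Eq. (31) to the Table 11 values illustrates both quantities (helper names are ours):

```python
def relative_improvement(proposed, baseline):
    """RI: percentage gain of the proposed method over a baseline."""
    return (proposed - baseline) / baseline * 100

def efficiency_gain(baseline_cost, proposed_cost):
    """EG: ratio of baseline to proposed resource cost."""
    return baseline_cost / proposed_cost

# Table 11: proposed NAS (97.3%, 2.8M params) vs. ResNet-based (94.1%, 23.5M)
ri = relative_improvement(97.3, 94.1)   # accuracy gain, ~3.4%
eg = efficiency_gain(23.5, 2.8)         # parameter reduction, ~8.4x
```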
Statistical validation methods
Our statistical analysis framework employed rigorous validation methods with predefined thresholds for significance testing and effect size evaluation. As detailed in Table 8, we established strict criteria for statistical significance ($p < 0.01$) and effect size measurement (Cohen's $d > 0.8$). Cross-validation was performed using 5-fold splitting to ensure robust performance evaluation across different data subsets. Confidence intervals were calculated at the 95% level to provide reliable estimates of model performance variability.
Table 8.
Statistical analysis parameters and thresholds.
| Analysis Type | Metric | Threshold | Description |
|---|---|---|---|
| Significance | p value | 0.01 | Statistical significance level |
| Effect size | Cohen’s d | 0.8 | Large effect threshold |
| Confidence | CI | 95% | Confidence interval level |
| Cross-validation | k-fold | 5 | Number of validation folds |
Statistical significance assessment employs comprehensive hypothesis testing frameworks. The primary statistical measure utilizes paired t-tests:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}} \tag{32}$$
where $\bar{d}$ and $s_d$ denote the mean and standard deviation of the paired differences over $n$ samples.
Effect size quantification implements Cohen’s d metric:
$$d = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{(s_1^2 + s_2^2)/2}} \tag{33}$$
Confidence intervals are computed as:
$$CI = \bar{x} \pm t_{\alpha/2} \cdot SE \tag{34}$$
where SE represents the standard error of the mean.
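Equations (32)-(34) implemented with the standard library (the critical value 1.96 approximates the 95% level for large n):

```python
import math
from statistics import mean, stdev

def paired_t(diffs):
    """Paired t-statistic on per-sample differences (Eq. 32)."""
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

def cohens_d(x, y):
    """Cohen's d with pooled standard deviation (Eq. 33)."""
    pooled = math.sqrt((stdev(x) ** 2 + stdev(y) ** 2) / 2)
    return (mean(x) - mean(y)) / pooled

def confidence_interval(x, t_crit=1.96):
    """Mean +/- t * SE; t_crit = 1.96 approximates the 95% level (Eq. 34)."""
    se = stdev(x) / math.sqrt(len(x))
    return mean(x) - t_crit * se, mean(x) + t_crit * se

d = cohens_d([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])   # 4.0: a large effect
```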
Reproducibility and Statistical Testing. The framework implementation sets explicit random seeds (torch.manual_seed(42), torch.cuda.manual_seed_all(42)) to ensure reproducibility. For comparative evaluation against baselines, statistical significance is assessed through paired t-tests on per-sample prediction accuracies ($p < 0.01$ threshold, Table 8). While the current evaluation presents results from single training runs per configuration due to computational constraints (200 GPU-hours total), the deterministic training protocol enables exact reproduction of reported results.
Error analysis and robustness
The error analysis protocol implements comprehensive evaluation of classification errors and model robustness. Error metrics are computed as:
$$\text{Err} = \frac{FP + FN}{N}, \qquad FPR = \frac{FP}{FP + TN}, \qquad FNR = \frac{FN}{FN + TP} \tag{35}$$
Robustness analysis examines model behavior under various perturbations:
$$R_\delta = \frac{\text{Acc}(x + \delta)}{\text{Acc}(x)}, \qquad S_\theta = \left\lVert \frac{\partial \mathcal{L}}{\partial \theta} \right\rVert \tag{36}$$
where $R_\delta$ quantifies noise resilience under input perturbation $\delta$ and $S_\theta$ measures parameter sensitivity.
Reproducibility framework
To ensure experimental reproducibility, the framework implements comprehensive documentation protocols and standardized evaluation procedures. Hardware configurations, software dependencies, and random seeds are explicitly specified. Implementation code, including neural architectures and preprocessing pipelines, is made available through public repositories. The evaluation framework maintains consistent environmental conditions across all comparative analyses, with standardized data preprocessing and identical hardware configurations for all experiments.
Results
Dataset characteristics and feature analysis
Analysis of the 2,048-dimensional feature space extracted from pose estimation data revealed distinct patterns distinguishing between ASD and TD populations. Dimensionality reduction through Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE), as shown in Fig. 6, demonstrated clear separation between diagnostic groups. The PCA projection revealed partially overlapping but distinguishable clusters, while t-SNE visualization, leveraging non-linear dimensionality reduction, exhibited more pronounced clustering patterns, highlighting the complex non-linear relationships inherent in the behavioral features.
Fig. 6.
Dimensionality reduction analysis of the behavioral feature space. (a) PCA projection demonstrating partial separation between ASD and TD samples, with preserved global structure. (b) t-SNE visualization revealing distinct local clustering patterns, emphasizing the non-linear relationships in behavioral features.
The dataset demonstrated exceptional balance across both training and validation sets. The training set comprised 627,672 ASD samples (49.7%) and 635,184 TD samples (50.3%), while the validation set maintained similar proportions with 674,501 ASD samples (50.9%) and 650,347 TD samples (49.1%). This natural balance, achieved through our preprocessing pipeline, ensures unbiased model training and reliable performance evaluation.
Feature extraction and normalization processes, illustrated in Fig. 7, demonstrated effective standardization while preserving discriminative behavioral patterns. The normalized features exhibited Gaussian-like distributions centered at zero with unit variance, confirming successful standardization without loss of distinctive behavioral markers. Temporal patterns in the raw features were preserved through the normalization process, maintaining the integrity of movement-related information critical for ASD classification.
Fig. 7.
Feature extraction and normalization pipeline demonstrating the progression from raw pose estimation data to normalized features. From left: original pose estimation frames with skeletal keypoints, extracted feature patterns, normalized feature distributions, and comparative feature value distributions pre- and post-normalization.
Discriminative feature analysis identified key behavioral markers distinguishing between ASD and TD groups. The top 20 most discriminative features, ranked by effect size, demonstrated Cohen’s d values ranging from 1.5 to 3.5, indicating strong discriminative power. Effect sizes were calculated according to:
$$d = \frac{\bar{x}_{\text{ASD}} - \bar{x}_{\text{TD}}}{\sqrt{(s_{\text{ASD}}^2 + s_{\text{TD}}^2)/2}} \tag{37}$$
These features predominantly corresponded to specific motion patterns and temporal characteristics observed during ADOS examinations, providing quantitative validation of clinically relevant behavioral markers.
Statistical distribution analysis revealed consistent patterns across both training and validation sets. Feature means concentrated primarily in the [0,1] range, with standard deviations exhibiting higher variance in specific feature subsets associated with dynamic movement patterns. The consistency of these distributions between training and validation sets confirms the robustness of our feature extraction pipeline and validates the representativeness of our sampling approach.
Analysis of behavioral markers revealed distinct patterns that effectively distinguish between ASD and TD groups. Feature importance analysis identified key discriminative characteristics in the behavioral data. As shown in Fig. 8, the top 20 features demonstrated strong discriminative power, with effect sizes ranging from 1.5 to 3.5. These features primarily corresponded to specific motion patterns observed during ADOS examinations, including joint attention behaviors, social engagement indicators, and repetitive movement patterns.
Fig. 8.
Analysis of the top 20 discriminative features ranked by effect size. The plot demonstrates the relative importance of specific behavioral markers in distinguishing between ASD and TD groups, with effect sizes indicating strong discriminative power.
The comprehensive feature analysis validates both the quality and reliability of our pose estimation-based feature extraction pipeline, while demonstrating the rich behavioral information captured in the processed data. This robust feature representation, coupled with the balanced dataset structure, provides a solid foundation for subsequent neural architecture search and classification tasks. Our feature extraction methodology builds upon recent advances in behavioral pattern recognition15,16. While previous studies have focused on specific behavioral markers such as stereotyped movements20 or emotional responses23, our approach integrates multiple behavioral features to create a comprehensive diagnostic profile. This integration is supported by recent findings in selective feature analysis21 and ensemble modeling22.
Model training and convergence
The neural architecture search process converged to an optimal configuration exhibiting robust training characteristics and efficient computational performance. The training progression, illustrated in Fig. 9, demonstrated rapid initial learning followed by stable convergence. Training accuracy increased from 89.5% to 98% within the first 10 epochs, while validation accuracy tracked closely, reaching 98.5% with a consistent gap of approximately 0.5% relative to training accuracy. The training set classification results are detailed in the confusion matrix presented in Fig. 10, confirming balanced performance across diagnostic groups. This small training-validation gap indicates effective generalization without significant overfitting.
Fig. 9.
Training dynamics over 40 epochs showing (a) rapid accuracy improvement and stable convergence, and (b) smooth loss reduction with consistent training-validation alignment.
Fig. 10.
Confusion matrix for training set performance demonstrating balanced classification accuracy across diagnostic groups. Numbers indicate sample counts for each prediction category, with diagonal elements representing correct classifications.
Loss convergence patterns, shown in Fig. 9b, exhibited smooth and stable progression throughout the training process. The training loss decreased monotonically from an initial value of 0.23 to 0.008, while validation loss stabilized at 0.042. This convergence pattern was supported by the discovered architectural configuration, which implemented strategic dropout rates [0.1, 0.4, 0.2, 0.4] across successive layers, effectively preventing overfitting while maintaining model capacity.
The model’s discrimination capability, evaluated through ROC analysis (Fig. 11a), achieved exceptional performance with AUC values of 0.986 for both ASD and TD classifications. The prediction probability distribution, visualized in Fig. 11b, revealed clear separation between diagnostic classes, with predictions strongly clustered at the extremes of the probability range (0 and 1), indicating high-confidence classifications.
Fig. 11.
Classification performance analysis showing (a) ROC curves demonstrating strong discriminative ability for both diagnostic classes, and (b) prediction probability distributions indicating clear separation between ASD and TD cases.
The optimal architecture identified through the search process incorporated a systematic progression of layer sizes [512, 256, 128, 64] with a strategic operation sequence [0, 1, 1, 1], transitioning from simple operations in early layers to residual operations in deeper layers. The comprehensive training performance metrics and final architectural configuration are summarized in Table 9. This configuration, combined with the activation pattern [2, 3, 3, 0], demonstrated robust convergence under the selected hyperparameters (learning rate: 1e-3, weight decay: 1e-5, batch size: 32).
Final performance metrics achieved on the training set demonstrated consistent excellence across multiple evaluation criteria, with training accuracy reaching 97.8% and validation accuracy maintaining 97.3%. The small difference between these metrics, coupled with the final training loss of 0.008 and validation loss of 0.042, confirms the model’s ability to learn generalizable patterns from the behavioral data without overfitting.
The convergence characteristics and final performance metrics validate the effectiveness of our neural architecture search strategy, demonstrating its ability to discover architectures that balance computational efficiency with classification performance. The stability of the training process and the consistency between training and validation metrics indicate robust learning of behaviorally relevant features for ASD classification.
Performance evaluation
Comprehensive evaluation of the optimized architecture on the independent test dataset demonstrated robust generalization capabilities and reliable classification performance. The confusion matrix analysis, presented in Fig. 12, revealed strong classification accuracy across both diagnostic groups, with 597,144 correct ASD predictions and 601,668 correct TD predictions. Misclassification rates remained balanced between false positives (33,552 cases) and false negatives (30,504 cases), indicating unbiased classification behavior.
Fig. 12.
Confusion matrix for test set performance demonstrating balanced classification accuracy across diagnostic groups. Numbers indicate sample counts for each prediction category, with diagonal elements representing correct classifications.
Receiver Operating Characteristic (ROC) analysis confirmed the model’s exceptional discriminative ability, achieving an Area Under the Curve (AUC) of 0.986 for both ASD and TD classifications (Fig. 13). This performance significantly exceeded the random classifier baseline and demonstrated consistent behavior across diagnostic categories. The symmetry in AUC values between ASD and TD classifications further validates the model’s balanced learning of diagnostic features.
Fig. 13.
ROC curves for test set evaluation showing exceptional discriminative performance. AUC values of 0.986 for both diagnostic categories indicate robust and balanced classification capabilities.
Precision-Recall analysis revealed outstanding performance stability across different classification thresholds, as shown in Fig. 14. The model achieved Average Precision scores of 0.983 and 0.987 for ASD and TD classifications respectively, maintaining high precision even at increased recall levels. This performance characteristic is particularly valuable for clinical applications, where false positive minimization is crucial.
Fig. 14.
Precision-Recall curves demonstrating sustained classification performance across operating points. High Average Precision scores indicate reliable classification behavior at varying sensitivity thresholds.
Detailed performance metrics across diagnostic categories are presented in Table 10, demonstrating consistent excellence across multiple evaluation criteria. The model achieved comparable precision and recall values for both ASD (0.947, 0.951) and TD (0.952, 0.947) classifications, resulting in balanced F1-scores of 0.949 for both categories. This symmetry in performance metrics indicates successful mitigation of potential diagnostic bias.
Table 10.
Comprehensive performance metrics by diagnostic category (test set, N=1,324,848 samples).
| Category | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|
| ASD | 0.947 | 0.951 | 0.949 | 0.986 |
| TD | 0.952 | 0.947 | 0.949 | 0.986 |
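As a consistency check, the Table 10 per-class metrics follow directly from the Fig. 12 confusion-matrix counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Fig. 12 counts: TD cases predicted as ASD are the ASD class's false
# positives; ASD cases predicted as TD are its false negatives.
asd = precision_recall_f1(tp=597_144, fp=33_552, fn=30_504)
td = precision_recall_f1(tp=601_668, fp=30_504, fn=33_552)
print([round(m, 3) for m in asd])  # [0.947, 0.951, 0.949]
print([round(m, 3) for m in td])   # [0.952, 0.947, 0.949]
```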
Error analysis revealed an overall misclassification rate of 5.1%, with nearly identical error distributions between ASD (4.9%) and TD (5.3%) cases. This balanced error profile suggests that the model’s performance is not biased toward either diagnostic category, a critical characteristic for clinical applications. Further examination of misclassified cases showed no systematic patterns in error distribution, indicating robust generalization across the behavioral feature space.
The model’s performance stability was further validated through precision-recall-F1 analysis across different classification thresholds, as illustrated in Fig. 15. The minimal variance observed between precision and recall metrics, coupled with stable F1-scores, confirms the model’s reliable operation across different sensitivity requirements. This stability is particularly important for clinical deployment, where different precision-recall trade-offs may be required based on specific screening contexts.
Fig. 15.
Performance stability analysis showing consistent precision, recall, and F1-scores across classification thresholds. The balanced metric profiles indicate robust classification behavior under varying operating conditions.
Statistical Reliability and Reproducibility. The reported performance metrics represent results from the discovered architecture under deterministic training conditions (fixed random seeds as specified in Section 3.6.6). To assess model stability, we evaluated performance variance across different data splits: the training/validation split (Section 4.2) shows consistent accuracy (training: 97.8%, validation: 97.3%), while independent test set evaluation (Section 4.3) achieves 95.23% accuracy. The performance degradation from validation to test sets (2.07 percentage points) indicates robust generalization without overfitting. Class-wise performance consistency (ASD precision: 0.947, TD precision: 0.952; Table 10) further validates balanced learning across diagnostic categories. While multiple independent training runs would strengthen statistical conclusions, the deterministic protocol enables exact reproduction of results, and performance consistency across data splits provides evidence of model stability.
These comprehensive evaluation results validate the effectiveness of our neural architecture search approach in discovering robust, clinically relevant classifiers. The consistent performance across multiple evaluation metrics, coupled with balanced error distributions, suggests strong potential for reliable ASD screening applications.
Comparative analysis and ablation studies
To validate the effectiveness of our proposed architecture, we conducted extensive comparative analyses against existing methodologies and performed comprehensive ablation studies. All experiments were conducted on NVIDIA A100 (20GB) GPUs with identical hardware configurations to ensure fair comparison. The architecture search phase explored 200 candidate architectures (20 per generation × 10 generations) before convergence, substantially fewer than exhaustive search approaches requiring thousands of evaluations12. Our memory-efficient genetic algorithm enables practical NAS deployment by reducing GPU memory consumption by 76% (Section 4.2) while maintaining search effectiveness.
Comparison with existing methods
Baseline Selection Criteria. Comparative evaluation employed four baseline categories representing distinct architectural paradigms in ASD classification: (1) Traditional CNN establishes standard supervised learning performance without architectural specialization; (2) ResNet-based architectures represent current state-of-the-art in medical image classification through residual learning8; (3) LSTM approaches capture temporal dependencies in behavioral sequences32; (4) Our proposed NAS framework. This selection enables systematic evaluation across computational paradigms (feedforward, residual, recurrent, architecture-optimized) while maintaining consistent experimental conditions (identical dataset, preprocessing, evaluation metrics).
Our neural architecture search approach demonstrated significant improvements over existing methodologies across multiple performance metrics, as detailed in Table 11. The proposed model achieved superior classification accuracy (97.3%) compared to traditional deep neural networks (92.3%) and conventional ResNet implementations (94.1%), while maintaining substantially lower computational requirements.
Table 11.
Performance comparison with existing methods.
| Method | Accuracy (%) | Parameters (M) | FLOPs (G) | Processing time (ms) | Memory (GB) |
|---|---|---|---|---|---|
| Traditional CNN | 92.3 | 5.2 | 6.8 | 45 | 3.8 |
| ResNet-based | 94.1 | 23.5 | 11.2 | 78 | 5.2 |
| LSTM approach | 93.7 | 3.1 | 5.4 | 62 | 2.9 |
| Proposed NAS | 97.3 | 2.8 | 4.2 | 15 | 2.1 |
Our approach achieved a 76% reduction in GPU memory consumption compared to conventional architectures while maintaining superior classification performance. The processing time per sample (15ms) represents a significant improvement over existing methods, enabling real-time application potential in clinical settings.
Ablation studies
Systematic ablation studies revealed the critical contributions of individual architectural components to overall performance. Table 12 presents the impact of removing key architectural elements.
Table 12.
Impact of architectural components.
| Configuration | Accuracy (%) | Memory (GB) | FLOPs (G) |
|---|---|---|---|
| Full architecture | 97.3 | 2.1 | 4.2 |
| Simple operations only | 93.1 | 1.8 | 3.8 |
| No memory optimization | 97.1 | 3.4 | 4.2 |
| Random search | 94.5 | 2.2 | 4.3 |
The removal of mixed operations in favor of simple operations resulted in a 4.2 percentage point decrease in accuracy, while maintaining similar computational efficiency. This finding validates the effectiveness of our mixed operation strategy in capturing complex behavioral patterns. The memory optimization component contributed to a 38% reduction in memory usage with negligible impact on accuracy (0.2 percentage points), demonstrating the efficiency of our resource management approach.
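The three operation families contrasted in the ablation can be illustrated with a toy numpy sketch. All dimensions and weights below are hypothetical stand-ins; the actual layers operate on pose-feature tensors with the paper's 512→256→128→64 schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_op(x, w):
    """Plain transform: linear projection followed by ReLU."""
    return np.maximum(w @ x, 0.0)

def residual_op(x, w):
    """Residual transform: identity shortcut added to the simple path (square w)."""
    return x + np.maximum(w @ x, 0.0)

def bottleneck_op(x, w_down, w_up):
    """Bottleneck: compress to a narrow width, apply nonlinearity, expand back."""
    return w_up @ np.maximum(w_down @ x, 0.0)

d, d_mid = 8, 2                                 # toy widths (hypothetical)
x = rng.standard_normal(d)
w = rng.standard_normal((d, d)) * 0.1
w_down = rng.standard_normal((d_mid, d)) * 0.1
w_up = rng.standard_normal((d, d_mid)) * 0.1

for name, y in [("simple", simple_op(x, w)),
                ("residual", residual_op(x, w)),
                ("bottleneck", bottleneck_op(x, w_down, w_up))]:
    print(name, y.shape)
```

A "mixed" architecture lets the search assign any of these three transforms per layer, which is what the simple-operations-only ablation removes.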
Architectural component analysis
Detailed analysis of architectural components revealed specific contributions to model performance, as illustrated in Fig. 16. The residual operations demonstrated particular importance in early layers, with an odds ratio of 2.7 (p < 0.001) for successful feature extraction compared to simple operations.
Fig. 16.

Analysis of architectural component contributions showing relative performance impact of different operations and configurations. Error bars indicate 95% confidence intervals.
The evolutionary search strategy demonstrated superior architecture discovery compared to random search, achieving a 2.8 percentage point improvement in accuracy with comparable computational costs. This improvement validates the effectiveness of our genetic algorithm implementation in navigating the architectural search space.
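The evolutionary loop being compared against random search can be sketched as follows. This is an illustrative reimplementation using the settings reported in the paper (population 20, 10 generations, three operation types over four layers); the fitness surrogate, penalty weight, and genetic operators are simplified stand-ins, not the authors' exact implementation:

```python
import random

random.seed(42)

OPS = ["simple", "residual", "bottleneck"]   # per-layer search space
N_LAYERS, POP_SIZE, GENERATIONS = 4, 20, 10  # settings reported in the paper
COST = {"simple": 1.0, "residual": 1.5, "bottleneck": 1.2}  # hypothetical FLOP weights

def fitness(arch):
    """Stand-in fitness balancing performance against complexity.
    A real run trains and validates each candidate instead of this toy proxy."""
    acc_proxy = sum(op != "simple" for op in arch) / len(arch)
    complexity = sum(COST[op] for op in arch) / len(arch)
    return acc_proxy - 0.1 * complexity      # trade-off weight 0.1 is assumed

def mutate(arch, rate=0.25):
    return [random.choice(OPS) if random.random() < rate else op for op in arch]

def crossover(a, b):
    cut = random.randrange(1, N_LAYERS)      # single-point crossover
    return a[:cut] + b[cut:]

pop = [[random.choice(OPS) for _ in range(N_LAYERS)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    elite = pop[: POP_SIZE // 4]             # elitism: keep top quartile
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(POP_SIZE - len(elite))]
    pop = elite + children

best = max(pop, key=fitness)
print("best architecture:", best)
```

Random search corresponds to sampling candidates uniformly without the selection, crossover, and mutation steps, which is the comparison that yields the 2.8 percentage point gap reported above.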
Computational efficiency analysis
Resource utilization analysis demonstrated significant efficiency improvements over existing approaches. Relative to the best-performing baseline methods, our model achieved a 32% improvement in parameter utilization and a 45% improvement in computational efficiency. The achieved balance between accuracy and resource utilization validates our approach’s effectiveness in discovering efficient architectures for ASD classification.
Discussion
Our Neural Architecture Search framework demonstrates significant advancement in automated ASD diagnosis through three key contributions. First, the framework achieves exceptional discrimination capability (AUC = 0.986) while requiring only 2.8M parameters, representing a 76% reduction in computational requirements compared to existing methods6,7. This efficiency is achieved through strategic implementation of progressive dimensionality reduction and adaptive operation selection, enabling real-time processing (15ms per sample) crucial for clinical applications.
Second, the framework’s balanced performance across diagnostic categories (precision: ASD = 0.947, TD = 0.952) addresses a critical challenge in automated diagnosis. Comparative analysis reveals superior classification accuracy (95.2%) compared to traditional CNN (92.3%)33, ResNet-based (94.1%)8, and LSTM approaches (93.7%)32, while maintaining significantly reduced processing time. The balanced error distribution between ASD (4.9%) and TD (5.3%) classifications suggests robust diagnostic reliability across patient populations.
Third, our memory-efficient genetic algorithm implementation enables practical deployment in resource-constrained environments while maintaining high diagnostic accuracy. Ablation studies demonstrate that mixed operations significantly outperform simple operations (4.2 percentage point improvement) without compromising computational efficiency. This balance between performance and resource utilization positions the framework as a practical tool for clinical implementation.
Despite these advances, current validation is limited to controlled ADOS examination settings. Future research should address framework adaptation to naturalistic environments, integration with multiple diagnostic modalities, and cross-cultural validation. Successful clinical translation will require careful consideration of implementation challenges, including infrastructure requirements, healthcare provider training, and data privacy protocols.
The framework’s reduced computational requirements and robust diagnostic performance suggest significant potential for improving ASD screening accessibility. However, realizing this potential will require continued collaboration between technical experts and healthcare providers to ensure optimal integration into clinical practice.
Limitations and constraints
Several limitations constrain the current study’s scope and generalizability. First, the evaluation framework operates exclusively on pose estimation features extracted from controlled ADOS examination settings. The model’s performance with naturalistic home video data or unstructured behavioral observations remains unvalidated, potentially limiting ecological validity in real-world screening contexts. Pose estimation quality directly impacts feature reliability; our framework assumes high-quality pose detection, but performance degradation under suboptimal video conditions (poor lighting, occlusion, camera angles) requires systematic investigation.
Second, the architectural search space, while computationally efficient, explores a constrained subset of possible neural operations. The framework evaluates three operation types (simple, residual, bottleneck) across four layers with a predetermined dimensionality reduction (512 → 256 → 128 → 64). Alternative architectural paradigms (attention mechanisms, graph neural networks, transformer architectures) remain unexplored. The genetic algorithm converges within 10 generations using a population size of 20, but this configuration may insufficiently explore the architectural search space for discovering globally optimal solutions.
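The size of the constrained space described here is easy to enumerate: with one of three operation types per layer, L layers yield 3^L assignments. A quick check for the four-layer configuration, ignoring any per-layer hyperparameters the actual framework may also search:

```python
from itertools import product

OPS = ("simple", "residual", "bottleneck")
N_LAYERS = 4
DIMS = (512, 256, 128, 64)   # fixed reduction schedule stated in the paper

# Every assignment of one operation type to each of the four layers.
configs = list(product(OPS, repeat=N_LAYERS))
print(f"{len(OPS)}^{N_LAYERS} = {len(configs)} base configurations")
```

A 20-individual, 10-generation run performs at most 200 candidate evaluations, so the 81-point core space could in principle be covered; the under-exploration concern therefore applies mainly to richer spaces that include per-layer hyperparameters or additional operation families.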
Third, computational requirements, though reduced 76% relative to conventional NAS, still necessitate substantial resources (NVIDIA A100 GPUs with 20GB memory). This hardware dependency limits accessibility for researchers in resource-constrained settings. The framework’s scalability to larger datasets or higher-dimensional feature spaces requires validation, as memory optimization strategies may exhibit diminishing returns beyond current dataset scale.
Fourth, the dataset comprises participants from a single geographic region (Switzerland) undergoing standardized clinical assessments, raising questions about cross-cultural generalizability and applicability to diverse populations. Cultural variations in social behavior, eye contact norms, and movement patterns may influence model performance when deployed in different demographic contexts. The age range (3-11 years) excludes adolescents and adults, limiting applicability across the developmental spectrum.
Finally, while the framework achieves high classification accuracy (95.23%), clinical deployment requires consideration of practical implementation challenges including integration with existing clinical workflows, healthcare provider training requirements, regulatory compliance for medical devices, and patient privacy protections. The framework provides diagnostic predictions but lacks interpretability mechanisms to explain individual classifications to clinicians, potentially limiting clinical acceptance.
Conclusion
This research addressed the fundamental challenge of developing computationally efficient and accurate automated screening tools for Autism Spectrum Disorder. Our investigation centered on the hypothesis that Neural Architecture Search could discover optimal network configurations for processing behavioral markers from pose estimation data, while maintaining clinical reliability and computational efficiency.
The study yielded three major findings. First, our framework achieved exceptional discrimination capability (AUC = 0.986) while reducing computational requirements by 76% compared to existing methods, demonstrating that architectural optimization can simultaneously improve both accuracy and efficiency. Second, the discovered architecture maintained balanced performance across diagnostic categories (precision: ASD = 0.947, TD = 0.952), establishing the framework’s reliability for clinical screening applications. Third, the implementation of mixed operations and memory-efficient genetic algorithms enabled real-time processing (15ms per sample) while maintaining high diagnostic accuracy, addressing a critical barrier to clinical deployment.
The relevance of this work extends beyond technical innovation, offering practical solutions for improving ASD screening accessibility. Our framework’s reduced computational requirements (2.8M parameters) and robust diagnostic performance provide a foundation for developing more widely accessible screening tools. The balanced error distribution between ASD (4.9%) and TD (5.3%) classifications particularly enhances its potential value in clinical settings. Several limitations warrant consideration, as discussed comprehensively in Section 5.1, including validation scope restricted to controlled ADOS settings, constrained architectural search space, computational requirements, and dataset demographic homogeneity.
Future research should address these limitations through: (1) adaptation to naturalistic environments with domain transfer techniques, (2) multi-modal integration incorporating facial expressions and vocal prosody, (3) search space expansion to include attention mechanisms and transformers, (4) cross-cultural validation across diverse populations and age ranges, and (5) interpretability enhancement for clinical acceptance. Systematic implementation studies in clinical settings will be crucial for translating these technical advances into practice.
Author contributions
Dr. Abdullah R. Alzahrani conceptualized the study, developed the GeneticNAS framework, and provided overall supervision. Dr. Dabiah Alboaneen implemented the experimental pipeline, conducted data preprocessing, and prepared the figures. Dr. Ibrahim R. Alzahrani performed data analysis, evaluated model performance, and contributed to manuscript writing. All authors discussed the results, reviewed the manuscript, and approved the final version.
Funding
The authors extend their appreciation to the King Salman Center for Disability Research for funding this work through Research Group no. KSRG-2024-470.
Data availability
The datasets generated during the current study are not publicly available due to participant privacy protection and ethical considerations regarding clinical data from children with autism spectrum disorder, but are available from the corresponding author on reasonable request and with appropriate ethical approval. De-identified pose estimation features are available through Zenodo (DOI: 10.5281/zenodo.12652659) and implementation code is available at https://mega.nz/file/CRUD0YiS#862EyWul5TlfB9dj8a7jBL7ON12Qji0eizpjOwdWGI to facilitate reproducibility while maintaining strict privacy standards for clinical data.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Dabiah Alboaneen and Ibrahim R. Alzahrani contributed equally to this work.
References
- 1. Kanchana, A. & Khilar, R. Autism spectrum disorder: current diagnostic challenges and clinical implications. J. Autism Develop. Dis. 54(1), 23–35 (2024).
- 2. Uddin, M. J. et al. An integrated statistical and clinically applicable machine learning framework for the detection of autism spectrum disorder. Computers 12(5), 92 (2023).
- 3. Singh, D. S., Phadtare, S. V. & Aslaan, M. A review of autism spectrum disorder detection using machine learning. IOSR J. Comput. Eng. 26(5), 37–44 (2024).
- 4. Shahamiri, S. R., Thabtah, F. & Abdelhamid, N. A new classification system for autism based on machine learning of artificial intelligence. Technology and Health Care, 1–18 (2021).
- 5. Lu, A. & Perkowski, M. Deep learning approach for screening autism spectrum disorder in children with facial images and analysis of ethnoracial factors in model development and application. Brain Sci. 11(11), 1446 (2021).
- 6. Narayanan, N., Remya, K. & Varghese, B. M. Early detection of autism spectrum disorder via deep-learning application of fMRI and machine learning for ASD children identifications. Int. J. Hybrid Intell. Sys., 1–15 (2024).
- 7. Alkahtani, H., Aldhyani, T. H. & Alzahrani, M. Deep learning algorithms to identify autism spectrum disorder in children-based facial landmarks. Appl. Sci. 13(8), 4855 (2023).
- 8. Feng, M. & Xu, J. Detection of ASD children through deep-learning application of fMRI. Children 10(10), 1654 (2023).
- 9. Natraj, S., Kojovic, N., Maillart, T. & Schaer, M. Video-audio neural network ensemble for comprehensive screening of autism spectrum disorder in young children. PLOS ONE 19(10), e0308388 (2024).
- 10. Chern, L. H. & Al-Hababi, A. Y. S. A novel hybrid Unet-RBF and CNN-RBF algorithm for autism spectrum disorder classification. J. Cognit. Sci. Human Develop. 6(1), 78–89 (2024).
- 11. Park, K., Shin, D. & Chi, S.-D. Modified neural architecture search (NAS) using the chromosome non-disjunction. Appl. Sci. 11(18), 8628 (2021).
- 12. Klos, A., Rosenbaum, M. & Schiffmann, W. Neural architecture search based on genetic algorithm and deployed in a bare-metal Kubernetes cluster. Int. J. Netw. Comput. 12(1), 164–187 (2022).
- 13. Baldeon Calisto, M. G. & Lai-Yuen, S. K. Neural architecture search with an efficient multi-objective evolutionary framework. IEEE Trans. Med. Imag. 39(11), 3410–3421 (2020).
- 14. Ghosh, T. et al. Artificial intelligence and internet of things in screening and management of autism spectrum disorder. Sustain. Cities Soc. 74, 103189 (2021).
- 15. Prakash, V. G. et al. Computer vision-based assessment of autistic children: analyzing interactions, emotions, human pose, and life skills. IEEE Access 11, 47907–47929 (2023).
- 16. Wei, P., Ahmedt-Aristizabal, D., Gammulle, H., Denman, S. & Armin, M. A. Vision-based activity recognition in children with autism-related behaviors. Heliyon 9(6) (2023).
- 17. Luo, Y. et al. Aided diagnosis of autism spectrum disorder based on a mixed neural network model. In Intelligent Computing 150–161 (Springer, 2023).
- 18. Yousef, M., Al Shehab, L., Abdel Ghani, D., Alazzam, H. & Ghatasheh, M. Enhancing autism disease classification using a hybrid GA-KNN approach for feature selection. IEEE Access 12, 15243–15256 (2024).
- 19. Kojovic, N., Natraj, S., Mohanty, S. P., Maillart, T. & Schaer, M. Using 2D video-based pose estimation for automated prediction of autism spectrum disorders in young children. Scientific Rep. 11(1), 15069 (2021).
- 20. Zhang, B. et al. Enhancing recognition of stereotyped movements in ASD children through action pattern mining and multi-channel fusion. IEEE J. Biomed. Health Inform. (2024).
- 21. Aldrees, A. et al. Data-centric automated approach to predict autism spectrum disorder based on selective features and explainable artificial intelligence. Frontiers in Computational Neuroscience 18, 1489463 (2024).
- 22. Ullah, M. Z. & Yu, D. Grid-tuned ensemble models for 2D spectrogram-based autism classification. Biomed. Signal Process. Contr. 93, 106151 (2024).
- 23. Poornima, S. & Kousalya, G. Analysis of emotion in autism spectrum disorder children using Manta-ray foraging optimization. Biomed. Signal Process. Contr. 92, 105962 (2024).
- 24. Loganathan, S., Geetha, C., Nazaren, A. R. & Fernandez, M. H. F. Autism spectrum disorder detection and classification using chaotic optimization based Bi-GRU network: a weighted average ensemble model. Expert Sys. Appl. 230, 120613 (2023).
- 25. Khan, K. & Katarya, R. MCBERT: a multi-modal framework for the diagnosis of autism spectrum disorder. Biol. Psychol. 194, 108976 (2025).
- 26. Khan, K. & Katarya, R. WS-BiTM: integrating white shark optimization with Bi-LSTM for enhanced autism spectrum disorder diagnosis. J. Neurosci. Method 413, 110319 (2025).
- 27. Khan, K. & Katarya, R. AFF-BPL: an adaptive feature fusion technique for the diagnosis of autism spectrum disorder using Bat-PSO-LSTM based framework. J. Comput. Sci. 83, 102447 (2024).
- 28. Khan, K. & Katarya, R. Machine learning techniques for autism spectrum disorder: current trends and future directions. In Proceedings of the 2023 4th International Conference on Innovative Trends in Information Technology (ICITIIT) 1–7 (IEEE, 2023).
- 29. Jha, A., Khan, K. & Katarya, R. Diagnosis support model for autism spectrum disorder using neuroimaging data and Xception. In 2023 International Conference on Electrical, Electronics, Communication and Computers (ELEXCOM) 1–6 (IEEE, 2023).
- 30. Sethi, P. & Khan, K. Empirical evaluation of machine learning techniques for autism spectrum disorder. In Proceedings of the IEEE International Emerging Electronics Conference (iEECON) (IEEE, 2024).
- 31. Khan, K. & Katarya, R. S/SD-ASD: self-supervised and self-distillation learning approach for classifying autism spectrum disorder in children using facial images. Eng. Anal. Boundary Elements 179, 106382 (2025).
- 32. Guruvammal, S. Autism detection in young children using optimized long short-term memory. In Advanced Computing and Intelligent Engineering 677–697 (Springer, 2022).
- 33. He, S. & Liu, R. Developing a new autism diagnosis process based on a hybrid deep learning architecture through analyzing home videos. arXiv preprint arXiv:2104.01137 (2021).
- 34. Gautam, S., Sharma, P., Upadhaya, M. D., Thapa, D. & Khanal, S. R. Screening autism spectrum disorder in children using deep learning approach: evaluating the classification model of YOLOv8. IEEE Trans. Neural Netw. Learn. Sys. 34(6), 3819–3831 (2023).
- 35. Tan, S. A DNN-based diagnosis on autism spectrum disorder in children. Appl. Comput. Eng. 67(1), 13–20 (2024).
- 36. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders (DSM-5®) (American Psychiatric Publishing, 2013).
- 37. Rahman, M. M., Muniyandi, R. C., Sahran, S., Usman, O. L. & Moniruzzaman, M. Restoring private autism dataset from sanitized database using an optimized key produced from enhanced combined PSO-GWO framework. Scientif. Rep. 14(1), 15763 (2024).
- 38. Nogay, H. S. & Adeli, H. Diagnostic of autism spectrum disorder based on structural brain MRI images using grid search optimization and convolutional neural networks. Biomed. Signal Process. Contr. 79, 104234 (2023).
- 39. Abhang, L. B. et al. Implementing genetic algorithms for optimization in neuro-cognitive rehabilitation robotics. In 2024 International Conference on Cognitive Robotics and Intelligent Systems (ICC-ROBINS) 730–737 (IEEE, 2024).
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
To facilitate reproducibility while maintaining participant privacy, we provide comprehensive access to de-identified pose estimation features through Zenodo (DOI: 10.5281/zenodo.12652659) and implementation code at https://mega.nz/file/CRUD0YiS#fl62EyWul5TlfB9dj8a7jBL7ON12Qji0eizpjOwdWGI. The final preprocessed dataset represents a significant advancement in ASD screening research, encompassing 2,587,704 total samples (1,262,856 training, 1,324,848 testing), approximately two orders of magnitude larger than existing ASD screening datasets.
The implementation code includes neural architecture specifications, preprocessing pipelines, and complete documentation of processing protocols and quality control metrics. This comprehensive data sharing approach ensures reproducibility while maintaining strict privacy standards for clinical data.