Abstract
This study addresses the efficiency and feature extraction constraints of high-performance Support Vector Machine (SVM) implementations, specifically ThunderSVM, in handling large-scale image datasets. Traditional ThunderSVM heavily relies on manually extracted features, limiting its ability to capture complex, nuanced features critical for robust image recognition. To overcome this, we propose an optimized, hybrid deep learning and machine learning model: ResNet18-ThunderSVM. This integrated architecture utilizes ResNet18 as an automated, powerful feature extractor to overcome the complexity of manual feature engineering, followed by ThunderSVM’s efficient, GPU-accelerated classification to ensure rapid training and inference. The novelty lies in demonstrating that this unique integration optimally balances the superior representational power of deep networks with the computational efficiency of ThunderSVM, a crucial trade-off often encountered in practical deployment. Experimental results on the MNIST dataset indicate that ResNet18-ThunderSVM excels in training efficiency, inference speed, parameter quantity, and performance metrics (precision, recall, F1 score, and accuracy). Compared to both traditional ThunderSVM (using manual features) and a standalone ResNet18 classifier, this hybrid approach not only accelerates training convergence but also significantly improves model generalization and stability, offering an efficient, robust, and resource-conscious solution for complex classification tasks like handwritten digit recognition.
Keywords: Handwriting numeral recognition, ResNet18, ThunderSVM, Efficiency limitation, Deep learning
Subject terms: Engineering, Mathematics and computing
Introduction
Handwritten digit recognition, as a classic issue in the fields of pattern recognition and computer vision, holds theoretical significance1,2. It serves as a typical task for studying core issues such as image classification, feature extraction, and pattern matching, providing a vital experimental platform for the design and optimization of machine learning algorithms3,4. Research on handwritten digit recognition allows for an in-depth exploration of the representational capacity of data features, the generalization performance of models, and the robustness of algorithms under different data distributions, thereby advancing the development of machine learning theory5,6. Additionally, handwritten digit recognition involves techniques such as image preprocessing, noise handling, and data augmentation, whose study is highly relevant to other image recognition tasks7,8. For instance, the successful application of convolutional neural networks (CNNs) in handwritten digit recognition lays the theoretical foundation and technical support for more complex image classification tasks, such as medical image analysis and object detection in autonomous driving9–11.
In practical applications, handwritten digit recognition technology demonstrates extensive value. As the demand for digitization and intelligence continues to grow, the accuracy and efficiency of handwritten digit recognition directly affect the automation level and user experience in related fields12,13. For example, in the financial sector, efficient handwritten digit recognition technology can significantly enhance the efficiency of check processing while reducing manual intervention. In the education field, automated grading systems can greatly alleviate teachers’ workloads and improve the objectivity and consistency of assessments14–16. Additionally, handwritten digit recognition is widely applied in scenarios such as handwritten postal code recognition in postal systems, handwritten input on smart devices, and document digitization processes17,18.
Traditional methods for handwritten digit recognition primarily rely on manual feature extraction and shallow machine learning models, such as support vector machines (SVM), k-nearest neighbors (KNN), and random forests19,20. While these methods perform well on small datasets, they often face challenges related to insufficient feature extraction capabilities and limited generalization performance when dealing with large-scale, high-dimensional data21–23. In recent years, the rapid development of deep learning technologies has provided new solutions for handwritten digit recognition. Deep learning models, such as CNN, can automatically extract multi-level features, significantly enhancing recognition accuracy and robustness24–26. However, deep learning models typically require substantial computational resources and training time, which somewhat limits their widespread adoption in practical applications27–29. To balance efficiency and performance, researchers have begun exploring hybrid approaches that combine deep learning with traditional machine learning methods. Among these, ThunderSVM is an optimized support vector machine (SVM) implementation designed to enhance SVM training and inference efficiency, garnering widespread attention due to its efficient computational performance and strong classification capabilities30,31. It supports various SVM variants, such as classification (C-SVC), regression (ε-SVR), and one-class SVM, and significantly boosts computational performance by leveraging graphics processing unit (GPU) acceleration (compatible with NVIDIA CUDA and OpenCL)30.
Despite the significant advantages of ThunderSVM in computational performance, its SVM-based nature necessitates substantial computation and optimization during the training process, resulting in longer training times30,32. This is particularly evident when handling large-scale datasets, where the training speed of ThunderSVM often fails to meet practical application demands33,34. Furthermore, as a traditional machine learning model, ThunderSVM lacks the end-to-end training capability of deep learning models, relying on manual feature extraction and preprocessing steps. This not only increases modeling complexity but also limits its adaptability to new tasks and datasets35. Chen et al. (2023)36 indicated that the performance of ThunderSVM was highly dependent on manually designed features, making it difficult to capture the complex and subtle characteristics present in image data. To overcome these limitations, this study proposes combining ThunderSVM with the deep learning model ResNet18. As an efficient convolutional neural network, ResNet18 can automatically extract multi-level features through its residual learning mechanism, significantly enhancing feature extraction capabilities37–39. By integrating the feature extraction ability of ResNet18 with the efficient classification capability of ThunderSVM, the ResNet18-ThunderSVM model not only improves the accuracy of handwritten digit recognition but also significantly enhances training and inference efficiency, thereby demonstrating greater practical value in real-time applications.
Handwritten character images are typically represented by high-dimensional data, which makes conventional ThunderSVM highly vulnerable to the “curse of dimensionality,” resulting in substantial computational costs and potential overfitting40. In contrast, ResNet18 effectively reduces input data dimensionality through its unique convolution and pooling operations41,42. For instance, ResNet18’s convolutional layers employ kernels of various sizes to scan images and extract local features, while pooling layers compress data, reducing data volume and retaining essential information43. This process enhances classification performance and alleviates ThunderSVM’s challenges with high-dimensional data. Handwritten characters exhibit significant variability, encompassing diverse fonts, handwriting styles, tilts, and distortions44. ResNet18’s multi-level feature learning mechanism can gradually capture and understand feature information across these variations, enhancing the model’s robustness against such changes45,46. This robustness improvement further boosts ThunderSVM’s classification performance, enabling the combined model to consistently deliver stable and accurate results across varied handwritten characters.
Aiming at the problem of cervical cell classification, Muksimova et al.47 improved the adaptability and robustness of the model through reinforcement learning. This study draws on this method, combined with the hybrid model of ResNet18 for feature extraction and ThunderSVM for efficient classification, which further improves the accuracy and practicability of handwritten digit recognition. The integration of ResNet18 and ThunderSVM is achieved as follows: first, ResNet18 extracts features from input handwritten character images, with its residual modules progressively obtaining rich and representative feature representations layer by layer48. These features are then passed to ThunderSVM for classification49. This integration greatly simplifies feature design and selection, as ResNet18 autonomously learns and isolates critical features, thus increasing the model’s overall training and inference efficiency. It avoids the complexity and unpredictability associated with manual feature extraction in traditional methods50. Conventional ThunderSVM tends to underperform with new samples, particularly when sample size is limited. By integrating ResNet18, ThunderSVM significantly improves generalization through its regularization methods and ResNet18’s deep learning strengths. This combination enhances the model’s adaptability to diverse types and scenarios of handwritten samples, enabling more accurate classification, even for unfamiliar handwritten characters.
The main innovations and contributions of this study are to demonstrate that the ResNet18-ThunderSVM combination provides a novel and effective solution, embodied in the following:
ResNet18's automatic feature extraction completely replaces the manual feature engineering of traditional methods, greatly enhancing the model's adaptability to, and processing capability for, high-dimensional image data.
The efficiency-performance trade-off is optimized: by combining the robust feature extraction of deep learning with ThunderSVM's fast classification, the hybrid converges faster than a standalone deep network while achieving higher accuracy than ThunderSVM with handcrafted features.
A resource-friendly alternative is provided: compared to training larger, deeper end-to-end CNNs, the hybrid structure is more efficient in training time and resource consumption, making it more feasible for real-time and edge applications.
The remainder of this paper is organized as follows. The second section reviews related work. The third section describes the architecture of the ResNet18-ThunderSVM model in detail and explains the selection of key parameters. The fourth section presents the experimental design and result analysis, focusing on training time efficiency, comparison of different kernel functions, performance comparison with the latest state-of-the-art (SOTA) models, and versatility testing on more challenging datasets; the experimental setup clearly lists the hardware resources used for training and testing to improve reproducibility. Finally, the fifth section gives the conclusion.
Related works
Handwritten digit recognition has long been regarded as a fundamental research topic in pattern recognition and computer vision, and extensive studies have been conducted over the past decades. Early research mainly focused on traditional machine learning methods, such as support vector machines (SVM), k-nearest neighbors (KNN), and random forests51–53. These approaches typically rely on handcrafted feature extraction techniques, including statistical descriptors and structural features. Experimental results reported in the literature indicate that such methods can achieve relatively high accuracy on benchmark datasets like MNIST when the dataset scale is limited and the feature design is carefully optimized52. However, existing studies consistently conclude that the performance of these methods is highly sensitive to feature selection and parameter tuning. Moreover, handcrafted features often fail to capture complex spatial patterns and subtle variations in handwriting styles, resulting in degraded robustness under noise, distortion, or data distribution shifts54. These limitations become more pronounced when dealing with large-scale or high-dimensional image data, where traditional methods exhibit restricted generalization capability and scalability.
With the rapid advancement of deep learning, convolutional neural networks (CNNs) have been widely adopted for handwritten digit recognition21,53,55–57. Prior studies demonstrate that deep learning models can automatically learn hierarchical and discriminative feature representations directly from raw image inputs, leading to substantial improvements in recognition accuracy and robustness compared with traditional methods. For instance, deep CNN-based approaches consistently achieve state-of-the-art performance on standard benchmarks by effectively modeling complex intra-class variations. Nevertheless, the conclusions drawn in these works also reveal significant challenges. Deep learning models generally require large-scale labeled datasets, extensive computational resources, and long training times to reach optimal performance27–29. Such requirements pose serious obstacles for real-time applications and deployment in resource-constrained environments. Therefore, despite their superior accuracy, pure deep learning approaches still suffer from efficiency and practicality issues.
To address computational inefficiency, optimized SVM implementations such as ThunderSVM have been proposed30,31,58. Existing research shows that ThunderSVM significantly accelerates training and inference by exploiting GPU parallelism, making it suitable for large-scale classification tasks. Experimental evaluations demonstrate that ThunderSVM achieves comparable classification accuracy to traditional SVMs while offering substantial speed improvements. However, prior studies also point out that ThunderSVM inherently inherits the limitations of classical SVM-based methods. Its performance remains heavily dependent on manually engineered features and preprocessing steps32,35,59. When applied to high-dimensional image data, the training process still involves intensive kernel computations and parameter optimization, which limits further efficiency gains and increases modeling complexity30,32,60. Consequently, while ThunderSVM improves computational speed, it does not fundamentally resolve the feature representation bottleneck in image-based recognition tasks.
In response to these challenges, recent studies have explored hybrid approaches that combine deep learning with traditional machine learning models. Some works focus on improving computational efficiency through feature selection, dimensionality reduction, or parallel and incremental learning strategies61–65. Others adopt deep neural networks, such as ResNet18, as feature extractors, leveraging residual learning mechanisms to alleviate gradient vanishing and enhance feature representation capability66–68. These studies conclude that deep feature extraction can significantly reduce dependence on handcrafted features and improve classification robustness. However, most existing research concentrates on optimizing individual components or specific applications, and lacks systematic quantitative comparisons that jointly evaluate recognition accuracy, training efficiency, and computational cost.
Based on the above analysis, it can be observed that existing methods have not fully achieved an optimal balance between feature representation quality and classification efficiency. The ResNet18–ThunderSVM hybrid model proposed in this paper is designed to address this gap. By integrating ResNet18 for automatic deep feature extraction with ThunderSVM69,70 for efficient GPU-accelerated classification, the proposed approach effectively combines the strengths of deep learning and traditional machine learning. Unlike conventional ThunderSVM-based methods, the proposed model eliminates the reliance on manual feature engineering, while compared with pure deep learning models, it significantly reduces training and inference costs. Furthermore, this study provides a systematic comparison with state-of-the-art methods, clearly demonstrating the performance advantages and practical value of the proposed hybrid architecture. As a result, the ResNet18–ThunderSVM model not only advances handwritten digit recognition but also offers a generalizable solution for efficiency-oriented pattern recognition tasks.
Methodology
Hybrid architecture and deep feature extraction
This study proposes a cascaded recognition algorithm based on the ResNet18-ThunderSVM framework (Fig. 1). The core idea is to leverage the robust non-linear feature mapping of a deep residual network to project raw high-dimensional pixel data into a latent feature space with high inter-class separability, and then classify with ThunderSVM to achieve strong generalization performance.
Fig. 1.
Model operation framework.
To validate the robustness of the model, four benchmark datasets are employed: MNIST, EMNIST, USPS, and Fashion-MNIST. All raw input images are resized to a common single-channel resolution. To eliminate variations in pixel intensity and accelerate network convergence, images are preprocessed using the following standardization formula:

$$x' = \frac{x - \mu}{\sigma} \tag{1}$$

where $x'$ represents the normalized image, $x$ is the original image, $\mu$ denotes the sample mean, and $\sigma$ represents the sample standard deviation. This step standardizes the input data to zero mean and unit variance, optimizing the efficiency of gradient-based learning.
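As a minimal illustration, the standardization of Eq. (1) can be written in NumPy as below; the `standardize` helper name and the toy batch are illustrative, not the paper's code:

```python
import numpy as np

def standardize(images: np.ndarray) -> np.ndarray:
    """Apply z-score standardization x' = (x - mu) / sigma over the sample."""
    mu = images.mean()
    sigma = images.std()
    return (images - mu) / sigma

# Example: a batch of 8 fake 28x28 grayscale images in [0, 1)
batch = np.random.rand(8, 28, 28)
normed = standardize(batch)
```

After this step, `normed` has zero mean and unit standard deviation over the whole batch, which keeps gradient magnitudes well-scaled during training.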
A ResNet18 backbone, pre-trained on the ImageNet dataset, is utilized as the primary feature extractor.
Structural Modification: To accommodate single-channel grayscale inputs, the first convolutional layer is modified to take 1 input channel instead of 3. The original configuration of 64 convolutional kernels, a stride of 2, and padding of 3 is strictly maintained to preserve low-level edge detection features. The convolution operation is given by:

$$y = f(W * x + b) \tag{2}$$

where $f(\cdot)$ denotes the ReLU activation function, $W$ is the convolutional kernel, $x$ is the input, and $b$ is the bias term.
Parameter Strategy: Addressing the concern regarding model transparency, the parameters of ResNet18 are set to fine-tuning mode rather than being frozen. This allows the network to adapt its pre-trained weights through backpropagation to capture task-specific textures.
Residual Learning Mapping: The architecture utilizes residual connections to address the vanishing gradient problem in deep networks. The output of each residual block, $y$, is defined by:

$$y = F(x) + x \tag{3}$$

where $x$ is the input feature and $F(\cdot)$ represents the non-linear residual mapping.
After passing through four sequential residual stages and a Global Average Pooling (GAP) layer, the model generates a 512-dimensional feature vector $v$. This vector is subsequently mapped via a fully connected layer to a 10-dimensional space corresponding to the class labels:

$$z = Wv + b \tag{4}$$

where $W$ is the weight matrix and $b$ is the bias term. To ensure the stability of the ThunderSVM kernel-space computation, the feature vectors undergo L2 normalization before being fed into the classifier.
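The L2 normalization step can be sketched with NumPy; the `l2_normalize` helper is illustrative, not the paper's code:

```python
import numpy as np

def l2_normalize(feats: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each feature row to unit Euclidean norm before the SVM stage."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.maximum(norms, eps)

feats = np.random.randn(16, 512)   # 16 fake 512-d GAP feature vectors
unit = l2_normalize(feats)
```

Unit-norm features keep kernel values in a bounded range, which stabilizes the subsequent RBF and polynomial kernel computations.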
ThunderSVM classification and kernel optimization
ThunderSVM is implemented to solve the quadratic optimization problem in the feature space by utilizing GPU acceleration.
The training process involves finding the optimal separating hyperplane by solving the soft-margin optimization problem (whose dual form ThunderSVM solves on the GPU):

$$\min_{w,\,b,\,\xi} \ \frac{1}{2}\|w\|^{2} + C\sum_{i=1}^{n}\xi_{i} \tag{5}$$

subject to the constraints:

$$y_{i}\left(w^{\top}\phi(x_{i}) + b\right) \ge 1 - \xi_{i}, \qquad \xi_{i} \ge 0, \qquad i = 1,\dots,n \tag{6}$$

where $w$ represents the weight vector, $C$ is the penalty parameter, $\xi_{i}$ denotes the slack variable, and $x_{i}$ represents the 512-dimensional input features.
This study systematically compares three kernel functions $K(x_i, x_j)$ to evaluate classification efficacy:
Linear kernel: uses kernel_type='linear' by default, serving as a baseline that minimizes model complexity.
RBF (Radial Basis Function) kernel: utilized to compute similarity from pairwise distances between features:

$$K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^{2}\right) \tag{7}$$

where the parameter $\gamma$ (related to the standard deviation $\sigma$ via $\gamma = 1/(2\sigma^{2})$) is optimized through the model training process.
Polynomial kernel: computes polynomial combinations of features:

$$K(x_i, x_j) = \left(\gamma\, x_i^{\top} x_j + r\right)^{d} \tag{8}$$

To ensure a fair comparison, the degree $d$ and the constant term $r$ are held fixed across all experiments.
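For illustration, the three kernels can be written directly in NumPy. The `gamma`, `coef0`, and `degree` values below are placeholder defaults, not the paper's settings:

```python
import numpy as np

def linear_kernel(a, b):
    """K(x_i, x_j) = <x_i, x_j>, computed pairwise for rows of a and b."""
    return a @ b.T

def rbf_kernel(a, b, gamma=0.5):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), Eq. (7)."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * (a @ b.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negatives

def poly_kernel(a, b, gamma=1.0, coef0=1.0, degree=3):
    """K(x_i, x_j) = (gamma * <x_i, x_j> + r)^d, Eq. (8)."""
    return (gamma * (a @ b.T) + coef0) ** degree

X = np.random.randn(4, 512)
K = rbf_kernel(X, X)   # 4x4 Gram matrix with ones on the diagonal
```

Each function returns the full Gram matrix, which is what the SVM solver consumes; the RBF diagonal is always 1 because the distance from a point to itself is zero.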
To ensure reproducibility and rigor, the model employs 5-fold cross-validation combined with grid search for parameter selection:
Search space: The penalty parameter $C$ is searched within $\{1, 10, 100\}$ (consistent with Table 1), and the kernel parameter $\gamma$ over a predefined candidate grid.
Convergence criteria: A fixed stopping tolerance ($\varepsilon$) is used, and the maximum number of iterations (max_iter) is set to −1 (unlimited, until convergence).
Prediction logic: The final classification result is determined by the maximum class score:

$$\hat{y} = \arg\max_{k} f_{k}(x) \tag{9}$$

where $f_{k}(x)$ represents the score of class $k$, calculated as $f_{k}(x) = w_{k}^{\top}\phi(x) + b_{k}$.
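A hedged sketch of the 5-fold grid search: scikit-learn's `SVC` stands in here for `thundersvm.SVC`, which exposes a scikit-learn-compatible interface. The C grid follows Table 1; the γ candidates are illustrative assumptions, and the toy features/labels replace the real 512-d deep features:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC  # stand-in for thundersvm.SVC (same interface)

# Toy data: 200 fake L2-normalized 512-d features, balanced 10-class labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 512))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.arange(200) % 10

# 5-fold CV grid search; C grid per Table 1, gamma grid is illustrative.
grid = GridSearchCV(
    SVC(kernel="rbf", max_iter=-1),   # max_iter=-1: run until convergence
    param_grid={"C": [1, 10, 100], "gamma": ["scale", 0.01, 0.1]},
    cv=5,
)
grid.fit(X, y)
pred = grid.best_estimator_.predict(X[:5])   # argmax over class scores
```

`predict` implements the argmax decision of Eq. (9) internally; swapping in `thundersvm.SVC` moves the kernel computations onto the GPU without changing the surrounding code.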
Experimental design and results analysis
Experimental configuration and hyper-parameter settings
To ensure the reproducibility of the experiments and address the heavy reliance of ThunderSVM on GPU computing environments, the hardware, software, and hyper-parameter configurations are consolidated in Table 1. The configuration explicitly lists the GPU driver and CUDA versions, and introduces the AdaBoB optimizer to evaluate the model's stability and noise sensitivity.
Table 1.
Integrated experimental environment and hyper-parameter rationale.
| Category | Parameter/Detail | Value/Specification | Rationale and purpose |
|---|---|---|---|
| Hardware | GPU model | NVIDIA GeForce RTX 3090 | Core resource for ResNet18 and ThunderSVM acceleration. |
| GPU VRAM | 24GB | Supports batch size=64 and handles deep network parameters. | |
| CPU specs | Intel Core i9-10900K | Critical for data loading and training I/O efficiency. | |
| Software | GPU driver/CUDA | Driver 525.85/CUDA 11.7 | Ensures ThunderSVM GPU acceleration performance and efficiency. |
| Frameworks | torch=1.12, torchaudio=0.12 | Ensures software-level reproducibility. | |
| Optimizers | Adam & AdaBoB | LR=0.001 | Adaptive learning rate; AdaBoB evaluates performance under gradient confidence. |
| Parameters | Batch size | 64 | Balance between memory usage and training efficiency. |
| Epochs | 20 | Based on loss curve convergence to avoid overfitting. | |
| SVM Kernel | Linear (Default) | Reduces computational cost in high-dimensional feature space. | |
| SVM penalty (C) | 10 | Balanced via cross-validation {1, 10, 100}. | |
| Robustness | Noise level (σ) | 0.1 | Measurement of model robustness against handwritten digit noise. |
Mathematical foundation and optimization
The model parameters are updated using the Adam optimizer as described in Equation (10):

$$\theta_{t+1} = \theta_{t} - \eta\,\frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \epsilon} \tag{10}$$

Here, $\theta$ represents the model parameters, $\eta$ is the learning rate, $\hat{m}_{t}$ is the bias-corrected first moment estimate, and $\hat{v}_{t}$ is the bias-corrected second moment estimate. To address potential convergence biases, we further introduce the AdaBoB optimizer, which combines the gradient confidence of AdaBelief and the dynamic bounds of AdaBound to enhance robustness against noise.
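A single Adam update per Eq. (10) can be reproduced in a few lines of NumPy; the hyper-parameter defaults follow the common Adam settings, not necessarily the exact training configuration:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected moment estimates drive the step."""
    m = b1 * m + (1 - b1) * grad            # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad**2         # second moment (uncentered variance)
    m_hat = m / (1 - b1**t)                 # bias correction
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(3)
m, v = np.zeros(3), np.zeros(3)
grad = np.array([1.0, -2.0, 0.5])
theta, m, v = adam_step(theta, grad, m, v, t=1)
```

On the first step the bias-corrected ratio reduces to roughly the sign of the gradient, so each parameter moves by about one learning rate in the descent direction.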
The cross-entropy loss ($L$), suitable for multi-class classification, is calculated as follows:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log\left(p_{i,c}\right) \tag{11}$$

where $N$ denotes the number of samples, $C$ is the number of classes, $y_{i,c}$ represents the true (one-hot) labels, and $p_{i,c}$ denotes the predicted probabilities.
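Eq. (11) can be checked numerically with a toy example:

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood over N samples and C classes (Eq. 11)."""
    n, c = probs.shape
    one_hot = np.eye(c)[labels]                  # y_{i,c}
    return -np.sum(one_hot * np.log(probs + eps)) / n

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = cross_entropy(probs, labels)   # -(log 0.7 + log 0.8) / 2 ≈ 0.290
```

Only the probability assigned to the true class contributes, so confident correct predictions drive the loss toward zero.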
Introduction of evaluation index
To comprehensively evaluate the proposed ResNet18-ThunderSVM model, we employ a multi-dimensional assessment framework encompassing classification performance, operational efficiency, and structural complexity.
During the training process, Precision, Recall, F1-score, and Accuracy are recorded and calculated as evaluation metrics to comprehensively assess model performance. The calculation methods are described in Equations:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{12}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{13}$$

$$\text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{14}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$

where TP, TN, FP, and FN represent True Positives, True Negatives, False Positives, and False Negatives, respectively.
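Eqs. (12)-(15) reduce to a few lines given raw confusion counts; the numbers below are toy values for illustration:

```python
def binary_metrics(tp, tn, fp, fn):
    """Precision, Recall, F1, and Accuracy from confusion counts (Eqs. 12-15)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

p, r, f1, acc = binary_metrics(tp=90, tn=85, fp=10, fn=15)
# p = 0.90, r ≈ 0.857, f1 ≈ 0.878, acc = 0.875
```

For the 10-class task, these quantities are computed per class in one-vs-rest fashion and then averaged, which is the convention behind the percentages reported in Table 2.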
In addition to accuracy, the model’s suitability for deployment is evaluated through:
Total parameters: This metric quantifies the model’s spatial complexity, representing the total number of learnable weights in the ResNet18 backbone. A lower parameter count indicates a more lightweight architecture.
FPS (Frames Per Second): This represents the number of images processed per second during inference. It is the primary indicator of the model’s real-time processing speed and throughput.
ResNet extraction & ThunderSVM latency: We separate the feature extraction time (CNN) from the classification time (GPU-SVM) to pinpoint potential bottlenecks in the pipeline.
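FPS as defined above can be estimated with a simple wall-clock harness; the lambda below is a stand-in for the actual ResNet18 + ThunderSVM inference pipeline:

```python
import time

def measure_fps(infer_fn, batch, n_runs=50):
    """Throughput estimate: images processed per second over n_runs batches."""
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn(batch)
    elapsed = time.perf_counter() - start
    return (n_runs * len(batch)) / elapsed

# Dummy stand-in workload: 64 "images" per batch, trivial per-image work
fps = measure_fps(lambda b: [x * 2 for x in b], batch=list(range(64)))
```

Timing the extractor and the classifier separately with the same harness is what allows the per-stage latencies reported in Table 4 to be attributed to either the CNN or the GPU-SVM.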
We introduce the parameter quantity shift-fitting performance (PQS-FP)72,73 coordinate system to analyze how parameter scaling regulates the model’s fitting state. This analysis theoretically demonstrates that the ResNet18-ThunderSVM architecture achieves an optimal equilibrium between structural complexity and recognition accuracy for specific handwritten digit recognition tasks.
Dataset
This study utilizes four distinct datasets, MNIST, EMNIST, USPS, and Fashion-MNIST, for model experimentation and evaluation (Fig. 2). These datasets present diverse challenges across tasks such as handwritten digit recognition, character classification, domain shift analysis, and fashion item identification, providing a comprehensive framework to evaluate the performance of the proposed model.
Fig. 2.
Visualization of dataset samples.
MNIST (Modified National Institute of Standards and Technology) is one of the most widely used benchmark datasets in computer vision and machine learning, consisting of grayscale images of handwritten digits. It contains 60,000 training images and 10,000 testing images, each with a resolution of 28×28 pixels. The samples encompass ten categories (0–9) and reflect various writing styles, simulating real-world handwritten digit recognition scenarios and providing a robust foundation for model training and assessment.
EMNIST (Extended MNIST) is an extension of the MNIST dataset designed to increase the difficulty of character recognition tasks by including handwritten letters. EMNIST comprises multiple subsets; this study specifically utilizes the subset containing handwritten uppercase letters across 26 categories. With over 800,000 images divided into training and testing sets, EMNIST offers greater diversity and significant challenges, making it suitable for evaluating models in more complex character recognition tasks.
USPS (United States Postal Service) consists of handwritten digit images provided by the U.S. Postal Service. While similar to MNIST in its focus on digits, the USPS dataset features distinct writing styles and varying levels of noise, with images formatted as 16×16 pixel grayscale blocks. It contains 7,291 training images and 2,007 testing images. The USPS dataset is specifically employed to test the model’s domain shift capabilities—measuring how well a model trained on one distribution (e.g., MNIST) generalizes to a different but related distribution (USPS).
Fashion-MNIST serves as a more challenging alternative to MNIST for evaluating image classification models. It consists of various fashion-related items, such as apparel and accessories. The dataset includes 60,000 training images and 10,000 testing images, each at 28×28 pixels, categorized into 10 classes (e.g., T-shirts, trousers, dresses, and sneakers). Compared to the original MNIST, Fashion-MNIST images are more complex, featuring diverse textures and shapes, which is essential for assessing the model’s performance on tasks with higher visual complexity.
By leveraging these four datasets, this research comprehensively evaluates the proposed model across various tasks and challenges. The diversity and uniqueness of each dataset help demonstrate the model’s adaptability and generalization capabilities in different recognition scenarios (Table 2).
Table 2.
Experimental results on the MNIST dataset.
| Models/Metrics | Time/Seconds | Param/KB | FPS | Precision/% | Recall/% | F1 score/% | Accuracy/% |
|---|---|---|---|---|---|---|---|
| Decision tree | 176.95 | 0.235 | 10891.02 | 59.46 | 59.36 | 59.04 | 59.92 |
| Random forest | 906.63 | 76.660 | 829.47 | 84.20 | 83.31 | 83.27 | 83.71 |
| SVM | 73.81 | 0.497 | 176.26 | 84.11 | 83.26 | 83.19 | 83.50 |
| ThunderSVM | 133.61 | 7.666 | 13207.08 | 92.01 | 91.84 | 91.88 | 91.96 |
| ViT-TS | 1427.47 | 205578 | 7946.45 | 98.38 | 98.36 | 98.37 | 98.38 |
| LeNet-TS | 232.91 | 44426 | 9161.74 | 98.63 | 98.62 | 98.62 | 98.63 |
| CNN | 170.60 | 411.947 | 12379.15 | 98.86 | 98.85 | 98.86 | 98.86 |
| Inceptionv4-TS | 318.76 | 28802 | 9785.23 | 98.95 | 98.90 | 98.92 | 98.93 |
| MobileNetv3-TS | 1493.20 | 2911.399 | 6596.21 | 99.25 | 99.26 | 99.25 | 99.26 |
| CNN-TS | 192.22 | 102.207 | 12448.14 | 99.27 | 99.26 | 99.26 | 99.27 |
| VGG16-TS | 624.52 | 135118.67 | 9850.10 | 99.32 | 99.32 | 99.32 | 99.32 |
| ResNet18-TS(Adam) | 742.84 | 10913.447 | 8666.06 | 99.34 | 99.33 | 99.34 | 99.34 |
| ResNet18-TS(AdaBoB) | 718.12 | 10913.447 | 8921.15 | 99.35 | 99.34 | 99.36 | 99.36 |
Comparative experiment and comprehensive performance evaluation
Through these comparative experiments, this paper verifies the superiority of the proposed ResNet18-ThunderSVM model in handwritten digit recognition tasks, along with the effectiveness and practicality of the optimization method. Experiments were conducted on multiple datasets, including MNIST, EMNIST (Letters), USPS, and Fashion-MNIST, to evaluate generalization and domain adaptation capabilities. The optimized ResNet18-ThunderSVM model was compared with traditional classification algorithms and several deep learning-based models (Decision Tree; Random Forest; SVM; ThunderSVM; ViT-ThunderSVM; LeNet-ThunderSVM; CNN; Inceptionv4-ThunderSVM; MobileNetv3-ThunderSVM; CNN-ThunderSVM; VGG16-ThunderSVM). Furthermore, to assess noise robustness, Gaussian noise (σ=0.1) was added to a subset of each dataset. All models were evaluated under two optimization settings: the widely used Adam optimizer and the recently proposed AdaBoB optimizer71, which integrates gradient confidence and dynamic learning-rate bounding to mitigate convergence and noise-sensitivity issues. The experimental results on MNIST are shown in Table 2, with extended cross-dataset and robustness results provided in Table 3 and subsequent sections.
Table 3.
Cross-dataset performance and noise robustness.
| Data set | Model | Clean Acc (%) | Noisy Acc (%) | ΔAcc (%) |
|---|---|---|---|---|
| EMNIST | ResNet18-TS | 94.28 | 92.17 | −2.11 |
| CNN-TS | 93.85 | 91.42 | −2.43 | |
| ThunderSVM | 86.34 | 82.91 | −3.43 | |
| USPS | ResNet18-TS | 96.45 | 94.88 | −1.57 |
| CNN-TS | 95.92 | 93.67 | −2.25 | |
| ThunderSVM | 89.12 | 85.34 | −3.78 | |
| Fashion-MNIST | ResNet18-TS | 93.67 | 91.25 | −2.42 |
| CNN-TS | 92.88 | 90.11 | −2.77 | |
| ThunderSVM | 84.56 | 80.23 | −4.33 |
Table 2 presents the experimental results for the various machine learning models evaluated on the MNIST dataset. In terms of training time, SVM is the fastest at 73.81 seconds, while MobileNetv3-TS is the slowest at 1493.20 seconds. Regarding parameter size, the Decision Tree and SVM models are extremely compact (under 1 KB), whereas models such as ViT-TS and LeNet-TS are far larger, with ViT-TS the largest at 205,578 KB. FPS varies significantly: ThunderSVM achieves the highest throughput at 13,207.08, followed by CNN-TS and CNN, while classical SVM is the slowest at 176.26. On precision, recall, F1 score, and accuracy, deep feature-based models such as ViT-TS, LeNet-TS, and CNN consistently outperform traditional models such as Decision Tree, SVM, and Random Forest. For example, ViT-TS achieves a precision of 98.38%, recall of 98.36%, F1 score of 98.37%, and accuracy of 98.38%, whereas the Decision Tree reaches only 59.92% accuracy. ResNet18-TS (AdaBoB) balances performance and parameter size well, achieving the best accuracy at 99.36%. In summary, deep learning-based models, especially the CNN-based hybrids and ResNet18-TS, perform significantly better on the MNIST dataset, showing high precision, recall, F1 score, and accuracy, while traditional models lag in performance despite their compact sizes.
Figure 3 provides a comprehensive comparison of the models on the MNIST dataset across four key performance metrics: accuracy, F1 score, recall, and precision. The deep learning models, especially ViT-TS, LeNet-TS, CNN, and ResNet18-TS (AdaBoB), perform significantly better than traditional machine learning models such as Decision Tree (DT) and Random Forest (RF), consistently scoring above 98% on all four metrics. In contrast, DT and RF show much lower performance, particularly in accuracy and recall, with values falling below 80%. SVM and ThunderSVM offer some improvement but still lag behind the deep learning models.
Fig. 3.

MNIST dataset comparison experimental results dot plot.
To evaluate the generalization ability of ResNet18-ThunderSVM, the experiments were extended to the EMNIST, USPS, and Fashion-MNIST datasets. USPS represents a domain-shift scenario with different image distribution characteristics, while Fashion-MNIST introduces more complex visual patterns. In addition, Gaussian noise (σ = 0.1) was added to 20% of each test set to measure noise robustness. Table 3 summarizes the accuracy and performance degradation (ΔAcc) under clean and noisy conditions.
ResNet18-ThunderSVM consistently outperforms baseline models across all datasets and exhibits smaller performance degradation under noise, demonstrating stronger generalization and robustness.
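The noise-corruption protocol (zero-mean Gaussian noise with σ = 0.1 on images scaled to [0, 1]) and the ΔAcc degradation metric of Table 3 can be sketched as below; the function names and the clipping choice are illustrative assumptions, not the authors' exact preprocessing.

```python
import numpy as np

def add_gaussian_noise(images, sigma=0.1, seed=0):
    """Corrupt images (scaled to [0, 1]) with zero-mean Gaussian noise of std sigma."""
    rng = np.random.default_rng(seed)
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep pixel values in the valid range

def delta_acc(clean_acc, noisy_acc):
    """Degradation metric of Table 3: negative values indicate an accuracy drop."""
    return round(noisy_acc - clean_acc, 2)

# Reproducing the EMNIST ResNet18-TS row of Table 3:
print(delta_acc(94.28, 92.17))  # prints -2.11
```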
In addition to accuracy, we measured key efficiency metrics: ResNet18 feature extraction time per epoch, ThunderSVM training latency, and peak GPU memory usage during training on MNIST. Results are listed in Table 4.
Table 4.
Efficiency metrics on MNIST: feature-extraction time, training latency, and peak GPU memory.
| Model | ResNet18 extract time/epoch (s) | ThunderSVM train latency (s) | Peak GPU memory (MB) |
|---|---|---|---|
| ResNet18-TS (Adam) | 42.6 | 28.3 | 1245 |
| ResNet18-TS (AdaBoB) | 41.8 | 27.9 | 1238 |
| CNN-TS | 18.4 | 22.1 | 892 |
| VGG16-TS | 67.3 | 34.7 | 2845 |
ResNet18-ThunderSVM offers a balanced trade-off: although its extraction time is higher than that of the lightweight CNN-TS, it remains more efficient than VGG16-TS and achieves superior accuracy with a moderate memory footprint.
The confusion matrix for ResNet18-ThunderSVM in Figure 4(l) demonstrates significant improvements in classification accuracy compared to other models. The high diagonal values indicate strong performance in correctly classifying digits. For example, compared to the traditional ThunderSVM in Figure 4(d), ResNet18-ThunderSVM shows generally higher diagonal values, with the accuracy of classifying digit “1” improving from 98.1% to 99.4%. Similarly, when compared to models such as Decision Tree (Fig. 4(a)), Random Forest (Fig. 4(b)), SVM (Fig. 4(c)), and other hybrid approaches like ViT-ThunderSVM (Fig. 4(e)), LeNet-ThunderSVM (Fig. 4(f)), CNN (Fig. 4(g)), Inceptionv4-ThunderSVM (Fig. 4(h)), MobileNetv3-ThunderSVM (Fig. 4(i)), CNN-ThunderSVM (Fig. 4(j)), and VGG16-ThunderSVM (Fig. 4(k)), ResNet18-ThunderSVM consistently exhibits enhanced recognition capabilities. Additionally, the non-diagonal elements in ResNet18-ThunderSVM’s confusion matrix are lower, suggesting fewer misclassifications. For instance, the error rate of misclassifying “8” as “3” decreases from 4.9% in ThunderSVM to 1.1% in ResNet18-ThunderSVM. The diagonal values in ResNet18-ThunderSVM are also more consistent across categories, reflecting balanced performance, in contrast to the variable performance observed in models like ThunderSVM, especially its lower accuracy for recognizing digit “4”.
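Off-diagonal figures like the "8 misclassified as 3" rate above are individual cells of the row-normalized confusion matrix. A minimal sketch, with toy labels standing in for real model predictions:

```python
from sklearn.metrics import confusion_matrix

def misclass_rate(y_true, y_pred, src, dst, n_classes=10):
    """Fraction of true-class `src` samples predicted as `dst` (one off-diagonal cell)."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(n_classes)))
    return cm[src, dst] / cm[src].sum()

# Toy labels: 100 eights, 3 of which are predicted as "3".
y_true = [8] * 100
y_pred = [3] * 3 + [8] * 97
rate = misclass_rate(y_true, y_pred, src=8, dst=3)  # 0.03
```

Passing an explicit `labels` list keeps the matrix 10×10 even when some digits are absent from a batch, so row and column indices always coincide with the digit values.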
Fig. 4.
Comparison of Confusion Matrices. (a) Decision Tree; (b) Random Forest; (c) SVM; (d) ThunderSVM; (e) ViT-ThunderSVM; (f) LeNet-ThunderSVM; (g) CNN; (h) Inceptionv4-ThunderSVM; (i) MobileNetv3-ThunderSVM; (j) CNN-ThunderSVM; (k) VGG16-ThunderSVM; (l) ResNet18-ThunderSVM.
Figure 5 shows the relationship between model complexity (number of parameters, in KB) and performance (accuracy, in %). Each model is represented by a blue point whose position reflects its parameter count and accuracy. The CNN model has the fewest parameters and the lowest accuracy, about 98.9%. MobileNetV3-TS and CNN-TS reach accuracies close to 99% with slightly more parameters than CNN, while InceptionV4-TS has roughly 25,000 KB of parameters and about 99.2% accuracy. ResNet18-TS (AdaBoB) and ResNet18-TS (Adam) are the most accurate models at about 99.3%, though their parameter counts are also comparatively high. The plot thus offers an intuitive view of the accuracy-complexity trade-off across models.
Fig. 5.
PQS-FP Plot: Model complexity vs performance.
Fig. 6 compares correctly and incorrectly classified handwritten digits from the MNIST dataset. The top section (a) displays images where the model’s predicted labels match the true labels, and the digits are clear and easy to recognize. The bottom section (b) shows images where the model’s predictions do not match the true labels, indicating misclassifications. The misclassified digits tend to be less clear, which may explain why the model failed to recognize them correctly.
Fig. 6.
Correctly vs. incorrectly classified handwritten digits.
Discussion
The successful application of the ResNet18-ThunderSVM model to the handwritten digit recognition task demonstrates its significant potential for future optimization and widespread application. While the current model shows excellent performance in terms of accuracy, efficiency, and robustness, several areas remain open for improvement.
First, to directly address the efficiency trade-off, future work should explore the computational costs associated with ResNet18-ThunderSVM. A detailed discussion on the increased training time of ResNet18-ThunderSVM compared to standalone ThunderSVM, along with the trade-off between training time and performance gains, is necessary. Methods to reduce training costs, such as model pruning or adopting lighter ResNet architectures (e.g., ResNet18-Lite), should be considered to enhance the model’s suitability for resource-constrained environments.
Second, to validate the model’s generalizability and robustness, performance should be evaluated on more diverse and challenging datasets, such as EMNIST and USPS. These datasets will help assess the model’s applicability and robustness across different domains. Moreover, future comparative studies should include advanced deep learning architectures, such as Transformer-based models or hybrid networks, potentially incorporating attention mechanisms, to further enhance feature extraction and performance, ensuring that the research remains aligned with state-of-the-art methods.
The ResNet18-ThunderSVM model is not limited to handwritten digit recognition; it has broad application prospects in various fields such as medical imaging (e.g., tumor detection), autonomous driving (e.g., object detection and scene understanding), intelligent surveillance, and natural language processing. Future work could explore integrating ResNet18-ThunderSVM with emerging technologies like federated learning for decentralized and privacy-preserving model training. Furthermore, exploring the model’s performance on multimodal data (such as combining visual and textual information) could open new avenues for cross-domain applications.
Conclusion
This research investigates the efficiency limitations of traditional ThunderSVM in image data processing, particularly its reliance on manual feature extraction and its inability to capture complex, subtle features in images. To overcome these challenges, we propose an optimized ResNet18-ThunderSVM model, successfully combining the deep learning model ResNet18 with traditional ThunderSVM. This integration effectively eliminates the need for manual feature extraction and significantly enhances handwritten digit recognition performance. The powerful feature learning ability of ResNet18 compensates for the shortcomings of ThunderSVM in complex feature extraction, representing the core innovation and key to the model’s performance improvement.
Experimental results show that ResNet18-ThunderSVM outperforms traditional ThunderSVM in terms of training efficiency, inference speed, parameter count, and key performance metrics (accuracy, recall, F1 score, and precision). The hybrid model demonstrates faster training convergence, lower final training loss, stronger generalization capabilities, and a more stable training process. Comparative experiments reveal that ResNet18-ThunderSVM excels across multiple performance metrics, particularly in handling high-dimensional data and complex patterns.
These improvements not only enhance model performance but also increase robustness and applicability in practical applications. The successful implementation of ResNet18-ThunderSVM in handwritten digit recognition validates the potential of combining deep learning with traditional machine learning methods, providing an efficient and reliable solution for feature extraction and classification. Additionally, the lightweight potential of ResNet18-ThunderSVM (e.g., ResNet18-Lite) makes it suitable for deployment in edge computing and mobile devices.
Looking ahead, further exploration of advanced model architectures (e.g., Transformer-based models) and optimization techniques (e.g., transfer learning, federated learning, and multimodal data fusion) will enhance the model’s generalization ability and efficiency. Through continuous optimization and expansion of its application scenarios, ResNet18-ThunderSVM is expected to become a vital tool in advancing artificial intelligence technologies, laying the foundation for more complex and efficient machine learning solutions.
Acknowledgment
We thank the funders of this project: Qingdao Huanghai University, Hangzhou Jusha Overseas Technology Co., Ltd., King’s College London, and Leshan Normal University, as well as all the teams and individuals who supported this research.
Author contributions
Conceptualization, C.Z., T.Y. and Wenyi C.; methodology, C.Z., T.Y.; software, C.Z., T.Y., Z.W.; validation, C.Z., T.Y., Z.W.; formal analysis, Wenyi C. and Wenxin C.; investigation, C.Z., T.Y. and Wenxin.C.; resources, C.Z., T.Y. and Wenyi C.; data curation, C.Z., T.Y.; writing—original draft preparation, C.Z., T.Y., Wenyi C.; writing—review and editing, Z.W., Wenyi C. and Wenxin C.; visualization, C.Z., T.Y.; supervision, Wenyi C. and Wenxin C.; project administration, C.Z. All authors have read and agreed to the published version of the manuscript.
Funding
The authors gratefully acknowledge support from the 2024 Undergraduate Teaching Reform Research Project, titled “Interdisciplinary Personalized Learning Path Planning Practice Based on Multimodal Knowledge Graphs”, grant number M2024297; the 2024 Shandong Provincial Higher Education Governance Special Project, titled “Innovation Research on AI-Empowered ‘Task-Driven-Theory-Practice-Combat’ Blended Teaching Model”, grant number SDGX2024049; and King’s College London and the China Scholarship Council via the K-CSC Scholarship, grant numbers 202208060097 and 202508130024.
Data availability
The data supporting the findings of this study are openly available in the MNIST Dataset repository at “ https://www.kaggle.com/datasets/hojjatk/mnist-dataset ”. All experiments in this work were conducted using this publicly accessible dataset. No new data were created or analyzed in this study.
Declarations
Conflicts of interest
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Wenyi Cao contributed equally to this work.
Contributor Information
Ziyou Wang, Email: ziyou.wang@kal.ac.uk.
Wenyi Cao, Email: wenyi.cao@kcl.ac.uk.
References
- 1.Salimi, H. & Giveki, D. Farsi/Arabic handwritten digit recognition based on ensemble of SVD classifiers and reliable multi-phase PSO combination rule. Int. J. Doc. Anal. Recognit.16(4), 371–386. 10.1007/s10032-012-0195-7 (2013). [Google Scholar]
- 2.Wen, Y., Ke, W. & Sheng, H. Improved localization and recognition of handwritten digits on MNIST dataset with ConvGRU. Appl. Sci.-Basel15(1), 238. 10.3390/app15010238 (2025). [Google Scholar]
- 3.Alhamad, H. A. et al. Handwritten recognition techniques: A comprehensive review. Symmetry-Basel16(6), 681. 10.3390/sym16060681 (2024). [Google Scholar]
- 4.Abdullah, A. S., Geetha, S., Aziz, A. B. A. & Mishra, U. Design of automated model for inspecting and evaluating handwritten answer scripts: A pedagogical approach with NLP and deep learning. Alex. Eng. J.108, 764–788. 10.1016/j.aej.2024.08.067 (2024). [Google Scholar]
- 5.Qiao, J., Wang, G., Li, W. & Chen, M. An adaptive deep Q-learning strategy for handwritten digit recognition. Neural Network.107, 61–71. 10.1016/j.neunet.2018.02.010 (2018). [DOI] [PubMed] [Google Scholar]
- 6.Chen, L. et al. Review of image classification algorithms based on convolutional neural networks. Remote Sens.13(22), 4712. 10.3390/rs13224712 (2021). [Google Scholar]
- 7.Abdulraheem, A. & Jung, I. Y. Effective digital technology enabling automatic recognition of special-type marking of expiry dates. Sustainability15(17), 12915. 10.3390/su151712915 (2023). [Google Scholar]
- 8.Agrawal, V., Jagtap, J. & Kantipudi, M. V. V. P. Exploration of advancements in handwritten document recognition techniques. Intell. Syst. Appl.22, 200358. 10.1016/j.iswa.2024.200358 (2024). [Google Scholar]
- 9.Ahlawat, S., Choudhary, A., Nayyar, A., Singh, S. & Yoon, B. Improved handwritten digit recognition using convolutional neural networks (CNN). Sensors20(12), 3344. 10.3390/s20123344 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Gu, J. et al. Recent advances in convolutional neural networks. Pattern Recognit.77, 354–377. 10.1016/j.patcog.2017.10.013 (2018). [Google Scholar]
- 11.Khanam, R., Hussain, M., Hill, R. & Allen, P. A comprehensive review of convolutional neural networks for defect detection in industrial applications. IEEE Access12, 94250–94295. 10.1109/ACCESS.2024.3425166 (2024). [Google Scholar]
- 12.Larasati, R. & KeungLam, H. Handwritten digits recognition using ensemble neural networks and ensemble decision tree. In 2017 International Conference on Smart Cities, Automation & Intelligent Computing Systems (ICON-SONICS) 99-104 10.1109/ICON-SONICS.2017.8267829 (Yogyakarta, Indonesia, 2017).
- 13.Gao, P. & Adnan, M. Overview of emerging electronics technologies for artificial intelligence: A review. Mater. Today Electron.11, 100136. 10.1016/j.mtelec.2025.100136 (2025). [Google Scholar]
- 14.Dwivedi, Y. K. et al. Opinion paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int. J. Inform. Manag.71, 102642. 10.1016/j.ijinfomgt.2023.102642 (2023). [Google Scholar]
- 15.Javed, A. R. et al. Artificial Intelligence for cognitive health assessment: State-of-the-Art, open challenges and future directions. Cognit. Comput.15(6), 1767–1812. 10.1007/s12559-023-10153-4 (2023). [Google Scholar]
- 16.Thamaraiselvan, S., Venugopal, V. & Vekkot, S. Interpretable handwritten digit classification: Analyzing feature extraction and explainable AI on the extended MNIST dataset. In 2024 13th international conference on system modeling & advancement in research trends (SMART) 591-597 10.1109/SMART63812.2024.10882523 (Moradabad, India, 2024).
- 17.Shrivastava, A., Jaggi, I., Gupta, S. & Gupta, D. Handwritten digit recognition using machine learning: A review. In 2019 2nd international conference on power energy, environment and intelligent control (PEEIC) 322-326 10.1109/PEEIC47157.2019.8976601 (Greater Noida, India, 2019).
- 18.Jain, P. H. et al. Artificially intelligent readers: An adaptive framework for original handwritten numerical digits recognition with OCR methods. Information14(6), 305. 10.3390/info14060305 (2023). [Google Scholar]
- 19.Liu, W., Wei, J. & Meng, Q. Comparisions on KNN, SVM, BP and the CNN for handwritten digit recognition. In 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA) 587-590 10.1109/AEECA49918.2020.9213482 (Dalian, China, 2020).
- 20.Ghosh, S. et al. SVM and KNN Based CNN architectures for plant classification. Comput. Mater. Cont.10.32604/cmc.2022.023414 (2022). [Google Scholar]
- 21.Gong, W. et al. A novel deep learning method for intelligent fault diagnosis of rotating machinery based on improved CNN-SVM and multichannel data fusion. Sensor19(7), 1693. 10.3390/s19071693 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Shen, L. et al. The challenges of recognizing offline handwritten Chinese: A technical review. Appl. Sci.-Basel13(6), 3500. 10.3390/app13063500 (2023). [Google Scholar]
- 23.Nguyen, G. et al. Machine learning and deep learning frameworks and libraries for large-scale data mining: A survey. Artif. Intell. Rev.52(1), 77–124. 10.1007/s10462-018-09679-z (2019). [Google Scholar]
- 24.Taye, M. M. Understanding of machine learning with deep learning: Architectures, workflow, applications and future directions. Computer12(5), 91. 10.3390/computers12050091 (2023). [Google Scholar]
- 25.Fateh, A., Fateh, M. & Abolghasemi, V. Multilingual handwritten numeral recognition using a robust deep network joint with transfer learning. Inform. Sci.581, 479–494. 10.1016/j.ins.2021.09.051 (2021). [Google Scholar]
- 26.Hsu, T.C., Tsai, Y.-H. & Chu, W. C.C. Enhancing digit recognition for luminous images in edge computing through transfer learning with robustness and fault tolerance. In IEEE Transactions on Reliability 1-9 10.1109/TR.2024.3393424 (2024).
- 27.Thompson, N.C., Greenewald, K., Lee, K. & Manso, G.F. The computational limits of deep learning. In ninth computing within Limits LIMITS 10.21428/bf6fb269.1f033948 (2023).
- 28.Fink, O. et al. Potential, challenges and future directions for deep learning in prognostics and health management applications. Eng. Appl. Artif. Intell.92, 103678. 10.1016/j.engappai.2020.103678 (2020). [Google Scholar]
- 29.Sarker, I. H. Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput. Sci.2(6), 420. 10.1007/s42979-021-00815-1 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Wen, Z., Shi, J., Li, Q., He, B. & Chen, J. ThunderSVM: A Fast SVM Library on GPUs and CPUs. J. Mach. Learn. Res.19(1), 1–5. 10.5555/3291125.3291146 (2018). [Google Scholar]
- 31.Paoletti, M. E., Haut, J. M., Tao, X., Miguel, J. P. & Plaza, A. A new GPU implementation of support vector machines for fast hyperspectral image classification. Remote Sens.12(8), 1257. 10.3390/rs12081257 (2020). [Google Scholar]
- 32.Raschka, S., Patterson, J. & Nolet, C. Machine learning in python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence. Information11(4), 193. 10.3390/info11040193 (2020). [Google Scholar]
- 33.Glasmachers, T. Recipe for fast large-scale SVM training: Polishing, parallelism, and more RAM!. In Artificial Intelligence and Machine Learning - 34th Joint Benelux conference, BNAIC/Benelearn 2022, revised selected papers 31–46 10.1007/978-3-031-39144-6_3 (Mechelen, Belgium, 2023).
- 34.Dong, J.-X., Krzyzak, A. & Suen, C. Y. Fast SVM training algorithm with decomposition on very large data sets. IEEE Transact. Pattern Anal. Mach. Intell.27(4), 603–618. 10.1109/TPAMI.2005.77 (2005). [DOI] [PubMed] [Google Scholar]
- 35.Liou, J.Y., Forrest, S. & Wu, C.J. Genetic Improvement of GPU code. In 2019 IEEE/acm international workshop on genetic improvement (GI), 20-27 10.1109/GI.2019.00014 (Montreal, QC, Canada, 2019).
- 36.Chen, Y., Wen, Z., He, B. & Chen, J. Efficient decomposition selection for multi-class classification. IEEE Transact. Knowl. Data Eng.35(4), 3751–3764. 10.1109/TKDE.2021.3130239 (2023). [Google Scholar]
- 37.Sun, Y., Li, P., Xu, H. & Wang, R. Structural prior-driven feature extraction with gradient-momentum combined optimization for convolutional neural network image classification. Neural Netw.179, 106511. 10.1016/j.neunet.2024.106511 (2024). [DOI] [PubMed] [Google Scholar]
- 38.Zhou, T. et al. COVID-ResNet: COVID-19 recognition based on improved attention ResNet. Electronics12(6), 1413. 10.3390/electronics12061413 (2023). [Google Scholar]
- 39.Francis, S. B. & Verma, J. P. Deep CNN ResNet-18 based model with attention and transfer learning for Alzheimer’s disease detection. Front. Neuroinform.18, 1507217. 10.3389/fninf.2024.1507217 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Du, K.-L., Jiang, B., Lu, J., Hua, J. & Swamy, M. N. S. Exploring kernel machines and support vector machines: Principles, techniques, and future directions. Mathematics12(24), 3935. 10.3390/math12243935 (2024). [Google Scholar]
- 41.Al-Khater, W. & Al-Madeed, S. Using 3D-VGG-16 and 3D-Resnet-18 deep learning models and FABEMD techniques in the detection of malware. Alex. Eng. J.89, 39–52. 10.1016/j.aej.2023.12.061 (2024). [Google Scholar]
- 42.Gao, M. et al. A novel deep convolutional neural network based on ResNet-18 and transfer learning for detection of wood knot defects. J. Sensor.2021(1), 4428964. 10.1155/2021/4428964 (2021). [Google Scholar]
- 43.Basak, M., Kim, D.-W., Han, M.-M. & Shin, G.-Y. Attention-based malware detection model by visualizing latent features through dynamic residual kernel network. Sensors24(24), 7953. 10.3390/s24247953 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.AlKendi, W., Gechter, F., Heyberger, L. & Guyeux, C. Advancements and challenges in handwritten text recognition: A comprehensive survey. J. Imag.10(1), 18. 10.3390/jimaging10010018 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Mei, Y. ResNet18 facial feature extraction algorithm improved based on hybrid domain attention mechanism. PLoS One20(3), e0319921. 10.1371/journal.pone.0319921 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.Liu, W., Cui, N., Guo, L., Du, S. & Wang, W. DESformer: A dual-branch encoding strategy for semantic segmentation of very-high-resolution remote sensing images based on feature interaction and multiscale context fusion. IEEE Transact. Geosci. Remote Sens.62, 1–20. 10.1109/TGRS.2024.3446628 (2024). [Google Scholar]
- 47.Muksimova, S. et al. Rl-cervix. Net: A hybrid lightweight model integrating reinforcement learning for cervical cell classification. Diagnostics15(3), 364 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Díaz-Vico, D., Alonso, J. P., Omari, A. & Dorronsoro, J. R. Deep support vector neural networks. Integr. Comput. Aided Eng.27(4), 389–402. 10.3233/ICA-200635 (2020). [Google Scholar]
- 49.Xu, W., Fu, Y.-L. & Zhu, D. ResNet and its application to medical image processing: Research progress and challenges. Comput. Method. Program. Biomed.240, 107660. 10.1016/j.cmpb.2023.107660 (2023). [DOI] [PubMed] [Google Scholar]
- 50.Mumuni, A. & Mumuni, F. Data augmentation: A comprehensive survey of modern approaches. Array16, 100258. 10.1016/j.array.2022.100258 (2022). [Google Scholar]
- 51.Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE86(11), 2278–2324. 10.1109/5.726791 (1998). [Google Scholar]
- 52.Ciregan, D., Meier, U. & Schmidhuber, J. Multi-column deep neural networks for image classification. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence 3642-3649 10.1109/CVPR.2012.6248110 (RI, USA, 2012).
- 53.Deng, L. & Yu, D. Deep learning: Methods and applications. Now10.1561/2000000039 (2014). [Google Scholar]
- 54.Tang, W. & Fei, L. Handwritten digit recognition based on SVM. In 2nd international conference on frontiers of intelligent manufacturing and automation, CFIMA 2024, 324 331 10.1145/3704558.3705521 (New York, NY, USA, 2025).
- 55.Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. Acm.60(6), 84–90. 10.1145/3065386 (2017). [Google Scholar]
- 56.Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. In 3rd International Conference on Learning Representations, ICLR 2015, 10.48550/arXiv.1409.1556 (San Diego, CA, United states, 2015).
- 57.Yilmaz, A. A., Guzel, M. S., Bostanci, E. & Askerzade, I. A novel action recognition framework based on deep-learning and genetic algorithms. IEEE Access8, 100631–100644. 10.1109/ACCESS.2020.2997962 (2020). [Google Scholar]
- 58.Mutlu, G. & Acı, Ç. İ. SVM-SMO-SGD: A hybrid-parallel support vector machine algorithm using sequential minimal optimization with stochastic gradient descent. Parall. Comput.113, 102955. 10.1016/j.parco.2022.102955 (2022). [Google Scholar]
- 59.Ajala, S., Jalajamony, H. M., Nair, M., Marimuthu, P. & Fernandez, R. E. Comparing machine learning and deep learning regression frameworks for accurate prediction of dielectrophoretic force. Sci. Rep.12, 11971. 10.1038/s41598-022-16114-5 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 60.Sadrfaridpour, E. Fast machine learning algorithms for massive datasets with applications in the biomedical domain: Dissertation, Clemson University, (South Carolina, USA, 2020).
- 61.Ganjei, M. A. & Boostani, R. A hybrid feature selection scheme for high-dimensional data. Eng. Appl. Artif. Intell.113, 104894. 10.1016/j.engappai.2022.104894 (2022). [Google Scholar]
- 62.Zaki, A., Métwalli, A., Aly, M. H. & Badawi, W. K. Enhanced feature selection method based on regularization and kernel trick for 5G applications and beyond. Alex. Eng. J.61(12), 11589–11600. 10.1016/j.aej.2022.05.024 (2022). [Google Scholar]
- 63.Craighero, F. et al. Unity is strength: Improving the detection of adversarial examples with ensemble approaches. Neurocomputing554, 126576. 10.1016/j.neucom.2023.126576 (2023). [Google Scholar]
- 64.Jiang, J., Wen, Z., Wang, Z., He, B. & Chen, J. Parallel and distributed structured SVM training. IEEE Transact. Parall. Distrib. Syst.33(5), 1084–1096. 10.1109/TPDS.2021.3101155 (2022). [Google Scholar]
- 65.Nawal, Y., Oussalah, M., Fergani, B. & Fleury, A. New incremental SVM algorithms for human activity recognition in smart homes. J. Ambient. Intell. Human. Comput.14, 13433–13450. 10.1007/s12652-022-03798-w (2023). [Google Scholar]
- 66.Zhang, R. et al. Automatic sleep staging method using EEG based on STFT and residual network. IEEE Access13, 1778–1789. 10.1109/ACCESS.2024.3524267 (2025). [Google Scholar]
- 67.Huan, Y. & Yan, W. Semaphore recognition using deep learning. Electronics14(2), 286. 10.3390/electronics14020286 (2025). [Google Scholar]
- 68.Muhtar, Y., Muhammat, M., Yadikar, N., Aysa, A. & Ubul, K. FC-ResNet: A multilingual handwritten signature verification model using an improved ResNet with CBAM. Appl. Sci. Basel.13(14), 8022. 10.3390/app13148022 (2023). [Google Scholar]
- 69.Liou, J.-Y., Wang, X., Forrest, S. & Wu, C.-J. GEVO: GPU code optimization using evolutionary computation. ACM Transact. Archit. Code Optim.17(4), 1–28. 10.1145/3418055 (2020). [Google Scholar]
- 70.Ni, Y. et al. Hyperdimensional brain-inspired learning for phoneme recognition with large-scale inferior colliculus neural activities. IEEE Transact. Biomed. Eng.71(11), 3098–3110. 10.1109/TBME.2024.3408279 (2024). [DOI] [PubMed] [Google Scholar]
- 71.Xiang, Q., Wang, X., Lei, L. et al. Dynamic bound adaptive gradient methods with belief in observed gradients. Pattern. Recognit. 111819 (2025).
- 72.Xiang, Q. et al. Quadruplet depth-wise separable fusion convolution neural network for ballistic target recognition with limited samples. Expert Syst. Appl.235, 121182 (2024). [Google Scholar]
- 73.Xiang, Q. et al. ISONet: Reforming 1DCNN for aero-engine system inter-shaft bearing fault diagnosis via input spatial over-parameterization. Expert. Syst. Appl.277, 127248 (2025). [Google Scholar]