BMC Bioinformatics. 2026 Mar 10;27:88. doi: 10.1186/s12859-026-06410-6

Design of a configurable SoC for Alzheimer’s disease detection based on multimodal signals

Yannan Yuan 1, Liufang Sheng 2,#, Zhikang Chen 3,#, Yuejun Zhang 3, Qikang Li 3, Junping Chen 4, Ke Ding 5, Lei Shi 2, Qiaoxia Hu 1, Wenming He 1
PMCID: PMC13085545  PMID: 41807935

Abstract

Alzheimer’s disease (AD) is an irreversible neurodegenerative disorder that remains difficult to cure. However, early screening and timely intervention can significantly slow its progression. Traditional AD detection methods are plagued by high misdiagnosis rates, low hardware integration, and lack of diagnostic diversity. To address these challenges, this paper proposes a configurable System-on-Chip (SoC) design based on a multimodal fusion Artificial Neural Network (ANN) for high-precision diagnosis. The proposed design integrates Electroencephalogram (EEG) and Magnetic Resonance Imaging (MRI) signals. First, a discretized reverse training method was employed to compress the features of the MRI images and reduce the input dimensionality. Second, intra-layer parallel computation and inter-layer pipeline scheduling were implemented to enhance the computational throughput. Finally, a dynamic configuration strategy for Processing Elements (PE) was introduced to optimize the hardware resource utilization. The proposed design achieves a six-fold improvement in throughput and provides multiple diagnostic approaches for AD. In conclusion, this work provides an efficient and scalable hardware solution for the early screening and dynamic monitoring of AD, which is expected to promote the development of portable and intelligent AD diagnostic devices and has good prospects for clinical transformation and application.

Keywords: Alzheimer's disease, Multi-modal signals, Configurable, Artificial neural network, Chip design

Background

Alzheimer’s disease (AD) is a prevalent neurodegenerative disorder that currently affects approximately 50 million individuals globally [1]. Characterized by progressive memory loss and cognitive decline, AD results from the gradual death of neurons and loss of synaptic function in the brain. The symptoms of AD are diverse and may include impaired short-term memory, decreased language abilities, disorientation, and significant changes in behavior and mood. This debilitating condition not only severely reduces the quality of life of patients, but also places a heavy burden on families and society at large.

Currently, the treatment of AD primarily relies on pharmacotherapy, with acetylcholinesterase inhibitors and NMDA receptor antagonists being the most commonly used medications [2]. However, these pharmacological interventions only provide temporary relief of symptoms and do not halt the progression of the disease. For many patients, the therapeutic efficacy of these drugs is limited and may be accompanied by adverse side effects. Consequently, researchers are actively exploring alternative treatment strategies, including neuroprotective agents, anti-inflammatory drugs, and immunotherapies targeting β-amyloid and tau proteins. Despite the ongoing efforts of researchers, AD remains an incurable condition. Therefore, early diagnosis and effective management of AD are of paramount importance.

Prior to the emergence of multimodal research, data processing relied heavily on single-modality approaches. Traditional detection methods, which typically require parameter adjustments, can produce satisfactory results; however, such designs usually fail to capture the intricate interrelationships between different data sources [3–5]. To overcome this limitation and to accommodate a broader range of detection methods, this paper proposes a multimodal recognition approach that emphasizes the correlation between various types of data and aims to enhance the real-time performance and accuracy of detection [6–8].

The development of multimodal detection methods is an active area of research globally. For example, Lu et al. [9] created PathChat, a multimodal AI assistant for pathology that integrates visual encoders with a large-scale language model and achieves SOTA performance in diagnostic issues. Lee et al. [10] combined retinal OCT with MRI to identify new biomarkers of Parkinson’s disease, linking retinal and substantia nigra neuronal loss. Qiu [11] developed a multimodal deep learning framework to diagnose cognitive disorders using clinical data, achieving accuracy comparable to experts. Muto et al. [12] used single-cell analysis to identify cellular markers in autosomal dominant polycystic kidney disease. Bradley et al. [13] found that patients with inflammatory bowel disease (IBD) have increased brain activity and structural changes in regions related to emotion and cognition. Lafci [14] used multimodal photoacoustic ultrasound for early assessment of non-alcoholic fatty liver disease. Liu [15] combined cardiovascular imaging with biomarker detection to evaluate coronary heart disease in elderly patients and identified significant differences between imaging and biochemical indicators.

Currently, detection methods rely heavily on software implementations. However, hardware-based solutions have great potential for clinical translation and system integration. Unfortunately, research on hardware-based Alzheimer’s disease (AD) detection remains limited. Although a lightweight AD detection chip based on electroencephalogram (EEG) signals has been proposed [16], such methods are restricted to single-modality recognition. Moreover, complex multi-modal software algorithms are challenging to integrate into portable chips. Therefore, this paper aims to design a highly accurate and real-time AD detection system based on multimodal recognition of EEG and MRI signals, with a view to realizing practical clinical deployment. The complete AD detection system proposed in this paper is illustrated in Fig. 1, which can realize real-time monitoring.

Fig. 1.

Fig. 1

Alzheimer’s disease detection system

Compared to prior work, this paper presents a fully hardware-implemented, configurable, and ultra-low-power SoC architecture specifically designed for multimodal AD detection, and for the first time integrates a hardware-friendly MRI signal discretization encoding method—termed “Reverse-Trained Binary Window Mapping”—with EEG temporal feature extraction on a unified chip, enabling end-to-end hardware-software co-optimization. In contrast to existing approaches that typically handle only a single modality, our work explores the feasibility of jointly deploying heterogeneous multimodal signals on resource-constrained edge devices. The proposed configurable architecture supports dynamic mode switching—via register configuration—among EEG-only, MRI-only, and multimodal operation, significantly improving hardware reuse. Furthermore, we introduce a lightweight preprocessing pipeline for MRI data that converts raw images into binary sequences tailored for low-precision neural network processing, substantially reducing memory bandwidth and computational overhead. Fabricated in 65 nm CMOS technology, the chip achieves an ultra-low power consumption of 5.325 mW and a compact area of 2.131 mm², making it well-suited for deployment in wearable or bedside edge medical devices.

The structure of this paper is organized as follows: Sect. 2 discusses the creation of the multimodal dataset, the design of the high-performance AD detection training model, and the corresponding hardware design. Sect. 3 provides experimental data and comparative results. Sect. 4 discusses the findings and limitations. Finally, Sect. 5 concludes the paper.

The main innovations of this paper are as follows:

  1. Multimodal Hardware Recognition Method: A hardware recognition method for Alzheimer’s disease based on multimodal data is proposed, which offers a novel technological approach for the detection of Alzheimer’s disease and breaks through the limitations of traditional single-modality detection methods.

  2. Discretized Reverse Training Method for MRI Image Signals: A discretized reverse training method for MRI image signals is proposed. This method significantly reduces the amount of input data while retaining the key features of the images, thus enhancing the system’s processing efficiency and performance.

  3. Intra-layer Parallelization Design: A method is proposed to improve intra-layer computational efficiency by implementing various parallelization strategies. This design can flexibly adapt to different classification modes and effectively boost the overall computational efficiency of the system.

  4. Configurable PE Dynamic Configuration and Inter-layer Pipeline Design: A configurable PE dynamic configuration method is proposed in combination with an inter-layer pipeline design. This ensures that the PE array maintains high-speed operation during the computation process, further improving the system throughput and hardware resource utilization.

Methods

Multimodal data set production

The datasets used in this study include EEG and MRI data sourced from the OpenNeuro and ADNI repositories, respectively. For the EEG dataset, a 14-channel recording was employed as the input data, discretely sampled at a rate of 1,000 data points per second [16]. The MRI data were obtained from ADNI in the standardized NIfTI (.nii) format, which offers lossless compression, the ability to store rich metadata, and compatibility with multimodal imaging data. These features enhance the efficiency and precision of medical image processing and research.

The MRI data were preprocessed by traversing all slices in the .nii file and extracting coronal slices at 10 layer intervals. The resulting cross-sectional images were also processed to remove irrelevant portions. Subsequently, the images and their corresponding slice information were discretized as subsequent inputs. Finally, the dataset was divided into a training set and a testing set in the ratio of 7:3 to evaluate the accuracy of the neural network. The specific segmentation process is illustrated in Fig. 2.

Fig. 2.

Fig. 2

The process of constructing datasets based on MRI and EEG
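As a rough illustration, the slicing and 7:3 split described above can be sketched in Python. The function names and the nibabel-based loading mentioned in the comments are our assumptions, not the authors' code; only the 10-layer extraction interval and the split ratio come from the text.

```python
import numpy as np

def extract_coronal_slices(volume, interval=10):
    """Take every `interval`-th coronal slice from a 3-D MRI volume.

    `volume` is assumed to be a (sagittal, coronal, axial) array, e.g.
    obtained via nibabel as nib.load("scan.nii").get_fdata().
    """
    return [volume[:, i, :] for i in range(0, volume.shape[1], interval)]

def train_test_split(samples, train_ratio=0.7, seed=0):
    """Shuffle and split samples 7:3, as in the paper's dataset setup."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    cut = int(len(samples) * train_ratio)
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]
```

A 25-slice-deep volume, for instance, yields three coronal slices (indices 0, 10, 20) under this decimation.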

Multimodal-based training model for AD detection

This section employs three distinct recognition modes. The first is single-modality recognition, which identifies AD using a single type of input data (EEG or MRI signals) from the same individual. The second is hybrid single-modality recognition, which recognizes AD using a single type of input data from different individuals. The third is multimodal recognition, which identifies AD using both EEG and MRI signals from the same individual simultaneously.

These three training modes employ the same fundamental approach and architecture, as illustrated in Fig. 3. However, the training methods and subsequent hardware processing differ depending on the mode. For single-modality and hybrid single-modality recognition, the training method is consistent with the previous one. In the case of multi-modal recognition, a novel arbitration module is introduced at the backend to process the results from both EEG and MRI. This arbitration module performs secondary classification, which ultimately achieves efficient detection.

Fig. 3.

Fig. 3

Multimodal-based detection training model

The proposed four-trainable-layer ANN accepts a single-channel tensor of shape 14 × 1000. Layer 1 performs 32 3 × 3 convolutions (stride 1, padding 1), retaining the 14 × 1000 spatial size while lifting the channel count to 32, followed by batch normalization and ReLU. Layer 2 applies 2 × 2 max-pooling with stride 2, halving each spatial dimension and producing a 32 × 7 × 500 hidden tensor. Layer 3 stacks 64 3 × 3 convolutions (stride 1, padding 1), expanding the channels to 64 and keeping the 7 × 500 footprint, again followed by batch normalization and ReLU. Layer 4 compresses the 64 × 7 × 500 feature map via adaptive global average pooling into a 64 × 1 × 1 vector, flattens it, and feeds it to a 64 → 10 fully-connected layer to yield the final 10-D logits. Consequently, the network’s intermediate representations successively occupy 32 × 14 × 1000, 32 × 7 × 500 and 64 × 7 × 500, ending in a 64-D bottleneck, for a total of four operational layers.
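The dimension bookkeeping above can be checked with a small Python sketch. This is illustrative only: the helper names are ours, and it tracks tensor shapes rather than implementing the network itself.

```python
def conv_same(c_out, h, w):
    # 3x3 convolution, stride 1, padding 1: spatial size is preserved
    return (c_out, h, w)

def pool_half(c, h, w):
    # 2x2 max-pooling with stride 2: each spatial dimension is halved
    return (c, h // 2, w // 2)

def gap(c, h, w):
    # adaptive global average pooling: spatial grid collapses to 1x1
    return (c, 1, 1)

def forward_shapes(in_shape=(1, 14, 1000)):
    """Return the intermediate shapes of the four operational layers."""
    s1 = conv_same(32, in_shape[1], in_shape[2])   # Layer 1: 32 convs + BN + ReLU
    s2 = pool_half(*s1)                            # Layer 2: max-pool
    s3 = conv_same(64, s2[1], s2[2])               # Layer 3: 64 convs + BN + ReLU
    s4 = gap(*s3)                                  # Layer 4: GAP before the 64->10 FC
    return [s1, s2, s3, s4]
```

Running it reproduces the sequence 32 × 14 × 1000, 32 × 7 × 500, 64 × 7 × 500, and the 64-D bottleneck stated in the text.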

Multimodal-based configurable high-performance AD detection hardware design

For the MRI input signals, the total input consists of 10 MRI images after slicing. For each image, the concept of window extraction is proposed in this paper. Since the slice images are distinguished only by black and white, black is first encoded as 0 and white as 1 to perform discretization. Next, a window of shape [4, 8] is slid over the image, outputting a sequence of 32 pixels (0s and 1s) from left to right and from top to bottom. Each 32-bit binary sequence is then inversely coded to the IEEE 754 single-precision standard through reverse training, forming one frame of data. The window advances with a step size of 2 along the x-axis and 3 along the y-axis. The resulting IEEE 754-compliant data are fed into the AD detection hardware as high-performance input, with 10,000 frames of inversely encoded data processed per round. Finally, these ten sets of images are processed in a configurable parallel manner for efficient hardware computation, as shown in Fig. 4. For the input of these ten images, a data arbiter was designed to determine whether a single image consists entirely of 0s or entirely of 1s; if so, the image is deemed invalid and excluded from subsequent detection so as not to affect the results. For the recognition of EEG signals, the circuit described in [16] was employed for analysis and processing.

Fig. 4.

Fig. 4

Discrete coding training approach
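The window-extraction and IEEE 754 inverse-coding step can be sketched as follows. This is a Python approximation: the [4, 8] window and the step sizes of 2 (x) and 3 (y) follow the text, while the function names and the row-major bit ordering within a window are our assumptions.

```python
import struct

def window_to_float(window_bits):
    """Decode a 4x8 window of 0/1 pixels (32 bits, row-major) as an
    IEEE 754 single-precision number, mimicking the inverse coding."""
    assert len(window_bits) == 32
    word = int("".join(str(b) for b in window_bits), 2)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

def sliding_windows(img, win_h=4, win_w=8, step_x=2, step_y=3):
    """Yield window bit-sequences from a 2-D list of 0/1 pixels.
    Steps follow the paper: 2 along x (columns), 3 along y (rows)."""
    rows, cols = len(img), len(img[0])
    for y in range(0, rows - win_h + 1, step_y):
        for x in range(0, cols - win_w + 1, step_x):
            yield [img[y + i][x + j] for i in range(win_h) for j in range(win_w)]
```

For example, the bit pattern 0x3F800000 (sign 0, exponent 127, zero mantissa) decodes to the float 1.0.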

Discrete coding training

Traditional medical image preprocessing prioritizes fidelity preservation, which is often incompatible with the stringent area and power budgets of edge AI SoCs. In contrast, discretized reverse training embraces a task-aware compressive embedding philosophy: rather than reconstructing the original MRI, we seek a minimal yet discriminative representation that maximizes information relevant to AD classification while minimizing hardware cost.

Formally, let x denote a normalized T1-weighted MRI slice, and y its diagnostic label (AD vs. NC). Our goal is to learn an encoder φ that approximately preserves the mutual information I(φ(x); y), subject to ultra-low computational complexity.

The proposed embedding φ consists of two stages:

Discretized local sampling

The image is quantized to 8 bits and partitioned into non-overlapping 4 × 4 windows. From each window, a contiguous 32-bit binary sequence b ∈ {0, 1}³² is extracted, capturing local intensity patterns at the bit level.

Non-uniform embedding

Each b is interpreted as an IEEE 754 single-precision floating-point number:

v = f₇₅₄(b)    (1)

z = s · v + ε    (2)

where f₇₅₄: {0, 1}³² → ℝ denotes IEEE 754 single-precision decoding, s is a scaling constant, and ε is bounded by the local sampling density and quantization error. Crucially, because f₇₅₄ amplifies micro-variations in the mantissa bits (which encode fine-grained intensity differences), the embedded distance remains sensitive to diagnostically critical features, even though global spatial continuity is sacrificed.

Robustness: While IEEE 754 decoding is sensitive to exponent-bit flips, statistical analysis of ADNI data shows that > 99% of sampled windows yield values in [2⁻²⁰, 2⁻⁵], where the exponent field is constant. Thus, perturbations primarily affect mantissa bits, inducing relative output changes < 10⁻⁷. Furthermore, end-to-end training ensures that the downstream classifier learns empirically Lipschitz-continuous decision boundaries in the embedded space, rendering the system robust to minor numerical perturbations.
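The mantissa-versus-exponent sensitivity argument can be reproduced numerically. This is a standalone Python check, not the authors' analysis: flipping the least-significant mantissa bit of a float32 perturbs the value by about one part in 2²³, whereas flipping an exponent bit changes it drastically.

```python
import struct

def flip_bit(x, bit):
    """Flip one bit of a float32 (bit 0 = least-significant mantissa bit,
    bits 23-30 = exponent field, bit 31 = sign)."""
    word = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", word ^ (1 << bit)))[0]

def relative_change(x, bit):
    """Relative output change induced by a single-bit flip."""
    return abs(flip_bit(x, bit) - x) / abs(x)
```

For x = 1.0, a flip of mantissa bit 0 changes the value by roughly 1.2 × 10⁻⁷, while a flip of exponent bit 23 halves it.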

In summary, this embedding is not a signal-preserving transform but a hardware-efficient, task-optimized feature compressor grounded in information-theoretic principles. It enables our SoC to achieve high diagnostic accuracy with only 21.5 kB of on-chip memory—demonstrating that, for edge medical AI, what matters is not how faithfully you represent the data, but how wisely you compress it.

In this paper, separate detection systems were employed for EEG and MRI signals. These systems can be configured by patients and operate independently during the detection phase to avoid interference. To enhance efficiency, parallel computing was adopted: in the EEG and MRI signal detection modules, we propose intra-layer parallelization and inter-layer pipelining designs to further improve detection throughput. Specifically, an inter-layer pipelining design was introduced in the EEG detection system, while a configurable PE dynamic configuration design was proposed for multimodal recognition. This ensures that PE units operate at near full capacity, which significantly boosts the computational power. Finally, a weighting mechanism was applied to the classification results from both EEG and MRI signals to arbitrate the recognition outcomes, thereby meeting the requirement for high-precision identification. The main modules of the high-performance AD detection hardware are illustrated in Fig. 5.

Fig. 5.

Fig. 5

Main modules of high-performance AD detection hardware

Intra-layer parallelized design

In the original design of each detection module, computations were performed sequentially, with only one PE core being activated per operation. This section proposes an approach to enhance intra-layer computational efficiency through parallelization within the layer. To accommodate various classification modes, multiple parallelization strategies were employed. Specifically, 15 PE cores were designed for parallel computation. Considering the flexibility of operations, these cores are configured as freely configurable PE modules capable of meeting the needs of any array shape with fewer than 15 elements, such as 1 × 15, 3 × 5, etc. Fig. 6 illustrates the computation process with all PE cores selected. Additionally, to further optimize the summation speed, parallelization was implemented during the accumulation process.

Fig. 6.

Fig. 6

15 PE arithmetic processes
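A stride-partitioned multiply-accumulate with a parallel (tree) accumulation stage might look like the following Python sketch. It is illustrative only: the real design operates on fixed-point hardware PEs, and the lane assignment here is our assumption.

```python
def pe_array_dot(weights, inputs, n_pe=15):
    """Split a dot product across n_pe lanes, then tree-reduce the
    partial sums, mimicking parallelized accumulation across 15 PEs."""
    partial = [0.0] * n_pe
    for i, (w, x) in enumerate(zip(weights, inputs)):
        partial[i % n_pe] += w * x        # each PE handles a stride-n_pe slice
    while len(partial) > 1:               # log-depth adder tree
        partial = [partial[i] + partial[i + 1] if i + 1 < len(partial)
                   else partial[i]
                   for i in range(0, len(partial), 2)]
    return partial[0]
```

The result matches a sequential dot product, but the accumulation depth drops from 15 sequential additions to ⌈log₂ 15⌉ = 4 tree levels.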

Inter-layer pipeline design

During computation, some results from the first layer can be forwarded for further processing without waiting for all calculations in that layer to complete, enabling inter-layer pipelining: as soon as a portion of the first layer's computation finishes, it immediately proceeds to the next layer. However, there is a transition delay when partial results from the first layer are passed to the second; its duration depends on the number of neurons in that layer and the dimensionality of the input data. Similarly, the third layer must wait for the results of the second layer. Based on this, this paper proposes an inter-layer pipelining design that maximizes the utilization of PE throughput at every moment, thereby enhancing the efficiency of the hardware design. In this design, a seven-PE inter-layer pipeline was employed; the specific timing diagram is shown in Fig. 7.

Fig. 7.

Fig. 7

Inter-layer pipeline design
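A first-order latency model of the inter-layer pipeline (ignoring the transition delays discussed above, and with invented cycle counts) can be written as:

```python
def sequential_cycles(stage_cycles, n_items):
    """No pipelining: every item traverses all layers back-to-back."""
    return n_items * sum(stage_cycles)

def pipelined_cycles(stage_cycles, n_items):
    """Ideal inter-layer pipeline: after the fill phase, one result
    completes every max(stage_cycles) cycles (the slowest layer)."""
    return sum(stage_cycles) + (n_items - 1) * max(stage_cycles)
```

With hypothetical per-layer costs of 5, 3, and 4 cycles and 10 inputs, the sequential design takes 120 cycles while the pipelined one takes 57, illustrating why keeping every layer busy raises throughput.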

PE dynamic configuration

In the inter-layer pipelining design discussed in the previous section, the number of computational cycles required by each layer is not the same. If the computational units are evenly allocated, it is inevitable that some PE (Processing Element) units will be idle. Therefore, this paper proposes a dynamic configuration of PE units for different layers to ensure that the computational results precisely meet the timing requirements of the inter-layer pipelining.

In the context of multimodal detection scenarios, experimental results have shown that focusing solely on the computations within a single modality while neglecting the overall layout can easily lead to a situation where PE units are fully utilized in one modality while being idle in another modality. To address this issue, this paper introduces a dynamic PE configuration technique for both modalities. In the initial stage of computation, different numbers of PE cores are allocated for each computational mode, so that the predictions for both modalities can be carried out simultaneously and ultimately complete the calculations, thus enhancing the overall timeliness of the system. Since the fourth layer has the fewest number of neurons among the four layers, this paper shares the dynamic PE configuration logic of the fourth layer with the third layer, and the specific configuration is illustrated in Fig. 8.

Fig. 8.

Fig. 8

PE dynamic allocation by layer
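A proportional allocation heuristic of this kind can be sketched as follows. The cycle counts and the rebalancing rule are our assumptions, not the chip's actual scheduler; the sketch only shows how 15 PEs can be divided so that layers with heavier workloads receive more cores.

```python
def allocate_pes(layer_cycles, total_pe=15):
    """Assign PEs roughly proportional to each layer's single-PE cycle
    count so that all layers finish at about the same time."""
    total = sum(layer_cycles)
    alloc = [max(1, round(total_pe * c / total)) for c in layer_cycles]
    # repair rounding so exactly total_pe cores are used
    while sum(alloc) > total_pe:
        alloc[alloc.index(max(alloc))] -= 1
    while sum(alloc) < total_pe:
        # give the extra core to the layer with the worst per-core load
        i = max(range(len(alloc)), key=lambda k: layer_cycles[k] / alloc[k])
        alloc[i] += 1
    return alloc
```

For hypothetical layer workloads of 800, 400, 200, and 100 cycles, the allocation is 8, 4, 2, and 1 cores, so every layer finishes in about 100 PE-cycles.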

Learnable-weight linear fusion strategy

To enable “weakly-paired” multimodal decision-making without subject-wise alignment, a learnable-weight linear fusion layer is deployed after the two independent EEG and MRI branches. Each branch outputs a 10-dimensional logits vector, denoted z_EEG and z_MRI. These are first projected to scalar confidence scores via a shared 10→1 linear layer:

s_m = wᵀ z_m,  m ∈ {EEG, MRI}    (3)

The fused score is then computed as a weighted sum:

s_fused = α · s_EEG + β · s_MRI    (4)

where α and β are learnable fusion weights. The final prediction is ŷ = σ(s_fused), with σ being the sigmoid function. During training, the backbone networks are frozen; only α and β (and the shared projection w) are fine-tuned for 10 epochs on the validation set with L2 weight decay. Post-layout simulation shows that the learned α is much larger than β, meaning the system automatically suppresses unreliable MRI votes. This adaptive gating raises overall sensitivity from 95.1% to 97.8%.

The hardware implementation requires only six registers and four multipliers, occupying < 0.01 mm² and consuming ≈ 60 µW, while completing fusion arbitration in a single clock cycle.
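Under Eqs. (3) and (4), the fusion arbitration reduces to a few multiply-accumulates and a sigmoid, as in this Python sketch (the weight values used below are placeholders, not the trained parameters):

```python
import math

def fuse(z_eeg, z_mri, w, alpha, beta):
    """Learnable-weight linear fusion: project each 10-D logits vector to
    a scalar with the shared weights w, then combine with alpha/beta."""
    s_eeg = sum(wi * zi for wi, zi in zip(w, z_eeg))   # Eq. (3), EEG branch
    s_mri = sum(wi * zi for wi, zi in zip(w, z_mri))   # Eq. (3), MRI branch
    s = alpha * s_eeg + beta * s_mri                   # Eq. (4)
    return 1.0 / (1.0 + math.exp(-s))                  # sigmoid -> AD probability
```

With α = 1 and β = 0, the fused decision follows the EEG branch alone, which is the limiting case of the gating behavior described above.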

Results

Owing to the three-dimensional nature of MRI volumes, fine-grained extraction with small slice intervals yields highly redundant voxels and increases on-chip SRAM demand by more than eight-fold. Therefore, after a systematic grid-search that maximized validation accuracy, we adopted a decimation interval of 10 layers, retaining only 7.4% of the original voxels for subsequent inverse-coding training.

This paper employs the same training environment mentioned in [16] to train the high-performance AD detection model based on multimodal signals. The datasets were processed through segmentation and integration, ultimately forming approximately 1,000 sets of EEG test data and 979 sets of MRI test data under the multimodal configuration. These datasets were used for training and validation, and the resulting confusion matrix is shown in Fig. 9.

Fig. 9.

Fig. 9

Confusion matrix for high performance AD detection models a EEG and b MRI

The classification results are crucial for evaluating the performance of the chip. In this paper, commonly used metrics are used to assess the performance of the designed classifier: Accuracy (Acc), Sensitivity (Sen), Specificity (Spec), Positive Predictive Rate (PPR), F1-score (F1), and Area Under the Curve (AUC). These metrics are defined in Eqs. (5)–(10):

Acc = (TP + TN) / (TP + TN + FP + FN)    (5)

Sen = TP / (TP + FN)    (6)

Spec = TN / (TN + FP)    (7)

PPR = TP / (TP + FP)    (8)

F1 = 2 · PPR · Sen / (PPR + Sen)    (9)

AUC = ∫₀¹ TPR d(FPR)    (10)
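These definitions can be computed directly from the confusion-matrix counts, as in the following Python sketch of Eqs. (5)–(9); AUC requires the full score distribution across thresholds and is omitted here.

```python
def metrics(tp, tn, fp, fn):
    """Classification metrics from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # Eq. (5): accuracy
    sen = tp / (tp + fn)                    # Eq. (6): sensitivity (recall)
    spec = tn / (tn + fp)                   # Eq. (7): specificity
    ppr = tp / (tp + fp)                    # Eq. (8): positive predictive rate
    f1 = 2 * ppr * sen / (ppr + sen)        # Eq. (9): F1-score
    return acc, sen, spec, ppr, f1
```

For example, counts of TP = 90, TN = 80, FP = 10, FN = 20 give an accuracy of 0.85 and a PPR of 0.90.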

Every point on the ROC curve corresponds to one threshold τ, and AUC summarizes how well the model separates the two classes across all possible τ. In this paper, the accuracy, sensitivity, specificity, and positive predictive rate for both single-modality and multimodal recognition in the AD classification task are calculated separately, as shown in Fig. 10. Higher accuracy and sensitivity were achieved for EEG recognition; in multimodal recognition, the arbitration module discards MRI results in cases where MRI recognition accuracy is low, thus improving the overall accuracy. Although MRI may yield lower accuracy in some situations, its inclusion demonstrates the diversity of recognition methods provided in this paper, and single-modality recognition simplifies the modality data required. For the proposed model, the F1-scores are 98.2% on EEG, 81.75% on MRI, and 96.01% for multimodal recognition; the corresponding AUC values are 94.9%, 86.3%, and 92.8%, respectively.

Fig. 10.

Fig. 10

Single modal and multimodal network characteristics a Accuracy; b Sensitivity; c specificity; d Positive predictive ratio

Figure 11 shows the training epochs versus accuracy and loss for the proposed multimodal model. Performance stabilizes and the loss steadily decreases after epoch 60.

Fig. 11.

Fig. 11

The relationship between a loss curves and b accuracy curves of multimodal models with the training epochs

The data precision of each layer in the proposed SoC is shown in Fig. 12. The precision for Layer 1, Layer 2, and Layer 3 is set to 97.38%, 97.32%, and 97.36% respectively, reflecting the high-accuracy requirements for early feature extraction and intermediate processing stages. For Layer 4 and the subsequent fusion layer (labeled as ‘Layer4 + other’), a slightly lower precision of 97.28% is adopted. This design choice allows for a trade-off between computational accuracy and hardware resource efficiency, particularly in the final classification and decision-fusion stage where minor precision reduction has minimal impact on overall diagnostic performance.

Fig. 12.

Fig. 12

Hardware implementation data accuracy loss diagram

The throughput and power consumption of the hardware are compared for three modes: single-signal recognition, hybrid recognition, and multimodal recognition, as shown in Fig. 13. The PEs are dynamically configured in these modes.

Fig. 13.

Fig. 13

The impact of components on throughput in single-mode, hybrid mode, and multimodal scenarios

The throughput results show that under intra-layer parallelization, the single-modality, hybrid, and multimodal modes achieved 2.71 GMAC, 2.93 GMAC, and 2.89 GMAC, respectively; under inter-layer pipelining, they achieved 3.63 GMAC, 4.87 GMAC, and 4.76 GMAC, respectively (Fig. 13). The lower throughput in the single-modality mode under inter-layer pipelining may be due to the reduced network size in the last two layers compared to the first two. Regarding power consumption, under intra-layer parallelization, the single-modality, hybrid, and multimodal modes consumed 0.73 mW, 1.77 mW, and 2.13 mW, respectively; under inter-layer pipelining, they consumed 1.35 mW, 2.91 mW, and 3.25 mW, respectively. The single-modality mode consumes significantly less power, probably because it processes less data and requires fewer computational steps. The slightly higher power consumption in the multimodal mode compared to the hybrid mode may be due to the additional arbitration module for processing the different data results.

We constructed a cycle-accurate runtime monitor for every PE by inserting assertions. Each assertion logs the exact clock cycles during which its PE is active; dividing this count by the total simulation cycles yields the per-layer utilization reported in Fig. 14. All numbers were extracted from post-layout gate-level simulations run at 40 MHz and 1 V. Figure 14 shows the PE core occupancy rate for each layer dynamically configured. The hybrid mode has a higher PE core occupancy rate due to its higher neuron count, which boosts throughput. The PE core utilization decreases as the number of layers increases, which can be attributed to fewer neurons in the deeper layers.

Fig. 14.

Fig. 14

Statistics of PE utilization rate for each layer in a configurable neural network

Figure 15 analyzes the impact of PE dynamic configuration on throughput in hybrid mode. It improves intra-layer parallelization throughput by 25.2% and inter-layer pipelining throughput by 36.3%. Overall, total throughput improves by 31.2%.

Fig. 15.

Fig. 15

The improvement effect of the hybrid mode on throughput

The area overhead analysis of the proposed SoC is shown in Fig. 16. The E203 core occupies 1.54 mm², the EEG IP module occupies 0.12 mm², the MRI IP module occupies 0.15 mm², and the other modules (resource interconnection, storage, and arbitration) occupy 0.51 mm². The E203 core is thus the dominant area consumer; optimizing it would reduce the overall SoC area.

Fig. 16.

Fig. 16

Area overhead of each module in the proposed SoC

The PADs, macros, pre-routing, and voltage drop distribution of the proposed neural network detection IP are illustrated in Fig. 17 when integrated with the RISC-V processor.

Fig. 17.

Fig. 17

Prerouting and voltage drop distribution a PAD and Macros, b Pre-routing, and c Voltage drop distribution

The area and power consumption distribution of the proposed SoC is illustrated in Fig. 18(a) and Fig. 18(b), respectively. The SoC integrates several key modules including RISC-V core, MAC array, Memory, Fusion unit, ReLU activation module, and control logic. Among them, the MAC array occupies the largest portion, accounting for 42.3% of the total chip area and 58.1% of the total power consumption. The memory subsystem consumes 29.0% of the area and 15.9% of the power. The ReLU module takes up 9.6% of the area and 13.2% of the power, while the fusion unit and control logic each occupy 4.8% of the area. The RISC-V processor core accounts for 10.4% of the area and 8.0% of the power. The remaining 13.1% of the area and 8.8% of the power are attributed to other auxiliary circuits.

Fig. 18.

Fig. 18

The a area and b power consumption distribution of the proposed SoC

To address the on-chip memory constraint of 21.5 kB while processing a full diagnostic round of MRI data amounting to approximately 40 kB—equivalent to 10,000 frames in 32-bit IEEE 754 format—this work adopts a streaming data architecture. The system does not load the entire dataset onto the chip at once. Instead, it streams data in small batches from external memory through a dedicated input FIFO buffer. This FIFO is implemented with a depth of 1,024 entries, occupying 4 kB of on-chip storage, which is sufficient to sustain the processing pipeline under normal operating conditions and prevent stalls due to data starvation. A backpressure mechanism is employed between the data source and the SoC: when FIFO occupancy reaches 80%, the SoC asserts a stall signal to pause the incoming stream and avoid overflow, and resumes transmission once occupancy falls below 20%. This approach enables the SoC to efficiently process datasets significantly larger than its on-chip memory capacity by operating on sequential chunks. The end-to-end latency of a complete Alzheimer’s disease diagnostic cycle—including data streaming, dual-modality neural network inference for EEG and MRI, and final decision fusion—is measured at 180 milliseconds.
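The streaming behavior can be illustrated with a toy model of the input FIFO under 80%/20% hysteresis backpressure. The fill and drain rates below are invented for illustration; only the 1,024-entry capacity and the two thresholds come from the text.

```python
def simulate_stream(frames, capacity=1024, drain_per_cycle=2, fill_per_cycle=3):
    """Cycle-by-cycle toy model of the input FIFO with hysteresis
    backpressure; returns the peak occupancy observed."""
    occ, stalled, max_occ, remaining = 0, False, 0, frames
    while remaining > 0 or occ > 0:
        if not stalled and remaining > 0:
            pushed = min(fill_per_cycle, remaining, capacity - occ)
            occ += pushed
            remaining -= pushed
        occ -= min(drain_per_cycle, occ)   # pipeline consumes entries
        if occ >= 0.8 * capacity:
            stalled = True                 # assert stall to the data source
        elif occ <= 0.2 * capacity:
            stalled = False                # resume transmission
        max_occ = max(max_occ, occ)
    return max_occ
```

Even when the source is faster than the pipeline, the stall signal keeps the observed occupancy within the FIFO capacity, which is the property the backpressure mechanism is designed to guarantee.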

Figure 19 shows the hardware parameters of the high-performance AD detection chip in single-modality mode. Fabricated in TSMC's 65 nm technology, the chip has a total area of 2.131 mm² and a memory size of 21.5 kB, equivalent to 410.52 k NAND gates. It consumes 5.325 mW when operating at a voltage of 1 V and a frequency of 40 MHz.
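As a quick sanity check on these figures, the per-cycle MAC parallelism implied by the 6.4 GMACS throughput reported later in Table 2 at the 40 MHz clock can be derived directly. The function below is purely illustrative and not part of the published design flow.

```python
def macs_per_cycle(throughput_gmacs: float, freq_mhz: float) -> float:
    """MAC operations completed per clock cycle implied by a throughput figure."""
    macs_per_second = throughput_gmacs * 1e9   # GMACS -> MAC/s
    cycles_per_second = freq_mhz * 1e6         # MHz -> cycles/s
    return macs_per_second / cycles_per_second

# 6.4 GMACS at 40 MHz implies roughly 160 parallel MAC operations per
# cycle, consistent with the MAC array dominating chip area and power.
parallelism = macs_per_cycle(6.4, 40.0)
```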

Fig. 19. SoC layout and core features of SoC classification

Table 1 compares five representative studies. First, their objectives differ: this work targets AD detection, whereas the others address progression prediction. Second, the number of classes varies from two to three. Third, missing-data strategies range from simple forward/backward filling to VAE-based reconstruction, while our chip-based method operates without any missing-data handling. Fourth, the network backbones are heterogeneous, with ours implemented as a hardware ANN. Finally, our platform is the only one fabricated in a 65 nm SoC.

Table 1. Comparison with related AD work

| | [17] | [18] | [19] | [20] | [21] | This work |
|---|---|---|---|---|---|---|
| Objective | AD progression prediction | AD progression prediction | AD progression prediction | AD progression prediction | AD progression prediction | AD detection |
| Number of classes | 3-class (CN/MCI/AD) | 3-class (CN/MCI/AD) | 3-class (CN/MCI/AD) | 3-class (CN/MCI/AD) | NA | 2-class (CN vs. AD) |
| Missing-data handling | Forward/backward fill | AMVAE imputation | M3VAE + PoE imputation | NA | NA | NA |
| Network backbone | 3D-CNN → BRNN | AMVAE → RNN | M3VAE → IRLSTM | CNN | PLN | ANN |
| Platform | PyTorch | GPU training | GPU training | GPU training | CPU | 65 nm SoC |
| Accuracy | 96% | 60.6% | 62.2% | 98.6% | 99.5% | 98.7% |
| Sensitivity | 92% | 61.7% | 99.5% | NA | 99.9% | 97.8% |
| Specificity | NA | NA | 63% | NA | 99.8% | 97.1% |
| PPR | NA | NA | NA | NA | NA | 96.9% |
| F1 | NA | NA | NA | NA | NA | 96% |
| AUC | 96% | NA | NA | 98% | NA | 92.8% |

This paper also conducts a comparative analysis with chips from other relevant studies, as shown in Table 2. References [23–25] primarily focus on disease classification tasks, but their datasets are relatively limited and lack multimodal data, which restricts their application scope and the comprehensiveness of their results. In contrast, this work demonstrates clear advantages in accuracy and sensitivity, outperforming these studies. Moreover, although [23–25] do not explicitly report throughput, their network structures suggest lower processing speeds than the solution proposed here. Reference [22] focuses on ASIC circuit design for hardware algorithms, targeting specific hardware implementations without disease classification functionality, and is therefore distinct from the research objectives and application scenarios of this paper.

Table 2. Comparison of hardware characteristics of various neural network chips

| | [22] | [23] | [24] | [25] | This work |
|---|---|---|---|---|---|
| Technology (nm) | 65 | 55 | 65 | 180 | 65 |
| Frequency (Hz) | 40 M | 100 k | 2.5 k | 32.768 | 40 M |
| VDD (V) | NA | 1.8 | 0.8 | 1.8 | 1 |
| Power (mW) | 95 | 0.01288 | 0.000746 | 0.000282 | 5.325 |
| Area (mm²) | 3.6 | 0.334 | 1.08 | 3.88 | 2.131 |
| Model utilized | NN autoencoder | ANN | TNN | Pan-Tompkins | ANN |
| Accuracy (%) | NA | 96.7 | 99.1 | 85.41 | 98.7 |
| Sensitivity (%) | NA | 96.7 | 99.5 | 92.45 | 97.8 |
| THR (GMACS) | 225,000 | NA | NA | NA | 6.4 |
| Memory (kB) | NA | 4.5 | 1 | NA | 21.5 |
| Number of classes | NA | 5 | 13 | 2 | 2 |

Discussion

This paper adopts a reverse-training approach for MRI, but under the on-chip memory constraint some features are lost during extraction, and this degradation directly lowers the accuracy of MRI-only detection. Future work should therefore seek feature-extraction schemes that better balance classification performance and silicon area.

As listed in Table 2, the late-fusion sensitivity (97.8%) is lower than the EEG-only figure (98.1%). During calibration, the learnable fusion layer assigns MRI a non-zero weight (≈ 0.08); the resulting logit offset perturbs the decision boundary and increases the false-negative rate, indicating that the fusion strategy still has room for optimization.
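The failure mode described above can be illustrated with a minimal late-fusion sketch. The convex combination of per-modality logits and the sigmoid readout are assumptions made for illustration; only the ≈ 0.08 MRI weight comes from the text.

```python
import math

def fuse(eeg_logit: float, mri_logit: float, w_mri: float = 0.08) -> float:
    """Weighted late fusion of per-modality logits (illustrative form)."""
    fused_logit = (1.0 - w_mri) * eeg_logit + w_mri * mri_logit
    return 1.0 / (1.0 + math.exp(-fused_logit))   # sigmoid -> P(AD)

# A borderline EEG-positive case (logit slightly above 0): even a small
# non-zero MRI weight lets a strongly negative MRI logit pull the fused
# probability below the 0.5 decision threshold, yielding a false negative.
p_eeg_only = 1.0 / (1.0 + math.exp(-0.10))   # above 0.5, classified AD
p_fused = fuse(0.10, -2.0)                   # below 0.5, missed case
```

This is exactly the mechanism by which a small MRI weight can lower fused sensitivity relative to the EEG-only model: the offset only matters for samples already near the decision boundary, which is where false negatives are created.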

Although the proposed multimodal ANN model reaches an accuracy of 98.7%, a limitation remains: the EEG and MRI data were collected from different individual subjects drawn from different datasets (OpenNeuro and ADNI), which may limit the generalizability of the results. Future work should incorporate EEG and MRI data from the same individual subjects and conduct more extensive validation to improve the robustness of the model and ensure its effectiveness.

Conclusion

With the deep integration of artificial intelligence and integrated circuit technology, SoC designs for AD detection based on multimodal signals hold broad application prospects. The multimodal AD detection architecture proposed in this paper offers various hardware solutions for AD detection, and techniques such as intra-layer parallelization, inter-layer pipelining, and PE configuration further enhance the chip's throughput. Fabricated in TSMC's 65 nm technology, the chip has a total area of 2.131 mm² and a memory size of 21.5 kB, equivalent to 410.52 k NAND gates, and consumes 5.325 mW when operating at 1 V and 40 MHz. Notably, the multimodal diagnosis scheme introduced in this paper can incorporate additional datasets in the future to expand its diagnostic methods, so the high-performance multimodal detector can serve as an auxiliary diagnostic tool for hospital devices that integrate multiple datasets. Because the EEG and MRI data required by this model are difficult to obtain, its primary application setting is currently hospitals: using EEG and MRI signals collected there, the model can effectively perform preliminary screening for Alzheimer's disease and assist physicians in diagnosis.

Acknowledgements

This work was supported by One Health Interdisciplinary Research Project, Institute of One Health Science, Ningbo University.

Author contributions

Y.Y., L.S., and W.H. conceptualized the study; L.S. developed the methodology and conducted the investigation; Y.Z. implemented the software; Y.Y., L.S., W.H., and Y.Z. performed validation; Q.L. and K.D. carried out formal analysis; J.C. provided resources and curated data; Y.Y. and L.S. wrote the original draft; W.H. and Z.C. handled writing—review and editing; W.H. and Q.L. prepared visualizations; Y.Z. and J.C. supervised the work; K.D. and Q.H. managed project administration; L.S., Y.Z., and L.S. acquired funding. All authors reviewed the manuscript.

Funding

This research was funded by the Zhejiang Province Leading Geese Plan Project, grant number 2025C01063; the Zhejiang Province Traditional Chinese Medicine Science and Technology Project, grant number 2023ZL659; the National Natural Science Foundation of China, grant numbers 62474100, 62174121, and 62134002; the Science and Technology Innovation 2035 Major Project of Ningbo, grant number 2024T016; the Science and Technology Innovation 2025 Major Project of Ningbo, grant number 2022Z203; the Ningbo University and Ningbo Yongxin Microelectronics Technology Co., Ltd. Digital Integrated Circuit Design Joint Laboratory, grant number XQ2022000005; the Ningbo University Graduate Education Practice Base, grant number YJD202305; and the Yinzhou District Scientific and Technological Project, grant numbers 2025AS018, 2024AS020, and 2024Y04.

Data availability

The datasets used and analysed during the current study are available from the corresponding author on reasonable request. The ADNI index identification is MRI, AD or CN, ADNI3; the repository can be found at https://ida.loni.usc.edu/pages/access/search.jsp. The OpenNeuro repository can be found at https://openneuro.org/datasets/ds004504/versions/1.0.8.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Liufang Sheng and Zhikang Chen: These authors have contributed equally to this work.

Contributor Information

Yuejun Zhang, Email: zhangyuejun@nbu.edu.cn.

Junping Chen, Email: 13858222873@163.com.

Wenming He, Email: fyhewenming@nbu.edu.cn.

References

  • 1. Cummings J, Zhou Y, Lee G, Zhong K, Fonseca J, Cheng F. Alzheimer’s disease drug development pipeline. Alzh Dement-TRCI. 2024;10:e12465. 10.1002/trc2.70098.
  • 2. Kim AY, Al Jerdi S, MacDonald R, Triggle CR. Alzheimer’s disease and its treatment–yesterday, today, and tomorrow. Front Pharmacol. 2024;15:1399121. 10.3389/fphar.2024.1399121.
  • 3. Comfere NI, Peters MS, Jenkins S, Lackore K, Yost K, Tilburt J. Dermatopathologists’ concerns and challenges with clinical information in the skin biopsy requisition form: a mixed-methods study. J Cutan Pathol. 2015;42:333–45. 10.1111/cup.12485.
  • 4. Boehm KM, Khosravi P, Vanguri R, Gao J, Shah SP. Harnessing multimodal data integration to advance precision oncology. Nat Rev Cancer. 2022;22:114–26. 10.1038/s41568-021-00408-3.
  • 5. Lei B, Zhu Y, Yu S, Hu H, Xu Y, Yue G, et al. Multi-scale enhanced graph convolutional network for mild cognitive impairment detection. Pattern Recognit. 2023;134:109106. 10.1016/j.patcog.2022.109106.
  • 6. Zhang F, Li Z, Zhang B, Du H, Wang B, Zhang X. Multi-modal deep learning model for auxiliary diagnosis of Alzheimer’s disease. Neurocomputing. 2019;361:185–95. 10.1016/j.neucom.2019.04.093.
  • 7. Sacchet MD, Gotlib IH. Myelination of the brain in major depressive disorder: an in vivo quantitative magnetic resonance imaging study. Sci Rep. 2017;7:2200. 10.1038/s41598-017-02062-y.
  • 8. Nugent AC, Farmer C, Evans JW, Snider SL, Banerjee D, Zarate CA Jr. Multimodal imaging reveals a complex pattern of dysfunction in corticolimbic pathways in major depressive disorder. Hum Brain Mapp. 2019;40:3940–50. 10.1002/hbm.24679.
  • 9. Lu MY, Chen B, Williamson DF, Chen RJ, Zhao M, Chow AK, et al. A multimodal generative AI copilot for human pathology. Nature. 2024;634:466–73. 10.1038/s41586-024-07618-3.
  • 10. Lee JY, Martin-Bastida A, Murueta-Goyena A, Gabilondo I, Cuenca N, Piccini P, et al. Multimodal brain and retinal imaging of dopaminergic degeneration in Parkinson disease. Nat Rev Neurol. 2022;18:203–20. 10.1038/s41582-022-00618-9.
  • 11. Qiu S, Miller MI, Joshi PS, Lee JC, Xue C, Ni Y, et al. Multimodal deep learning for Alzheimer’s disease dementia assessment. Nat Commun. 2022;13:3404. 10.1038/s41467-022-31037-5.
  • 12. Muto Y, Dixon EE, Yoshimura Y, Wu H, Omachi K, Ledru N, et al. Defining cellular complexity in human autosomal dominant polycystic kidney disease by multimodal single cell analysis. Nat Commun. 2022;13:6497. 10.1038/s41467-022-34255-z.
  • 13. Goodyear BG, Heidari F, Ingram RJ, Cortese F, Sharifi N, Kaplan GG, et al. Multimodal brain MRI of deep gray matter changes associated with inflammatory bowel disease. Inflamm Bowel Dis. 2023;29:405–16. 10.1093/ibd/izac089.
  • 14. Lafci B, Hadjihambi A, Determann M, Konstantinou C, Freijo C, Herraiz JL, et al. Multimodal assessment of non-alcoholic fatty liver disease with transmission-reflection optoacoustic ultrasound. Theranostics. 2023;13:4217–28. 10.7150/thno.78548.
  • 15. Liu X, Li Y, Li W, Zhang Y, Zhang S, Ma Y, et al. Diagnostic value of multimodal cardiovascular imaging technology coupled with biomarker detection in elderly patients with coronary heart disease. Br J Hosp Med. 2024;85:1–10. 10.12968/hmed.2024.0123.
  • 16. Chen Z, Zhang Y, Zhou Z, Wang L, Zhang H, Wang P, et al. An efficient ANN SoC for detecting Alzheimer’s disease based on recurrent computing. Comput Biol Med. 2024;181:108993. 10.1016/j.compbiomed.2024.108993.
  • 17. Rahim N, El-Sappagh S, Ali S, Muhammad K, Del Ser J, Abuhmed T. Prediction of Alzheimer’s progression based on multimodal deep-learning-based fusion and visual explainability of time-series data. Inf Fusion. 2023;92:363–88. 10.1016/j.inffus.2022.11.028.
  • 18. Dhivyaa S, Dao D, Yang H, Kim J. Adaptive cross-modal representation learning for heterogeneous data types in Alzheimer disease progression prediction with missing time point and modalities. In: International Conference on Pattern Recognition. 2024. p. 267–82. 10.1007/978-3-031-78198-8_18.
  • 19. Dao DP, Yang HJ, Kim J, Ho NH. Longitudinal Alzheimer’s disease progression prediction with modality uncertainty and optimization of information flow. IEEE J Biomed Health Inf. 2025;29(1):259–72. 10.1109/JBHI.2024.3472462.
  • 20. Muflikhah L, Baihaqi GR, Shalsadilla SR, Ridok A, Soenarti S. Enhanced Alzheimer’s diagnosis using multimodal data: a comparative study of CNN architectures. Int J Online Biomedical Eng. 2025;21(14):182–98. 10.3991/ijoe.v21i14.57471.
  • 21. Prasath T, Sumathi V. Pipelined deep learning architecture for the detection of Alzheimer’s disease. Biomed Signal Process. 2024;87:105442. 10.1016/j.bspc.2023.105442.
  • 22. Di Guglielmo G, Fahim F, Herwig C, Valentin MB, Duarte J, Gingu C, et al. A reconfigurable neural network ASIC for detector front-end data compression at the HL-LHC. IEEE Trans Nucl Sci. 2021;68:2179–86. 10.1109/TNS.2021.3087100.
  • 23. Gadkari D, Shanks KS, Hu H, Philipp HT, Tate MW, Thom-Levy J, et al. Characterization of 128 × 128 MM-PAD-2.1 ASIC: a fast framing hard x-ray detector with high dynamic range. J Instrum. 2022;17:P03003. 10.1088/1748-0221/17/03/P03003.
  • 24. Abubakar SM, Yin Y, Tan S, Jiang H, Wang Z. A 746 nW ECG processor ASIC based on ternary neural network. IEEE Trans Biomed Circuits Syst. 2022;16:703–13. 10.1109/TBCAS.2022.3196059.
  • 25. Gu X, Zhou K, Lyu H. An event-driven cardiac monitoring system based on a low-power atrial-fibrillation detection ASIC with a sensitivity of 92.5%. IEEE Sens Lett. 2024;8:1–4. 10.1109/LSENS.2024.3365731.
