Abstract
Neuromorphic computing (NC) architectures inspired by biological nervous systems have been actively studied to overcome the limitations of conventional von Neumann architectures. In this work, we propose a reconfigurable NC block using a flash-type synapse array, emerging positive feedback (PF) neuron devices, and CMOS peripheral circuits, and we integrate them on the same substrate to experimentally demonstrate the operations of the proposed NC block. Conductance modulation in the flash memory enables the output signals of the NC block to be easily calibrated. In addition, the proposed NC block uses a reduced number of devices for analog-to-digital conversion owing to the super-steep switching characteristics of the PF neuron device, substantially reducing the area overhead of the NC block. Our NC block shows high energy efficiency (37.9 TOPS/W) with high accuracy for CIFAR-10 image classification (91.80%), outperforming prior works. This work demonstrates the strong engineering potential of integrating synapses and neurons in terms of system efficiency and performance.
The reconfigurable neuromorphic computing block is implemented by integrating a synaptic array and a neuron device.
INTRODUCTION
Deep neural networks (DNNs) have shown beyond-human performance in various fields, such as image classification and natural language processing (1–6). In particular, convolutional neural networks (CNNs), which mimic biological vision systems, have become fundamental techniques for high performance in vision tasks (7–9). However, the growing area and power overhead of von Neumann computing architectures, mainly caused by data movement between processing units and memory, has become a critical issue for DNNs (10–14). In this regard, interest in neuromorphic computing (NC) architectures, which are inspired by biological nervous systems, has been rapidly increasing (15–18). NC architectures using analog nonvolatile memory cells reduce data movement by computing in the memory domain in an analog manner, enhancing system efficiency (Fig. 1A). Moreover, by exploiting Kirchhoff's and Ohm's laws, NC architectures perform massively parallel operations with high energy efficiency and reduced area overhead (19–23).
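To make the in-memory computation concrete, the sketch below (our illustration, not the authors' code; the conductance range and array size are hypothetical) models an ideal synaptic array: each cell sources a current I = G·V by Ohm's law, and Kirchhoff's current law sums the cell currents on each shared column wire, so a single parallel read computes a full vector-matrix product.

```python
# Idealized analog VMM in an NC array (illustrative sketch; values hypothetical).
import numpy as np

rng = np.random.default_rng(0)
G = rng.uniform(0.1e-6, 1.0e-6, size=(16, 2))  # synaptic conductances (S)
V = rng.choice([0.0, 1.0], size=16)            # 1-bit input voltages on word lines (V)

# Ohm's law per cell (I = G * V) and Kirchhoff's current law per column wire:
I_col = G.T @ V                                # column currents I_j = sum_i G[i, j] * V[i]
print(I_col)                                   # one parallel read = one VMM
```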
Fig. 1. Proposed NC block.
(A) Schematic diagram of neuromorphic computing using electronic synaptic devices. The post-synaptic neurons process the signals from the synapses. (B) Proposal of the reconfigurable NC block with flash-type synapse arrays, PF neuron devices, and CMOS peripheral circuits.
However, conventional NC architectures using emerging memory cells have faced several challenges at the device and circuit levels. The emerging memory cells used for synapse devices, such as ReRAM (resistive random-access memory) (24–28) or PCRAM (phase-change RAM) (29), have suffered from reliability issues at both the unit device and array levels, including a limited on/off ratio (30–32), device-to-device variation (17, 33), and sneak path problem (34–37). High power consumption and a large area of analog-to-digital converters (ADCs) induce ADCs to be shared in NC blocks, lowering system efficiency (38, 39). In addition, to perform reliable and accurate vector-matrix multiplication (VMM) operations, precise calibration of ADCs is required, which leads to additional modules or external controllers for the calibration. Therefore, it is necessary to develop a novel NC architecture that can efficiently address these limitations while exhibiting high performance.
In this work, we propose a reconfigurable NC block based on a flash-type synapse array and positive feedback (PF) neuron devices (Fig. 1B) and implement the block by integrating not only the synapse array and neurons but also complementary metal-oxide semiconductor (CMOS) peripheral circuits on the same substrate. The operations of the proposed NC blocks were experimentally demonstrated in the fabricated blocks. Flash-type synapse arrays, built on technologically mature and highly reliable flash memory, can store multi-bit weights and perform VMM accurately. In addition, because of the super-steep switching characteristics of the PF neuron device (40), the proposed NC block substantially reduces the number of devices needed to process the synapse signals, addressing the ADC sharing problem very efficiently. Moreover, the proposed NC block can generate a reconfigurable output code by modulating the conductance of the flash cells used for reference signals, which enables easy calibration as well as the implementation of various activation functions. The successful demonstration of the implemented NC block in this work proves that the cointegration of synapse arrays and neurons, mimicking highly efficient biological nervous systems, also has promising potential from an engineering perspective in terms of system efficiency and accuracy.
RESULTS
Device characterization
Figure 2 (A and B) shows the schematic cross section of the flash device and the top view of the PF device, respectively, and Fig. 2 (C to E) shows the schematic cross sections of PF devices along the cut lines. Both n-channel MOS (nMOS) and p-channel MOS (pMOS) devices were also fabricated on the same wafer by modulating the doping concentration of the channels. Because of the SiO2 (3 nm)/Si3N4 (6 nm)/SiO2 (9 nm) gate stack, the flash device exhibits a nonvolatile memory function with high reliability. The PF device has two gates (gates 1 and 2) on the floating p−-bodies, which modulate the barrier heights for electron and hole injection. In addition, the PF device contains a pnpn diode structure between the cathode and the anode, forming barriers for electrons and holes. Figure 2F shows the schematic diagram of the integrated NC block, which consists of a 16 × 2 AND-type flash array, two current-to-voltage (I-to-V) converters, and a PF readout (PFRO) neuron circuit. The blue column in the flash array represents the synaptic flash cells (SYN cells), and the red column represents the reference flash cells (REF cells). The SYN cells store the weights, and the REF cells are used to sense the results of the dot-product operations from the SYN cells. The flash cells are configured in an AND-type array architecture, in which source lines and drain lines run in parallel. The AND-type array has the advantages of low-power selective write operations, scalability, and large-scale parallel computing (41–43). By contrast, selective write operations in a NOR-type array consume considerable energy, and the cell string structure of a NAND-type array makes large-scale parallel operations difficult. From this perspective, we used the AND-type flash array for the implementation of the NC block together with the other circuits. The integrated NC block is shown in Fig. 2G (full micrograph of the NC block in fig. S1). The detailed operation scheme of the proposed NC block is presented in the next section.
Fig. 2. Implemented NC block.
(A) Schematic cross section of the flash device and (B) top view of the PF device. Schematic cut views of the PF device along cut lines (C) a-a′, (D) b-b′, and (E) c-c′. (F) Schematic circuit diagram of the fabricated column in the NC block, including a 16 × 2 AND-type flash array, I-to-V converters, and the PFRO circuit. (G) Micrographs of the AND-type flash array, I-to-V converters, and PFRO circuit, all integrated on the same wafer.
Figure S2 (A and B) shows the ID-VGS curves of the nMOS and pMOS devices and the ID-VDS curves of the pMOS devices, which are fabricated on the same substrate (see Methods for the fabrication process). Figure 3 shows the device characteristics of the fabricated PF devices and flash devices. Figure 3 (A and B) shows the IA-VG1 curves of the PF device as a parameter of VG2 at VD of 1.5 and 0 V, respectively. As a neuron device, the PF device exhibits super-steep switching characteristics due to the PF operation [subthreshold swing (SS) < 1 mV/dec]. The detailed mechanism of the PF operation is as follows. Initially, the p-body under G1 forms an electron injection barrier between the cathode and the n-body, and the n-body forms a hole injection barrier between the anode and the p-body. A positive voltage applied to gate 1 (VG1), which sits on the floating p-body, lowers the barrier for electrons, increasing the number of electrons injected from the cathode into the n-body. This in turn lowers the barrier for holes, increasing the number of holes injected from the anode into the p-body under G1. The injected holes further lower the electron barrier, closing the positive feedback loop. These recursive processes result in the steep switching of PF devices. Moreover, an important characteristic of the PF device is that its turn-on voltage (Von) can be modulated by VG2 and VD. At VD of 1.5 V, a positive VG2 inhibits the PF operation by discharging the electrons in the n-body (Fig. 3A), as previously reported (40). In contrast, at VD of 0 V, VG2 boosts the PF operation by charging the n-body with electrons (Fig. 3B). The electric potential between the cathode and drain of the PF device is shown in Fig. 3C under different VD conditions. Under the turn-on conditions of the PF device, the electric potential is nearly flat, whereas under the turn-off conditions, the barriers for electrons and holes are formed.
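The regenerative loop described above can be captured in a toy numerical model. The sketch below is our simplified illustration with arbitrary parameters, not the authors' TCAD model: carrier injection grows exponentially as the barrier falls, and the injected charge lowers the barrier further, so the computed anode current snaps on over a very narrow VG1 range, mimicking the sub-1-mV/dec switching.

```python
# Toy positive-feedback switching model (our illustration; all parameters arbitrary).
import numpy as np

def anode_current(v_g1, phi0=0.9, alpha=0.5, beta=0.08, kT=0.026):
    charge = 0.0
    for _ in range(100):
        # VG1 and the injected charge both lower the electron injection barrier.
        barrier = max(phi0 - alpha * v_g1 - beta * charge, 0.0)  # eV (toy)
        charge = 1e7 * np.exp(-barrier / kT)   # injected carriers (arb. units)
        if charge > 1e3:                       # regenerative loop has latched
            return 1e-5                        # on-state current (A, toy)
    return 1e-12 * charge                      # off-state leakage (A, toy)

for v in np.arange(0.6, 1.01, 0.05):
    print(f"VG1 = {v:.2f} V -> IA ~ {anode_current(v):.2e} A")  # abrupt jump near Von
```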
Fig. 3. Device and array characterization.
(A and B) IA-VG1 curves of the PF devices as a parameter of VG2 at VD of 1.5 and 0 V, respectively. Note that the PF device shows super-steep switching characteristics (SS < 1 mV/dec); thus, it acts as a comparator in a single device. In contrast to the curves at VD of 1.5 V, the turn-on voltage of the PF device decreases as VG2 increases at VD of 0 V. (C) Electric potential of the PF device between the cathode and drain at different VDs. (D) Distribution of seven weight levels in the flash array after weight tuning. Upper and lower values represent the mean and SD of each weight level, respectively. The maximum σ/μ is ~4.1%. (E) Measured current sum (ISYN) in the SYN flash array versus the ideal ISYN, obtained by summing the ISYN of each device. Because of the excellent saturation current characteristics of the flash devices, the measured data show high linearity (R2 = 0.9995), leading to accurate VMM operations in the array.
The I-V characteristics of the flash devices are shown in fig. S3. Because of the charge trap layer, the conductance of the flash devices can be modulated by program and erase (PGM/ERS) operations, enabling the conductance of each device to be used as a weight in the NC block. The ID-VDS curves as a parameter of VGS are shown in fig. S3B. Because the flash devices exhibit a saturated current with respect to VDS, a constant current level is maintained even when the drain voltage changes. This characteristic reduces the peripheral circuitry required to maintain the drain voltage. The transfer curves of 32 cells in the array in the initial state are shown in fig. S3C. We used the conductance tuning scheme to map the trained weights to the cells (41). The result of the seven-level weight tuning with a small error (σ/μ < 5%) is shown in Fig. 3D. This figure indicates that the conductance of the flash devices can be tuned to target weight values obtained by off-chip training. In other words, the NC block can achieve near state-of-the-art accuracy with weights trained in software. Figure 3E shows the measured ISYN of the flash cells in the NC block versus the calculated sum of the individual cell currents. We measured the current of each flash cell in a column of the array at different VGs and summed the currents in software (the calculated ISYN sum). Subsequently, we measured the current sum of the flash cells by applying the VGs to the flash cells simultaneously (the measured ISYN sum) and compared the two. The programming error of the closed-loop conductance tuning scheme was not reflected in this measurement. Note that the current (ISYN) from the flash cells flows through the left pMOS device of the I-to-V converter in Fig. 2F. Thus, the drain voltage of the flash cells in the array can change with the magnitude of ISYN. Nevertheless, highly linear VMM data are obtained because of the saturation current of the reliable flash cells. This means that the VMM operations are accurately performed in the flash array without op-amps, which would otherwise require high power consumption and a large area; this is another great advantage of the proposed NC block.
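For clarity, a hedged sketch of the linearity check behind Fig. 3E is given below. Since we have no access to the measured currents, the simultaneous parallel readout is emulated with a small random error on top of the software sum; the point is the procedure (per-cell sum versus parallel read, summarized by R²), not the numbers.

```python
# Sketch of the Fig. 3E-style linearity check (emulated data, not measurements).
import numpy as np

rng = np.random.default_rng(1)
cell_currents = rng.uniform(0, 2e-6, size=(100, 16))  # hypothetical per-cell reads (A)
ideal = cell_currents.sum(axis=1)                     # software sum per column (ideal ISYN)
measured = ideal * (1 + rng.normal(0, 0.005, ideal.shape))  # emulated parallel readout

# Coefficient of determination between parallel readout and ideal sum:
ss_res = np.sum((measured - ideal) ** 2)
ss_tot = np.sum((measured - measured.mean()) ** 2)
print("R^2 =", 1 - ss_res / ss_tot)
```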
Operations of the proposed NC block
The PFRO neuron circuit design was previously introduced for spiking neural networks in our earlier work (40), where it was designed using a technology computer-aided design (TCAD) simulation tool. In this work, we implemented the PFRO neuron circuit by integrating the PF device with CMOS peripheral circuits on the same substrate and used it as a comparator in the NC blocks. The PFRO neuron circuit operates as follows. When an input voltage signal (VG1) converted from the ISYN of the flash array is applied to G1 and reaches Von, turning on the PF device, the input voltage of the inverter in the PFRO circuit is pulled down. Then, the output voltage of the inverter (Vout) is pulled up and turns on the nMOS in the PFRO circuit, further decreasing the input voltage of the inverter and increasing Vout. The high Vout also turns on the nMOS in the CMOS peripheral circuit of the PFRO block, eliminating the excess holes and electrons in the floating bodies and resetting the PF device. The PF device is then ready to receive the next VG1 signal. In addition, the VG2 signal converted from IREF modulates Von of the PF device.
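A behavioral abstraction of this operation is sketched below (our simplification, not a transistor-level model; the Von value and the VG2 coupling coefficient are hypothetical): the PF device acts as a single-device comparator whose threshold Von is shifted by VG2 in a direction set by VD, and the feedback/reset path re-arms it after every fire.

```python
# Behavioral sketch of the PFRO neuron circuit (toy parameters, assumed abstraction).
class PFRONeuron:
    def __init__(self, von0=1.2, k_vg2=0.1, vd=0.0):
        self.von0, self.k_vg2, self.vd = von0, k_vg2, vd  # toy parameters

    def von(self, v_g2):
        # At VD = 0 V, VG2 boosts the PF operation (Von decreases);
        # at VD = 1.5 V, VG2 inhibits it (Von increases).
        sign = -1.0 if self.vd == 0.0 else +1.0
        return self.von0 + sign * self.k_vg2 * v_g2

    def step(self, v_g1, v_g2):
        fired = v_g1 >= self.von(v_g2)  # super-steep switching -> comparator in one device
        # On fire, the high Vout resets the floating bodies, re-arming the PF device.
        return int(fired)

neuron = PFRONeuron(vd=0.0)
print([neuron.step(v, v_g2=0.5) for v in (1.0, 1.1, 1.2, 1.3)])  # [0, 0, 1, 1]
```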
Figure 4 (A and B) shows Vout versus VWL, the gate voltage applied to one SYN cell, at VD of 1.5 and 0 V in the fabricated NC block. Vout of the NC block increases steeply when ISYN reaches the sensing value, indicating that the fabricated NC block can generate an output signal depending on the VMM operations. When ISYN flows through the left pMOS of the converter, the copied current flows through the right pMOS and nMOS in the converter. Then, the VG1 applied to the PF device increases as ISYN increases, and the PF device turns on at the sensing ISYN, as shown in Fig. 4 (A and B). In addition, as the number of turned-on REF cells (those whose word lines (WLs) are biased at VREF) increases, IREF and the resulting VG2 increase. Thus, Von is modulated by the number of turned-on REF cells, and the sensing ISYN at Von is modulated accordingly. The direction of the Von shift with the number of turned-on REF cells is determined by VD. As shown in Fig. 4B, Von shifts to the left as the number of turned-on REF cells increases at VD of 0 V. Vout responses at different VREFs are shown in fig. S4. The pulse responses of the PFRO circuit are presented in Fig. 4C, indicating that the implemented neuron circuit can generate output pulses quickly enough (see fig. S5 for the measurement setup). Figure 4D shows the sensing ISYN versus the number of turned-on REF cells as parameters of VD and VREF. The sensing ISYN is gradually modulated by the number of turned-on REF cells and the amplitude of VD. Figure 4E shows the sensing ISYN versus the number of turned-on REF cells at different VDD,Con values. Because an increased VDD,Con widens the range of VG2 resulting from IREF, the number of electrons escaping the n-body under VG2 can also be increased at VD = 1.5 V. Thus, the PF operation can be further suppressed by the increased VG2, and the range of the sensing ISYN is widened.
Fig. 4. Operations of implemented NC block.
(A) Vout versus VWL of the SYN cell at VREF = 2.5 V as a parameter of the number of turned-on REF cells at VD = 1.5 V. Von increases as the number of turned-on REF cells increases. (B) Vout versus VWL of the SYN cell at VREF = 2.0 V as a parameter of the number of turned-on REF cells at VD = 0 V. Von decreases as the number of turned-on REF cells increases. Red symbols represent ISYN in the SYN cell with respect to VWL. (C) Output pulse responses of the implemented PFRO neuron circuit. The PFRO neuron circuit successfully generates output signals very quickly. (D and E) ISYN at Von versus the number of turned-on REF cells as a parameter of VREF and VDD,Con, respectively. When VD changes from 1.5 to 0 V, ISYN at Von decreases with an increasing number of turned-on REF cells.
Reconfigurable calibration of the NC block
Figure 5 shows the operation scheme of the NC block. First, 1-bit input voltage signals are applied to the WLs of the SYN cells, and ISYN then flows as the dot-product result. For 4-bit output precision, the PFRO must sense 16 different ISYN levels, requiring 16 cycles (Φ1 to Φ16), one per sensing level (I1 to I16). Note that each sensing ISYN level is determined by IREF and VD. Thus, the number of turned-on REF cells in each cycle is set differently, and VD is switched from 0 to 1.5 V when the direction of the Von shift by VG2 must be reversed. Here, for example, we set the switching time of VD to just before Φ9 (the ninth cycle). Then, to increase Von of the PF device sequentially, the number of turned-on REF cells is decreased from eight to one during Φ1 to Φ8 and increased from one to eight during Φ9 to Φ16. ISYN is sensed in each cycle as Von of the PF device is sequentially changed, and the PFRO circuit in each column sequentially produces an output signal based on the sensing results. The bit precision of the input signals can be increased by redundant computing blocks or multiple time intervals with different bit significance, as shown in fig. S6 (44). The main advantage of the proposed NC block is that the sensing ISYN levels can be reconfigurably set by modulating the conductance of the REF cells. In other words, the conductance of the REF cells can be adjusted to calibrate the sensing ISYN using the closed-loop conductance tuning method, the same method used for transferring weights to the SYN cells. In addition, the operation speed of the NC blocks can be further increased in binarized neural networks, in which binary (1-bit) activation outputs are used.
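A minimal sketch of this cycle-by-cycle sensing follows, under stated assumptions: the sensing levels I1 to I16 are idealized as uniformly spaced (in the real block they are set by the REF-cell conductances and VD), and the output code is read as the count of cycles in which the PFRO fires, i.e., a thermometer code, which is our interpretation of the sequential output signals.

```python
# Sketch of the 16-cycle, 4-bit output coding (idealized levels; assumed thermometer readout).
import numpy as np

def sense_4bit(i_syn, i_max=2e-6):
    levels = np.linspace(i_max / 16, i_max, 16)  # I1..I16, set by REF cells and VD
    code = 0
    for i_k in levels:                           # cycles Φ1..Φ16, one level per cycle
        code += int(i_syn >= i_k)                # PFRO fires once per exceeded level
    return code                                  # fire count = 4-bit output code

for i in (0.3e-6, 0.9e-6, 1.7e-6):
    print(f"ISYN = {i:.1e} A -> code {sense_4bit(i)}")
```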
Fig. 5. Example of an operation scheme of the NC block.
ISYN resulting from the weighted sum in the SYN cells is sensed with the given IREF and VD in each cycle. In this example, eight REF cells are turned on at VD = 0 V for the least significant bit (LSB) during Φ1, and eight REF cells are turned on at VD = 1.5 V for the most significant bit (MSB) during Φ16. The conductance of REF cells can be modulated to calibrate the output signals of the NC blocks.
Figure 6A shows the calibration results of the NC block, in which the conductance of the REF cells is modulated. Although the NC block exhibits a nonlinear output response to the sensing ISYN with an offset error before the calibration, the linearity and offset error of the response are greatly improved after the calibration. Figure 6 (B and C) shows the measured differential nonlinearity (DNL) and integral nonlinearity (INL), respectively, as a parameter of the tuning tolerance to each sensing ISYN level. The NC block produces a more accurate output code with a tighter tuning tolerance. The transfer curves of the REF cells before and after the calibration are shown in Fig. 6D. Conductance modulation for the calibration is easily achieved with a selective PGM/ERS operation in an AND-type flash array (41). Figure 6E shows the sigmoid-like calibration result of the NC block, obtained by tuning the conductance of the AND-type REF cells to produce a sigmoid-like output response. This result indicates that the NC block can also generate output codes following various activation functions owing to the programmable memory functionality, thereby eliminating additional modules for these functions and enhancing system efficiency. Figure 6F shows the output response to the sensing ISYN as a parameter of the retention time. The retention and endurance characteristics of the fabricated flash cells are presented in fig. S7. Because of the nonvolatile characteristics of the charge-trap flash memory cells, the output response calibrated by the REF cells is maintained for a long time. This enables the proposed NC block to produce calibrated output signals on the chip, without the help of external modules, after a one-time precise calibration. In addition, the proposed calibration method can be applied to other NC architectures that use emerging memory cells with nonvolatile memory characteristics. Figure 6G shows the calibration results of five blocks on different dies. Because the characteristics of the PF devices are affected by process variation, device-to-device variation of the PF devices exists, particularly in Von, as shown in fig. S8. This device variation leads to block-to-block variation and can cause errors in the VMM results. As shown in Fig. 6G, the five blocks exhibit block-to-block variation in their output responses before the calibration. However, the five blocks produce nearly identical output responses to the sensing ISYN after the calibration, meaning that the block-to-block variation issue can be easily resolved in the proposed NC blocks. This advantage also helps the proposed NC block achieve high classification accuracy.
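The calibration rests on the closed-loop conductance tuning scheme of ref. 41. A minimal program-and-verify sketch is given below; the read/program/erase callables, step sizes, and convergence budget are hypothetical placeholders, not the authors' PGM/ERS conditions.

```python
# Minimal sketch of closed-loop conductance tuning (pulse model is hypothetical).
def tune_conductance(read, program, erase, target, tol=0.01, max_pulses=200):
    """read(): current conductance; program()/erase(): apply one PGM/ERS pulse."""
    for _ in range(max_pulses):
        g = read()
        if abs(g - target) <= tol * target:
            return g              # within tuning tolerance -> done
        if g > target:
            program()             # PGM raises Vth, lowering conductance
        else:
            erase()               # ERS lowers Vth, raising conductance
    raise RuntimeError("cell did not converge within pulse budget")

# Toy usage with a simulated cell:
state = {"g": 1.0}
g_final = tune_conductance(
    read=lambda: state["g"],
    program=lambda: state.__setitem__("g", state["g"] - 0.02),
    erase=lambda: state.__setitem__("g", state["g"] + 0.02),
    target=0.6,
)
print(g_final)
```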
Fig. 6. Calibration results and accuracy evaluation.
(A) Output code versus ISYN at Von of the NC block before and after the calibration. Before the calibration, the output code of the NC block is nonlinear to the sensing ISYN. However, the NC block generates linear output signals after the linearity and offset calibration by modulating the conductance of the REF cells. (B and C) Measured DNL and INL of the NC block after the offset and linearity calibration, respectively. (D) IREF versus VREF curves of the REF cells. The threshold voltages of the REF cells are modulated to calibrate the output signals of the NC block. (E) Sigmoid-like calibration result. (F) Output code versus ISYN at Von dependent on the retention time. The NC block generates calibrated output signals for a long time. (G) Calibration results of five different NC blocks. (H) Training curve of the VGG-9 network with 3-bit weights and 4-bit activations using the hard-sigmoid function. (I) Comparison of the CIFAR-10 image classification accuracy under different calibration schemes and tuning tolerances (tol) to the sensing ISYN.
We evaluated the accuracy of CIFAR-10 image classification with the proposed NC block. The VGG-9 network (fig. S9) was trained in the PyTorch framework with a hard-sigmoid activation function, and its training curve with 3-bit weights and 4-bit activation values is shown in Fig. 6H. After 50 epochs of training, the accuracy of the quantized neural network was 92.15%, which serves as the baseline. We assumed that the trained weights are transferred to the conductance of the SYN cells (3-bit) with a tuning error of 5% using the conductance tuning scheme verified in Fig. 3D. We then simulated the accuracy of the CNN using the characteristics of the NC blocks under different calibration schemes for 4-bit activation values (Fig. 6I; see Methods for the simulation process). Before the calibration, the accuracy of the CNN was degraded compared to the offline-trained network because the nonlinearity and offset errors of the NC blocks distort the VMM results, producing incorrect classification outputs. After the calibration, however, the accuracy recovered to near the baseline. This indicates that the reconfigurable calibration enabled by integrating nonvolatile synapses and neurons is of great value for improving the accuracy of NC blocks. Figure S10 shows the breakdown of the power consumption of the NC block, and table S1 shows benchmarking results against other works. Our NC block achieves an energy efficiency of 37.9 tera-operations per second per watt (TOPS/W), including the SYN cells, REF cells, converters, and PFRO circuits. In addition, because of the super-steep switching PF device and flash memory technology, our NC block exhibits a noticeably reduced area overhead compared to other works. The excellent performance of the NC block can be further improved by shrinking the device dimensions in advanced technology nodes.
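For reference, one way to realize a 4-bit hard-sigmoid activation of the kind used in this evaluation is sketched below; the paper does not specify the quantization grid, so the uniform 16-level rounding is our assumption, and the hard-sigmoid definition follows PyTorch's built-in `F.hardsigmoid`.

```python
# Sketch of a 4-bit quantized hard-sigmoid activation (quantization grid assumed).
import torch
import torch.nn.functional as F

def hard_sigmoid_4bit(x, levels=16):
    y = F.hardsigmoid(x)                                 # clamp(x / 6 + 1/2, 0, 1)
    return torch.round(y * (levels - 1)) / (levels - 1)  # snap to 16 output levels

print(hard_sigmoid_4bit(torch.tensor([-4.0, 0.0, 1.0, 4.0])))
```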
DISCUSSION
In summary, we have proposed a reconfigurable NC block with an AND-type flash synapse array, super-steep PF neuron devices, and CMOS peripheral circuits. Each component was integrated and interconnected on the same substrate to implement the NC block efficiently. The proposed NC block exhibits high energy efficiency (37.9 TOPS/W) with a reduced area overhead for analog-to-digital conversion because of the super-steep switching characteristics of the PF device and flash memory technology. In addition, the fabricated NC block produces reconfigurable output codes, enabling not only easy calibration of the NC blocks but also the implementation of various activation functions. These advantages allow the NC blocks to achieve high classification accuracy (91.80%) on CIFAR-10 image classification. Our work provides an important technological advance toward overcoming the limitations of existing NC architectures and verifies the high potential of integrating synapses and neurons in terms of system efficiency.
METHODS
Fabrication of NC blocks
The key fabrication processes on a 6-inch silicon-on-insulator (SOI) wafer were as follows. The Si active regions of the nMOS, pMOS, flash, and PF devices were defined, and ion implantations were performed to dope each active region. Then, n+ poly-Si was deposited and patterned for the gates of the nMOS, pMOS, and PF devices. After ion implantation for the n-body of the PF devices, a SiO2 (3 nm)/Si3N4 (6 nm)/SiO2 (9 nm) stack was formed for gate insulation and the charge-trap memory function. Then, n+ poly-Si was deposited and patterned for the gates of the flash devices. To form the source/drain of the nMOS, pMOS, and flash devices and the cathode/anode of the PF devices, ion implantation and an activation process were performed. The whole process flow is presented in fig. S11. The measurement setup of the NC blocks is shown in fig. S12.
Network simulation
The network used in the simulation is VGG-9, one of the most popular CNNs, applied to CIFAR-10 image classification. The network has six convolutional layers, three max-pooling layers, and three fully connected layers. The largest weight matrix is the 8192 × 1024 matrix of the first fully connected layer. The detailed network configuration is shown in fig. S9. The training algorithm with quantized weights and activations was presented previously (45). The classification accuracy is evaluated on the CIFAR-10 test image set.
After the VGG-9 network is trained, the hardware-based network is simulated using the characteristics of the NC blocks. The trained weights are first scaled to the conductance range of the flash cells and transferred to the conductance of the SYN flash cells using the closed-loop conductance tuning scheme. The tuning scheme implemented in this work has a 5% tuning tolerance to the desired conductance level, which is modeled as normally distributed noise with an SD of 5% of the trained weights. Then, VMM operations are performed by summing the currents of the flash cells to which input signals are applied. We set the bit precision of the inputs equal to that of the outputs (2- or 4-bit), which can be implemented using redundant computing blocks or redundant time intervals with different bit significance, as shown in fig. S6. The resulting current sum is sequentially compared with the sensing currents of the PFRO neuron circuit, which are determined by the conductance of the REF flash cells, to generate output signals. Note that we set the sensing current of each output code in the simulation to the measured data in Fig. 6A, depending on the calibration scheme. Thus, the fluctuation in the output response of the proposed NC block is also reflected in the simulation. We assumed that the max-pooling layers are processed in the external digital domain.
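A condensed sketch of this simulation flow for a single layer is given below, under stated assumptions: 3-bit weight quantization, a 5%-SD Gaussian tuning error on the transferred conductances, idealized uniformly spaced 4-bit sensing levels (the paper instead uses the measured levels of Fig. 6A), and signed weights mapped directly, ignoring the differential mapping a real array would need.

```python
# Condensed hardware-aware inference sketch (assumptions noted in the lead-in).
import numpy as np

rng = np.random.default_rng(42)

def quantize_weights(w, bits=3):
    q = 2 ** (bits - 1) - 1                          # 7 signed levels for 3 bits
    return np.round(np.clip(w, -1.0, 1.0) * q) / q

def transfer_to_cells(w_q, tol=0.05):
    # Closed-loop tuning error: Gaussian noise, SD = 5% of each weight.
    return w_q * (1.0 + rng.normal(0.0, tol, w_q.shape))

def nc_block_layer(x, w, out_bits=4):
    i_syn = transfer_to_cells(quantize_weights(w)) @ x       # analog VMM
    levels = 2 ** out_bits - 1
    scale = np.abs(i_syn).max() or 1.0                       # toy normalization
    codes = np.clip(np.round(i_syn / scale * levels), 0, levels)
    return codes / levels                                    # 4-bit activations

x = rng.choice([0.0, 1.0], size=16)                  # 1-bit input signals
w = rng.uniform(-1.0, 1.0, size=(8, 16))
print(nc_block_layer(x, w))
```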
Acknowledgments
Funding: This work was supported by the BK21 FOUR program of the Education and Research Program for Future ICT Pioneers, Seoul National University in 2021, and National R&D Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (2021M3F3A2A02037889). Author contributions: Conceptualization: D.K., S.Y.W., K.-H.L., H.K., S.-H.P., J.-H.B., and J.-H.L. Methodology: D.K., S.Y.W., J.H., and W.S. Supervision: S.Y.W. and J.-H.L. Writing—original draft: D.K. Writing—review and editing: S.Y.W., J.H., W.S., J.-H.B., J.-J.K., and J.-H.L.
Competing interests: The authors declare that they have no competing interests.
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.
Supplementary Materials
This PDF file includes:
Figs. S1 to S12
Table S1
References
REFERENCES AND NOTES
1. Luo Y., Yu S., Accelerating deep neural network in-situ training with non-volatile and volatile memory based hybrid precision synapses. IEEE Trans. Comput. 69, 1113–1127 (2020).
2. Yu C., Yoo T., Kim H., Kim T. T. H., Chuan K. C. T., Kim B., A logic-compatible eDRAM compute-in-memory with embedded ADCs for processing neural networks. IEEE Trans. Circuits Syst. I Regul. Pap. 68, 667–679 (2021).
3. Demasius K. U., Kirschen A., Parkin S., Energy-efficient memcapacitor devices for neuromorphic computing. Nat. Electron. 4, 748–756 (2021).
4. Wang Z., Li C., Lin P., Rao M., Nie Y., Song W., Qiu Q., Li Y., Yan P., Strachan J. P., Ge N., McDonald N., Wu Q., Hu M., Wu H., Williams R. S., Xia Q., Yang J. J., In situ training of feed-forward and recurrent convolutional memristor networks. Nat. Mach. Intell. 1, 434–442 (2019).
5. Wang Q., Park Y., Lu W. D., Device variation effects on neural network inference accuracy in analog in-memory computing systems. Adv. Intell. Syst. 4, 2100199 (2022).
6. Baltrusaitis T., Ahuja C., Morency L. P., Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443 (2019).
7. F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360 [cs.CV] (24 February 2016).
8. Fukushima K., Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Netw. 1, 119–130 (1988).
9. Kwon D., Lim S., Bae J. H., Lee S. T., Kim H., Seo Y. T., Oh S., Kim J., Yeom K., Park B. G., Lee J. H., On-chip training spiking neural networks using approximated backpropagation with analog synaptic devices. Front. Neurosci. 14, 423 (2020).
10. Chakraborty I., Ali M., Ankit A., Jain S., Roy S., Sridharan S., Agrawal A., Raghunathan A., Roy K., Resistive crossbars as approximate hardware building blocks for machine learning: Opportunities and challenges. Proc. IEEE 108, 2276–2310 (2020).
11. Yang J. Q., Zhou Y., Han S. T., Functional applications of future data storage devices. Adv. Electron. Mater. 7, 2001181 (2021).
12. Yang J. Q., Wang R., Ren Y., Mao J. Y., Wang Z. P., Zhou Y., Han S. T., Neuromorphic engineering: From biological to spike-based hardware nervous systems. Adv. Mater. 32, e2003610 (2020).
13. Gao L., Ren Q., Sun J., Han S. T., Zhou Y., Memristor modeling: Challenges in theories, simulations, and device variability. J. Mater. Chem. C 9, 16859–16884 (2021).
14. Yu S., Neuro-inspired computing with emerging nonvolatile memorys. Proc. IEEE 106, 260–285 (2018).
15. Peng X., Huang S., Jiang H., Lu A., Yu S., DNN+NeuroSim V2.0: An end-to-end benchmarking framework for compute-in-memory accelerators for on-chip training. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 40, 2306–2319 (2021).
16. Akashi N., Kuniyoshi Y., Tsunegi S., Taniguchi T., Nishida M., Sakurai R., Wakao Y., Kawashima K., Nakajima K., A coupled spintronics neuromorphic approach for high-performance reservoir computing. Adv. Intell. Syst. 4, 2200123 (2022).
17. Chen P. Y., Peng X., Yu S., NeuroSim: A circuit-level macro model for benchmarking neuro-inspired architectures in online learning. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 37, 3067–3080 (2018).
18. Burr G. W., Shelby R. M., Sidler S., di Nolfo C., Jang J., Boybat I., Shenoy R. S., Narayanan P., Virwani K., Giacometti E. U., Kurdi B. N., Hwang H., Experimental demonstration and tolerancing of a large-scale neural network (165 000 synapses) using phase-change memory as the synaptic weight element. IEEE Trans. Electron Devices 62, 3498–3507 (2015).
19. Li C., Belkin D., Li Y., Yan P., Hu M., Ge N., Jiang H., Montgomery E., Lin P., Wang Z., Song W., Strachan J. P., Barnell M., Wu Q., Williams R. S., Yang J. J., Xia Q., Efficient and self-adaptive in-situ learning in multilayer memristor neural networks. Nat. Commun. 9, 2385 (2018).
20. Lee K. H., Kwon D., Woo S. Y., Ko J. H., Choi W. Y., Park B. G., Lee J. H., Highly linear analog spike processing block integrated with an AND-type flash array and CMOS neuron circuits. IEEE Trans. Electron Devices 69, 6065–6071 (2022).
21. Dong Z., Zhou Z., Li Z., Liu C., Huang P., Liu L., Liu X., Kang J., Convolutional neural networks based on RRAM devices for image recognition and online learning tasks. IEEE Trans. Electron Devices 66, 793–801 (2019).
22. M. R. Mahmoodi, D. Strukov, An ultra-low energy internally analog externally digital vector-matrix multiplier based on NOR flash memory technology, in Proceedings of the 55th Annual Design Automation Conference (Association for Computing Machinery, 2018).
23. Giordano M., Cristiano G., Ishibashi K., Ambrogio S., Tsai H., Burr G. W., Narayanan P., Analog-to-digital conversion with reconfigurable function mapping for neural networks activation function acceleration. IEEE J. Emerg. Sel. Top. Circuits Syst. 9, 367–376 (2019).
24. Wang Z., Joshi S., Savel'ev S., Song W., Midya R., Li Y., Rao M., Yan P., Asapu S., Zhuo Y., Jiang H., Lin P., Li C., Yoon J. H., Upadhyay N. K., Zhang J., Hu M., Strachan J. P., Barnell M., Wu Q., Wu H., Williams R. S., Xia Q., Yang J. J., Fully memristive neural networks for pattern classification with unsupervised learning. Nat. Electron. 1, 137–145 (2018).
25. Jang Y. H., Kim W., Kim J., Woo K. S., Lee H. J., Jeon J. W., Shim S. K., Han J., Hwang C. S., Time-varying data processing with nonvolatile memristor-based temporal kernel. Nat. Commun. 12, 1–9 (2021).
26. Kim H., Mahmoodi M. R., Nili H., Strukov D. B., 4K-memristor analog-grade passive crossbar circuit. Nat. Commun. 12, 1–11 (2021).
27. Yao P., Wu H., Gao B., Tang J., Zhang Q., Zhang W., Yang J. J., Qian H., Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).
28. Dalgaty T., Castellani N., Turck C., Harabi K. E., Querlioz D., Vianello E., In situ learning using intrinsic memristor variability via Markov chain Monte Carlo sampling. Nat. Electron. 4, 151–161 (2021).
29. Narayanan P., Ambrogio S., Okazaki A., Hosokawa K., Tsai H., Nomura A., Yasuda T., Mackin C., Lewis S. C., Friz A., Ishii M., Kohda Y., Mori H., Spoon K., Khaddam-Aljameh R., Saulnier N., Bergendahl M., Demarest J., Brew K. W., Chan V., Choi S., Ok I., Ahsan I., Lie F. L., Haensch W., Narayanan V., Burr G. W., Fully on-chip MAC at 14nm enabled by accurate row-wise programming of PCM-based weights and parallel vector-transport in duration-format. IEEE Trans. Electron Devices 68, 6629–6636 (2021).
30. J. R. Jameson, P. Blanchard, C. Cheng, J. Dinh, A. Gallo, V. Gopalakrishnan, C. Gopalan, B. Guichet, S. Hsu, D. Kamalanathan, D. Kim, F. Koushan, M. Kwan, K. Law, D. Lewis, Y. Ma, V. McCaffrey, S. Park, S. Puthenthermadam, E. Runnion, J. Sanchez, J. Shields, K. Tsai, A. Tysdal, D. Wang, R. Williams, M. N. Kozicki, J. Wang, V. Gopinath, S. Hollmer, M. van Buskirk, Conductive-bridge memory (CBRAM) with excellent high-temperature retention, in Technical Digest—International Electron Devices Meeting (IEDM) (IEEE, 2013).
31. Woo J., Moon K., Song J., Lee S., Kwak M., Park J., Hwang H., Improved synaptic behavior under identical pulses using AlOx/HfO2 bilayer RRAM array for neuromorphic systems. IEEE Electron Device Lett. 37, 994–997 (2016).
32. Jo S. H., Chang T., Ebong I., Bhadviya B. B., Mazumder P., Lu W., Nanoscale memristor device as synapse in neuromorphic systems. Nano Lett. 10, 1297–1301 (2010).
33. Wu H., Wang X. H., Gao B., Deng N., Lu Z., Haukness B., Bronner G., Qian H., Resistive random access memory for future information processing system. Proc. IEEE 105, 1770–1789 (2017).
34. Yoon J. H., Zhang J., Ren X., Wang Z., Wu H., Li Z., Barnell M., Wu Q., Lauhon L. J., Xia Q., Yang J. J., Truly electroforming-free and low-energy memristors with preconditioned conductive tunneling paths. Adv. Funct. Mater. 27, 1702010 (2017).
35. Zhou J., Kim K. H., Lu W., Crossbar RRAM arrays: Selector device requirements during read operation. IEEE Trans. Electron Devices 61, 1369–1376 (2014).
36. Choi B. J., Zhang J., Norris K., Gibson G., Kim K. M., Jackson W., Zhang M. X. M., Li Z., Yang J. J., Williams R. S., Trilayer tunnel selectors for memristor memory cells. Adv. Mater. 28, 356–362 (2016).
37. Bae W., Yoon K. J., Hwang C. S., Jeong D. K., A crossbar resistance switching memory readout scheme with sneak current cancellation based on a two-port current-mode sensing. Nanotechnology 27, 485201 (2016).
38. Yu C., Yoo T., Chai K. T. C., Kim T. T. H., Kim B., A 65-nm 8T SRAM compute-in-memory macro with column ADCs for processing neural networks. IEEE J. Solid-State Circuits 57, 3466–3476 (2022).
39. Yin S., Sun X., Yu S., Seo J. S., High-throughput in-memory computing for binary deep neural networks with monolithically integrated RRAM and 90-nm CMOS. IEEE Trans. Electron Devices 67, 4185–4192 (2020).
40. Woo S. Y., Kwon D., Choi N., Kang W. M., Seo Y. T., Park M. K., Bae J. H., Park B. G., Lee J. H., Low-power and high-density neuron device for simultaneous processing of excitatory and inhibitory signals in neuromorphic systems. IEEE Access 8, 202639–202647 (2020).
41. Seo Y. T., Kwon D., Noh Y., Lee S., Park M. K., Woo S. Y., Park B. G., Lee J. H., 3-D AND-type flash memory architecture with high-κ gate dielectric for high-density synaptic devices. IEEE Trans. Electron Devices 68, 3801–3806 (2021).
42. Lee S., Kim H., Lee S.-T., Park B.-G., Lee J.-H., SiO2 fin-based flash synaptic cells in AND array architecture for binary neural networks. IEEE Electron Device Lett. 43, 142–145 (2022).
43. H.-T. Lue, W. Chen, H.-S. Chang, K.-C. Wang, C.-Y. Lu, A novel 3D AND-type NVM architecture capable of high-density, low-power in-memory sum-of-product computation for artificial intelligence application, in Proceedings of the IEEE Symposium on VLSI Technology (IEEE, 2018), pp. 177–178.
44. A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, V. Srikumar, ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars, in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) (IEEE, 2016).
45. Lee S. T., Kwon D., Kim H., Yoo H., Lee J. H., NAND flash based novel synaptic architecture for highly robust and high-density quantized neural networks with binary neuron activation of (1, 0). IEEE Access 8, 114330–114339 (2020).
46. S.-H. Park, D. Kwon, H.-N. Yoo, J.-W. Back, J. Hwang, Y. Yang, J.-J. Kim, J.-H. Lee, Retention improvement in vertical NAND flash memory using 1-bit soft erase scheme and its effects on neural networks, in 2022 International Electron Devices Meeting (IEDM) (IEEE, 2022).
47. H.-J. Kang, N. Choi, D. H. Lee, T. Lee, S. Chung, J.-H. Bae, B.-G. Park, J.-H. Lee, Space program scheme for 3-D NAND flash memory specialized for the TLC design, in 2018 Symposium on VLSI Technology Digest of Technical Papers (IEEE, 2018).