Abstract
To circumvent the Von Neumann bottleneck, significant progress has been made towards in-memory computing with synaptic devices. However, efficient full-hardware implementation of deep neural networks also requires compact nanodevices that implement nonlinear activation functions. Here, we present an energy-efficient and compact Mott activation neuron based on vanadium dioxide and its successful hardware integration with a CBRAM crossbar array. The Mott activation neuron implements the rectified linear unit function in the analogue domain. The neuron devices consume substantially less energy and occupy two orders of magnitude smaller area than analogue CMOS implementations. A LeNet-5 network with Mott activation neurons achieves 98.38% accuracy on the MNIST dataset, close to the ideal software accuracy. We perform large-scale image edge detection using the Mott activation neurons integrated with a CBRAM crossbar array. Our findings provide a solution toward large-scale, highly parallel, and energy-efficient in-memory computing systems for neural networks.
As the amount of data for computing increases exponentially, data transfer between memory and processor becomes a major bottleneck dominating system-level energy consumption. In-memory computing has been proposed to circumvent this bottleneck of the Von Neumann architecture by minimizing or eliminating the energy-consuming data transfer between memory and processor1, 2. In-memory computing with emerging non-volatile memories (eNVMs)3, 4, 5, 6 has shown promising results for on-chip storage of weights and computation of multiply-accumulate (MAC) operations for a single layer7, 8, 9. However, modern deep neural networks (DNNs) consist of hundreds of layers (e.g. ResNet has 152 layers10), in which the outputs of each layer are individually connected to artificial neurons that apply non-linear activation functions to the weighted sums. Most in-memory computing approaches using eNVMs still rely on general processors to compute and propagate the activation functions of each layer. However, activations that move in and out of the memory can dominate the energy consumption of in-memory-computing-based accelerators8, 11, 12, 13. Moreover, computing one element of activation using analogue-to-digital converters (ADCs) consumes energy comparable to that consumed by a whole synaptic array for a MAC operation13. Since DNNs need a very large number of activations to achieve high accuracy13, it is critical to develop energy- and area-efficient implementations of activation functions that can be integrated at the periphery of the synaptic arrays. Recent works have investigated analogue CMOS circuits14 and ADCs with reconfigurable function mapping15 for the implementation of nonlinear activation functions. However, a compact and energy-efficient nanodevice implementing nonlinear activation functions has yet to be demonstrated.
Here we propose a volatile four-terminal Mott activation neuron device based on vanadium dioxide (VO2) for compact and energy-efficient implementation of activation functions. The Mott activation neuron uses a nanowire heater for precise control of the temperature of the VO2 film. First, we experimentally demonstrate that the resistance of the Mott activation neuron can be switched linearly and gradually to emulate the rectified linear unit (ReLU) activation function, the most widely used activation function. The Mott activation neuron generates an output voltage that follows the ReLU activation function for a given weighted sum current. Then, we study the energy efficiency of the Mott activation neuron in comparison to activation-function circuits based on analogue CMOS14 or a reconfigurable digital ADC15. We investigate the performance of hardware neural networks implemented with the Mott activation neurons in terms of energy, latency, peripheral neuron/circuit area, and classification accuracy. Lastly, we fabricate CBRAM crossbar arrays and Mott activation neuron arrays to demonstrate edge detection using convolutional neural networks in hardware. Our results show that the small size and energy efficiency of the Mott activation neuron enable direct stacking of synaptic layers in neural networks and achieve substantial gains in energy efficiency and area while providing high accuracy.
Mott Activation Neuron
Neural networks consist of a set of neurons organized in layers, connected with synaptic weights (Fig. 1a). The inputs applied to the network are multiplied by the corresponding weights and the multiplication results are accumulated in neurons. Then, the output of a neuron is calculated by passing the MAC result through a nonlinear activation function. In-memory computing architectures map these neural network operations onto arrays of eNVM devices: the weights are stored in the arrays and the weighted sum is calculated using Kirchhoff's current law16. While in-memory computing allows the local storage of the weights in compact and energy-efficient synaptic devices, the activation function calculations are still implemented with general processors or large and complex neuron peripheral circuits (Fig. 1b). This significantly degrades energy and area efficiency at the system level. The activation function we target is the ReLU, the most widely used activation function. The output of the ReLU activation function (i.e. f(x) = max(0, x)) depends only on the current input, regardless of previous inputs and resistance states (i.e. it is a memoryless function). In addition, the output of the ReLU function is linear after the transition point (i.e. x = 0). In order to emulate the ReLU activation function, the device should therefore exhibit volatile, linear, and gradual resistive switching. We developed a four-terminal VO2-based activation device (illustrated in the bottom inset of Fig. 1b) that exploits a thermally driven Mott transition of VO2 to embody these characteristics in a single nanodevice. The Mott ReLU device uses a nanowire heater (i.e. Ti (20 nm)/Au (30 nm)) to control the resistive switching of a lateral 50 nm VO2 gap beneath it. The heater and the VO2 gap are electrically insulated by a 70 nm Al2O3 layer. A scanning electron microscope (SEM) image of a fabricated device is shown in Fig. 1c and detailed fabrication procedures are discussed in Methods.
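The mapping described above, from weights to eNVM conductances and from inputs to row voltages, reduces the weighted sum to a single matrix-vector product under Kirchhoff's current law. A minimal sketch, with illustrative (not measured) conductance and voltage values:

```python
import numpy as np

# Hypothetical conductance matrix (siemens): each column stores the weights
# feeding one neuron; values are illustrative only.
G = np.array([[1e-6, 5e-6],
              [2e-6, 1e-6],
              [4e-6, 3e-6]])

# Input voltages applied to the rows of the array (volts).
V = np.array([0.1, 0.2, 0.05])

# Kirchhoff's current law: each column current is the sum of V_i * G_ij,
# i.e. one analogue multiply-accumulate (MAC) per column.
I = V @ G
```

Each element of `I` is the weighted sum current that, in the proposed architecture, would drive one Mott ReLU heater.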
The heater generates heat through Joule heating, depending on the magnitude of the weighted sum current generated by each column of the eNVM array. The generated heat is transferred to the VO2 film through the electrical insulator (i.e. the Al2O3 layer) and induces the phase transition from the insulating state to the metallic state, which results in a resistivity drop. The temperature-dependent resistance of the VO2 gap is shown in Fig. 1d. To map the gradual resistivity changes of the VO2 gap onto the output voltage (VOUT), a voltage divider circuit is implemented as illustrated in the inset of Fig. 1e. The supply voltage (VDD) is divided into the voltage drops across the VO2 gap and the load resistor, depending on the ratio of their resistances. As the resistance of the VO2 gap decreases, the voltage drop across the VO2 gap decreases, which increases the output voltage (the voltage drop across the load resistor). As a result, the resistive switching of the VO2 gap allows the output voltage to emulate the ReLU activation function as illustrated in Fig. 1e. Since the output of the Mott ReLU device is a voltage, it can be directly applied to the next layer as an input voltage. Therefore, multiple synaptic layers can be stacked directly on each other, eliminating the complex digital circuits and ADCs between the layers. Moreover, the small size of the Mott ReLU device allows one device to be integrated for each column of the synaptic array, which eliminates the need for time-multiplexing and hence enables fully parallel operation.
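The voltage-divider readout described above can be sketched numerically. Only the divider relation VOUT = VDD · RL/(RL + R_VO2) comes from the text; the piecewise-linear R(I) model below is an assumption for illustration (the measured curve is in Fig. 2e), and all parameter values except the 1,900 Ω load resistor are hypothetical:

```python
import numpy as np

VDD = 1.0        # supply voltage (V), illustrative value
R_LOAD = 1900.0  # load resistor (ohms), as in Fig. 2f

def r_vo2(i_heater_ma, i_th=5.0, r_ins=10e3, r_met=100.0, slope=0.9e3):
    """Toy piecewise-linear model of VO2 gap resistance vs. heater current:
    insulating (~10 kOhm) below the threshold current, dropping linearly above it,
    floored at a metallic resistance."""
    r = r_ins - slope * np.maximum(i_heater_ma - i_th, 0.0)
    return np.maximum(r, r_met)

def v_out(i_heater_ma):
    """Voltage-divider output: VOUT = VDD * R_LOAD / (R_LOAD + R_VO2)."""
    return VDD * R_LOAD / (R_LOAD + r_vo2(i_heater_ma))
```

Below threshold the output sits at a small constant offset; above threshold it rises monotonically as the gap resistance collapses, giving the ReLU-like transfer curve of Fig. 1e.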
Fig. 1 |. The Mott ReLU device for the hardware implementation of a neural network.

An illustration shows a a neural network and b the hardware implementation of the neural network with synaptic and activation (or neuron) devices. The call-out window on the left shows a schematic of a resistive memory cell. The call-out window on the right shows a schematic of the Mott device with a nanowire heater. Mott activation devices allow direct stacking of multiple eNVM arrays for DNNs. The heater is connected to a column of the pre-synaptic array and receives the weighted sum current. One pad of the VO2 gap is connected to VDD and the other pad is connected to the next synaptic array; the latter pad is also connected to a load resistor. Weights are stored in eNVM devices and weighted sum currents from each column are fed into the Mott ReLU. Then, the output of the Mott ReLU is applied as the input to the next layer. c An SEM image of the Mott device (1 μm scale bar). The inset shows the nanowire heater on top of the 50 nm VO2 gap. d Resistance of the VO2 gap when the temperature is swept from 280 K to 365 K. e An illustration of how a Mott device is used as a ReLU activation function.
The main operating principle of the Mott ReLU device is the Mott transition (or insulator-to-metal transition) of the VO2 gap. The Mott transition of the VO2 gap can be induced either by electrical filamentary switching or by thermally driven domain-wise switching17, 18. When a voltage bias above the threshold is applied across the VO2 gap, Joule heating due to the bias induces filament formation, and the filament widens as the voltage increases (Fig. 2a and b). Since filament formation is a cascading avalanche effect, the resistance switching is abrupt19. In contrast, when the transition is driven by temperature, only the domains whose critical temperature is below the device temperature transition to the metallic phase (Fig. 2a and c). Since the transition temperature of each domain exhibits variations20, the number of domains switched to the metallic phase gradually increases as the temperature increases. As a result, the resistance of VO2 gradually decreases with increasing temperature (Fig. 2d). This gradual switching behaviour of VO2 was previously confirmed by scanning microwave microscopy (SMM) imaging of VO2 film20. The Mott ReLU device is engineered to exploit this thermally driven linear resistive switching to emulate the linear increase of the ReLU activation function, as shown in Fig. 2e. This linear resistive switching of the VO2 gap is then projected onto the output voltage. The ratio between VOUT and VDD of the Mott ReLU device with a 1,900 Ω load resistor is shown in Fig. 2f as a representative example. Two potential practical issues regarding the Mott transition are discussed in Supplementary Note 1.
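The gradual, domain-wise transition can be illustrated with a toy model: treat the gap as many parallel domains whose critical temperatures are normally distributed, so the fraction of metallic domains, and hence the total conductance, grows smoothly with temperature. All numerical values here are illustrative, not measured:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: the VO2 gap as N parallel domains whose transition temperatures
# are spread around ~340 K (values are illustrative, not fitted to the device).
N = 1000
t_c = rng.normal(340.0, 5.0, N)   # per-domain critical temperatures (K)
R_INS, R_MET = 1e6, 1e3           # per-domain insulating / metallic resistance (ohms)

def gap_resistance(temperature):
    """Parallel combination: domains above their Tc are metallic, the rest insulating."""
    metallic = t_c < temperature
    conductance = np.where(metallic, 1.0 / R_MET, 1.0 / R_INS).sum()
    return 1.0 / conductance

temps = [320.0, 335.0, 340.0, 345.0, 360.0]
resistances = [gap_resistance(t) for t in temps]
```

Because the per-domain transition temperatures are spread out, the aggregate resistance falls gradually rather than abruptly as temperature rises, mirroring the thermal-driven curve in Fig. 2d.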
Fig. 2 |. Switching mechanisms of VO2 gap.

Schematics show the VO2 gap with a no bias, b filamentary switching, and c thermal-driven switching. d Compared to thermal-driven domain-wise switching, electrical filamentary switching shows an abrupt change in resistance. e Resistance of the VO2 device as a function of heater current, showing ~77 levels. It shows gradual and linear resistive switching when the input current is larger than 5 mA. f Voltage ratio between the output and the total voltage (VDD) of the device as a function of heater current with a 1,900 Ω load resistor as a representative example. Symbols are experimental data and lines are SPICE simulation results. The V–I characteristic is similar to the ReLU function shown in Fig. 1e.
To further assess the suitability of the Mott ReLU device for implementing the ReLU activation function, we extensively characterized its switching characteristics. In addition to gradual switching, the resistive switching should be volatile to implement the ReLU function in synaptic arrays, because the output of the ReLU activation function should depend only on the current input, regardless of previous inputs and resistance states. The volatile switching of the Mott ReLU device is experimentally verified in Fig. 3a. When 1 ms wide current pulses with various amplitudes are applied to the heater, the resistance of the device is switched and maintained only while the current pulse is high. Furthermore, the output voltage for a given input current should not exhibit a high level of variation, which could degrade neural network performance. Figure 3b demonstrates that each resistance state of the device shows only ~4% or less variation when the resistance states are iteratively measured. The impact of this small variation on neural network performance is studied in the Neural Network Implementations section. Lastly, the endurance of the device should be high to allow a large number of weighted sum operations in hardware. For inference on the MNIST dataset21, each Mott ReLU device should generate its output 10,000 times per epoch (or whole testing set); hence, the device should endure this large number of cycling operations. Figure 3c experimentally demonstrates that the Mott ReLU device shows no sign of ON/OFF ratio degradation up to 5,000 cycles, and endurance larger than 10¹⁰ cycles has been shown to be easily achievable with VO2 devices22. Furthermore, we performed pulse measurements to investigate the power consumption and the latency of the Mott ReLU device. Figure 3d shows the total power consumption as a function of heater current, as well as the power consumed by the heater and the VO2 gap separately. The total power consumption of the Mott ReLU device is dominated by the heater. The latency of the Mott ReLU is 61.4 ns, measured as the time difference between the first saturation points of the input and output pulses (Fig. 3e). The energy consumption of the Mott ReLU is 199.5 pJ for a 65 ns pulse width.
Fig. 3 |. Electrical characteristics of the Mott ReLU device.

a Resistance of the VO2 gap when a current pulse is applied to the heater. The resistance stays at a low resistance state only while the bias is applied. b Cycle-to-cycle (or intra-device) variation of each resistance state of the Mott ReLU device. For each data point, the heater cools down to reset the resistance of the VO2 gap to the no-bias case before another bias is applied to the heater. The circle symbols represent mean values while the error bars represent the 95% confidence interval (CI). c Endurance of the Mott ReLU device. The state of the device is alternately switched between the highest and lowest resistance states by flowing 0 mA and 18 mA of current through the heater, respectively. d Power consumption of each component of the Mott device (i.e. the heater and the VO2 gap) at various heater currents. The power consumption of the Mott device is dominated by the heater. e Heater current applied to the device and the resistance of the VO2 gap as a function of time. After 61.4 ns, the output of the Mott device becomes stable.
The Mott ReLU device can replace complex peripheral circuits for activation function calculation. Therefore, it is important to compare its performance against other implementations of activation functions (i.e. the analogue CMOS14 and digital ADC15 circuits discussed in Methods). The performance benchmarking results (Supplementary Note 2) of the Mott ReLU device against the analogue CMOS circuit14 and the digital ADC implementation15 are summarized in Table 1. The energy consumption of the Mott ReLU can be further reduced by optimizing the device for more heat confinement in the VO2 gap. As the heat generated by the heater is more confined to the VO2 gap, the device requires less heater current to achieve the same temperature in the VO2 gap23. Therefore, by replacing the heater material with one of higher thermal resistance (e.g. Ti has a thermal resistance ~10× higher than that of Au), the energy consumption of the device can be lowered. To determine the energy consumption of an optimized device, we developed an empirical thermal model of our device (Supplementary Fig. 1a and b), as discussed in Supplementary Note 3, which shows good agreement with the experimental data in Fig. 2e and f. The power consumption of the Mott ReLU can be reduced by ~25×, down to 128 μW, by increasing the thermal resistance of the nanowire heater (Supplementary Fig. 2a). Moreover, the latency can be reduced to ~3.8 ns (Table 1) by minimizing the parasitic capacitance of the Mott ReLU below 10⁻¹¹ F (Supplementary Fig. 2b), which would result in a total reduction of ~300× in energy consumption, down to 0.638 pJ (Table 1). Our experimental results show that the Mott ReLU device achieves a 450–1500× improvement in area and a 1.5–3× improvement in latency while achieving low energy consumption. Moreover, optimization of the Mott ReLU device can further reduce the energy consumption and improve the latency, offering substantial gains in area, latency, and energy efficiency as a replacement for the analogue CMOS14 and digital15 ADC circuits.
Table 1. The Performance of Activation Device/Circuit.
Comparison of the Mott ReLU, analogue CMOS ReLU14, and digital ADC with reconfigurable function mapping15 at the single-ReLU level. The energy, latency, and leakage power are evaluated from the experimental measurement results shown in Supplementary Fig. 2a and b. For the energy estimation, we used a 65 ns pulse for the Mott ReLU case.
| | Mott | Analogue CMOS14 | Digital ADC15 |
|---|---|---|---|
| Energy (Exp./Optimal, pJ) | 199.5/0.638* | 3410 | 19.4 |
| Latency (Exp./Optimal, ns) | 61.4/3.8* | 91.91 | 207 |
| Area (μm²) | 0.64 | 951.06 | 289** |
| Leakage (μW) | 27.0 | 11060 | - |
* Shows projected optimal energy and latency when the thermal resistance of the heater is increased by 10× and the parasitic capacitance of a Mott ReLU is < 10⁻¹¹ F.
** This area is only the area per neuron circuit. The digital ADC implementation needs a shared circuit which occupies 0.086 mm² of area.
Neural Network Implementations
We have demonstrated that the Mott ReLU neurons can provide smaller area and better energy efficiency than the other circuit implementations. It is also critical to evaluate the network-level performance of the Mott ReLU devices for hardware implementation of DNNs. To compute the accuracy of neural network implementations with the Mott ReLU device, we simulated an MLP (Fig. 4a) and LeNet-521 (Fig. 4b) (the configurations of the networks are discussed in Methods). The schematic and transmission electron microscopy (TEM) image of the CBRAM cell (Supplementary Fig. 3a) are shown in Supplementary Fig. 3b and c, respectively. Table 2 summarizes the accuracy results of the ideal (i.e. software ReLU) and Mott ReLU cases for both MLP and LeNet-5. We investigated both online learning (i.e. training is done on the hardware) and offline classification (i.e. only inference is done on the hardware). When the ReLU activation functions of the MLP (Fig. 4c) or LeNet-5 (Fig. 4d) are quantized, the accuracy degradation is not significant as long as the precision is ~6 bit or higher. Since the precision of the Mott ReLU device is high enough (~6 bit), the accuracy degradation due to the Mott ReLU is negligible compared to that due to the synaptic devices (~10% for MLP and ~3% for LeNet-5), which stems mainly from the limited precision (~5 bit) of the CBRAM devices24. We also investigated neural networks with variations (cycle-to-cycle in Fig. 4e and f, and device-to-device in Supplementary Fig. 4a and b) of the Mott ReLU and verified that the variations cause no significant accuracy degradation (Supplementary Note 4). Since the Mott ReLU achieves accuracies close to the ideal software, accuracy will not be a limiting factor for implementing activation functions using the Mott ReLU device.
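The quantization study in Fig. 4c and d can be emulated with a uniformly quantized ReLU. This is a minimal sketch: the clipping range `x_max` and the uniform level spacing are assumptions, as the paper does not state the normalization used in its simulations:

```python
import numpy as np

def quantized_relu(x, bits=6, x_max=1.0):
    """ReLU followed by uniform quantization to 2**bits levels on [0, x_max].
    The clipping range x_max is a hypothetical normalization choice."""
    y = np.clip(x, 0.0, x_max)          # ReLU plus saturation at x_max
    levels = 2 ** bits - 1              # number of quantization steps
    return np.round(y * levels / x_max) * (x_max / levels)

x = np.linspace(-1.0, 1.0, 5)
y = quantized_relu(x, bits=6)           # ~6-bit precision, as for the Mott ReLU
```

Sweeping `bits` from 1 to 8 in a network simulation reproduces the kind of precision/accuracy trade-off shown in Fig. 4c and d.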
Fig. 4 |. Network-level implementations.

A schematic of a MLP and b LeNet-5 networks used for simulations with the Mott ReLU. The MLP has one ReLU layer and LeNet-5 has four ReLU layers after convolutional (Conv) and fully connected (FC) layers. c MLP and d LeNet-5 with the ReLU activation function for offline classification (blue circle symbols) and online learning (red square symbols). The ReLU activation function is quantized to 1–8 bit precision. The MLP needs 5-bit precision while LeNet-5 requires 6-bit precision to prevent an accuracy drop. Network simulation results for e MLP and f LeNet-5 on the whole MNIST set for each epoch (60k images). Experimental measurement results from Fig. 2e and f are used for these simulations. The Mott ReLU achieves accuracy comparable to the ideal ReLU implemented in software unless the cycle-to-cycle (or intra-device) variation of the Mott ReLU device (σ) is higher than 50% (blue square symbols). Red triangle, yellow diamond, green triangle, and purple triangle symbols represent results for the no variation, σ = 10%, σ = 30%, and σ = 50% cases, respectively.
Table 2. Network Simulation Results.
The accuracy results of MLP and LeNet-5 for ideal software (64-bit), 64-bit weights with Mott ReLU (~6-bit), and CBRAM (~5-bit weights) with Mott ReLU (~6-bit). The results show that Mott ReLU can achieve accuracy comparable to the ideal software ReLU.
| | Online Learning | | Offline Classification | |
|---|---|---|---|---|
| | MLP | LeNet-5 | MLP | LeNet-5 |
| Software (64-bit) | 97.53% | 99.11% | 97.53% | 99.11% |
| Mott ReLU (~6-bit) | 94.0% | 97.05% | 94.42% | 98.38% |
| CBRAM (~5-bit) + Mott ReLU (~6-bit) | 84.2% | 94.21% | 89.97% | 98.35% |
System-level Performance Benchmarking
To evaluate the performance of the hardware system for neural networks with the Mott ReLU device, we performed system-level performance benchmarking for offline classification using the NeuroSim platform25. NeuroSim is a C++-based circuit-level macro-model for neuro-inspired architectures. We modified NeuroSim to integrate Mott ReLU peripherals with CBRAM synaptic cores, and compared the synaptic cores with the Mott ReLU peripheral against ones with peripheral circuits implemented by analogue14 and digital15 CMOS ReLU circuits. For the Mott ReLU peripheral, the experimental results on energy and latency (Fig. 3d and e) are integrated into the NeuroSim platform25. The peripheral circuits of the analogue CMOS ReLU for the NeuroSim platform25 are developed based on SPICE simulations. The dynamic energy, leakage power, and latency of the Mott ReLU and CMOS ReLU activation circuits shown in Table 1 are integrated into the circuit modules.
The architectures of the hardware systems with conventional digital peripheral circuits, the Mott ReLU device, and analogue CMOS circuits are illustrated in Fig. 5a, b, and c, respectively. In contrast to the conventional analogue one-transistor one-resistor (1T1R) architecture with digital neuron peripherals (Fig. 5a)25, the Mott ReLU device allows a simpler synaptic core design (Fig. 5b) by avoiding MUX sharing (Supplementary Note 5) and replacing complex circuits and ADCs. Before system-level benchmarking, we first investigated whether the Mott ReLU device can drive the inputs to the next synaptic array without additional circuits by performing circuit simulation with SPICE (Supplementary Note 6). This result (Supplementary Fig. 5a and b) clearly demonstrates that the Mott ReLU device can generate a stable output to drive the next synaptic layer without additional circuits. The system-level performance benchmarking results are summarized in Supplementary Table 1. The architecture with the Mott ReLU (65 ns input pulse) provides substantial gains over the architectures with the analogue CMOS and digital ADC implementations (Supplementary Note 7). Lastly, we compared the performance of synaptic cores with Mott ReLU and analogue CMOS circuits considering technology scaling (130 nm to 14 nm), as discussed in Methods. The results in Fig. 5d demonstrate that the experimentally measured Mott ReLU provides ~10× energy gain regardless of the CMOS technology node. Moreover, the system-level gain in energy can be further improved up to ~100× using the optimized Mott ReLU in comparison to the analogue CMOS ReLU. More importantly, the Mott ReLU achieves orders of magnitude smaller peripheral circuit area in comparison to both the digital ADC and analogue CMOS implementations of the activation function. The system-level performance results clearly show that the Mott ReLU device offers a promising approach to replace power-hungry and large-area activation function circuits in the neuron periphery.
Fig. 5 |. System-level benchmarking results.

An illustration of a synaptic core and neuron peripheral circuits implemented with a conventional digital circuits, b Mott ReLU circuits, and c analogue CMOS ReLU circuits. The Mott ReLU device can replace the ADCs and neuron peripheral circuits. In contrast to the CMOS analogue activation circuit, the Mott ReLU device can be integrated for each column due to its small size. d Peripheral energy vs. peripheral area at different technology nodes for a CBRAM synaptic core with CMOS ReLU peripherals for the LeNet-5 implementation. A CBRAM synaptic core with digital ADC peripherals and Mott ReLU is also presented as a reference. The parameters for the different technology nodes of the CMOS circuits are adopted from the predictive technology model (PTM)27, 28. The Mott ReLU continues to provide significant gains in energy and area even when CMOS is scaled down to the 14 nm node. Star symbols show performance results using experimentally measured Mott ReLU characteristics, while black square symbols show projected performance results using an optimized Mott ReLU device (i.e. the thermal resistance of the heater is increased by 10× and the parasitic capacitance is below 10⁻¹¹ F). The system-level energy consumption using the optimized Mott ReLU can be further reduced by ~50×.
Integration of Mott ReLU Devices with Crossbar Arrays
To demonstrate the integration of Mott ReLU devices with synaptic arrays in hardware, we fabricated CBRAM crossbar arrays (Fig. 6a) and a Mott ReLU device array (Fig. 6b), as explained in Methods. We designed a custom printed circuit board (PCB) (Fig. 6c) to interface and integrate the CBRAM and the Mott ReLU chips in hardware. Each column of the crossbar array is directly connected to a Mott ReLU device (Fig. 6d) to investigate how the weighted sum current generated by the array controls the output voltage of the Mott ReLU devices. First, we varied the input voltage to the crossbar array (i.e. −250 to 250 mV) while programming the weights of ~2/3 of the synaptic devices on a column to the low resistance state and the rest to the high resistance state. Figure 6e shows that the output voltage exhibits ReLU characteristics as the input voltage to the CBRAM devices is increased from −250 mV to 250 mV. Then, we varied the synaptic weights in the column while the input voltage was fixed at 130 mV. As the fraction of devices programmed to the low resistance state increases, the output voltage again exhibits ReLU characteristics (Fig. 6f). These experimental results demonstrate that the weighted sum current, which depends on the input voltage and the weights (resistances) of the synaptic devices, can successfully drive the Mott ReLU neurons to implement the ReLU activation function.
Fig. 6 |. Hardware demonstration of the integration of Mott ReLU devices and a synaptic array.

a An optical image (150 μm scale bar) of a CBRAM crossbar array (32 × 32) and the SEM image of a 16 × 16 CBRAM array (200 μm scale bar). We use the 16 × 16 array for the following hardware implementation. b A Mott ReLU device array containing 44 devices (3 mm scale bar). The insets in a and b (20 μm and 30 μm scale bar, respectively) show single CBRAM and Mott ReLU devices, respectively. c Image of the custom PCB board with the Mott ReLU and the CBRAM arrays wire-bonded onto it to demonstrate neural network operation. d An illustration of how a Mott ReLU device is connected to a column of the CBRAM array in hardware. e Output voltage of the Mott ReLU device as the input voltage to the CBRAM array is swept from −250 mV to 250 mV when ~2/3 of the devices on a column of the CBRAM array are set to a low resistance state while the others are set to a high resistance state. For the Mott ReLU device, 1.1 V is applied as VDD to the VO2 gap with a 3.3 kΩ load resistor connected in series, and a 7 mA offset current is applied to the heater. f Measured output voltage of a Mott ReLU device when the percentage of CBRAM devices in the low resistance state is varied from 0% to 100%. g 180 × 270 image used for edge detection. The colour bar represents the pixel intensity of the image. Four representative 10 × 10 patches and a schematic of the convolution operation are shown below. The schematic illustrates that the convolution operation is done by sliding the 4 × 4 filters over the image patches 49 times. For the h lateral filter and i vertical filter, the experimentally measured weighted sum currents of the CBRAM array during the convolution operations on these four patches are shown. The weighted sum current produced by the CBRAM array during the convolution operation is fed to the Mott ReLU array to perform the ReLU operation.
j and k show the output voltage of the Mott ReLU devices for the whole image during the convolution and ReLU operations with the lateral and vertical filters, respectively. The colour bar represents the output voltage of the Mott ReLU.
For a large-scale hardware demonstration, we implemented a convolutional edge detection operation26 with filters (Supplementary Fig. 6a and b) followed by a ReLU operation on a real-world image with the CBRAM crossbar and the Mott ReLU array using the custom PCB, as discussed in Methods. The weighted sum currents resulting from the convolution operation on four representative 10 × 10 input patches (Fig. 6g) with the lateral and vertical edge detection filters (Supplementary Fig. 6a and b), mapped using a differential pair scheme (Supplementary Fig. 6c), are shown in Fig. 6h and i, respectively. The weighted sum current generated during the convolution operation is fed to the Mott ReLU devices to perform the ReLU operation on the weighted sum. The output voltages of the Mott ReLU devices resulting from the weighted sums with the lateral and vertical filters for the whole input image are shown in Fig. 6j and k, respectively. These results show that the lateral and vertical edges of the image are detected by implementing the corresponding filters using the Mott ReLU devices integrated with the CBRAM crossbar array in hardware. The successful edge detection demonstrates the feasibility of using Mott ReLU neurons as activation units for in-memory computing systems.
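The convolution-plus-ReLU operation performed by the CBRAM array and Mott ReLU devices can be sketched in software as a sliding 4 × 4 weighted sum followed by max(0, ·). The filter values below are hypothetical stand-ins for the lateral filter of Supplementary Fig. 6a, chosen only to respond to horizontal edges:

```python
import numpy as np

# Hypothetical 4x4 lateral (horizontal-edge) filter: +1 on the top two rows,
# -1 on the bottom two. The actual filter values are in Supplementary Fig. 6.
lateral = np.vstack([np.ones((2, 4)), -np.ones((2, 4))])

def conv2d_relu(image, kernel, stride=1):
    """Valid sliding-window correlation (as in CNNs) followed by ReLU,
    emulating the crossbar weighted sum fed into a Mott ReLU device."""
    kh, kw = kernel.shape
    h = (image.shape[0] - kh) // stride + 1
    w = (image.shape[1] - kw) // stride + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = (patch * kernel).sum()   # weighted sum (MAC)
    return np.maximum(out, 0.0)                  # ReLU activation

# A toy 10x10 patch with a horizontal edge: bright top half, dark bottom half.
img = np.zeros((10, 10))
img[:5, :] = 1.0
edges = conv2d_relu(img, lateral)
```

The response peaks where the kernel straddles the edge and vanishes over uniform regions, which is the behaviour visible in Fig. 6j and k.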
Conclusions
We introduced a nanoscale Mott-transition-based device for the ReLU activation function. The device exhibits volatile, linear, and gradual resistive switching of a VO2 film controlled by the metal nanowire heater on top of it. The Mott ReLU device shows minimal cycle-to-cycle variation and long endurance, which are important for hardware implementation of neural networks. We have shown that the Mott ReLU devices generate an output voltage that follows the ReLU activation function for a given input current, which allows the Mott ReLU device to directly drive the synaptic devices of the next layer. We performed system-level simulations for hardware implementation of neural networks with the Mott ReLU devices. Moreover, we experimentally demonstrated that the Mott ReLU devices can be integrated with CBRAM crossbar arrays to perform the filtering operations of convolutional neural networks. Our findings suggest that the Mott-transition-based activation device can achieve substantial gains in energy, latency, and area compared to digital or analogue circuit implementations of the activation function while maintaining high accuracy. The small size and high energy efficiency of the Mott device provide a solution towards large-scale, highly parallel, and energy-efficient in-memory computing systems for DNNs.
Code availability
The software code used for this study is available from the corresponding author upon request.
Methods
Mott Device Fabrication
To fabricate the Mott-transition-based activation devices, a 70 nm VO2 film is grown by reactive sputtering on an Al2O3 substrate in a 4 mTorr Ar/O2 (8% O2) ambient at 520 °C. Ti (20 nm)/Au (30 nm) electrodes are then patterned using e-beam lithography and e-beam evaporation to define the 50 nm VO2 gap. A 70 nm Al2O3 insulating layer is deposited, and a Ti (20 nm)/Au (30 nm) nanowire heater is patterned on top of the Al2O3, aligned with the VO2 gap, using e-beam lithography and e-beam evaporation. To isolate each device, the VO2 film outside the active area is etched by reactive ion etching. The resistance of the heater is ~30 Ω, while the resistance of the VO2 gap without bias to the heater is ~10 kΩ.
Device Measurement Setup
To measure the thermal gradual resistive switching of VO2 while preventing electrical switching, we applied a 1 μA current to the VO2 gap using a Keithley 6221 current source. This current is small enough not to initiate electrical switching. We then measured the voltage across the VO2 gap using a Keithley 2182A. The resistive switching of the VO2 gap is thus solely controlled by the heat generated by the heater on top of the VO2 gap. The heat generation is set by a voltage source connected to the heater, and the current flowing through the heater is monitored with an oscilloscope to quantify the heat generation. For the variability and endurance measurements, a Keithley 6221 is used to apply a current pulse train to the heater, and the resistance of the VO2 gap is extracted by measuring the voltage across the gap with the Keithley 2182A while applying a constant 1 μA current through the gap using another Keithley 6221. The ambient temperature is controlled with a Lake Shore TTPX probe station for all measurements.
CMOS ReLU Implementation
The analogue CMOS circuit consists of three operational amplifiers (OP-AMPs), which amplify the input current and convert it to an output voltage, and an analogue switch that implements the rectifying function. The digital circuit is implemented using an ADC with reconfigurable function mapping15. To evaluate the energy and latency of the three ReLU implementations as activation functions, we assume that all implementations receive an identical weighted sum result as input, whether to the Mott ReLU device or to the digital/analogue CMOS circuits. The area of each implementation is calculated from the layout of the device or circuits.
Neural Network Configuration
The MLP used for the network simulations is composed of 785 input neurons (1 bias neuron and 784 neurons for the MNIST inputs), 128 hidden neurons, and 10 output neurons. Each output neuron represents one of the digits ('0' to '9'). The hidden neurons use the ReLU activation function while the output neurons use the soft-max activation function. The LeNet-5 has six 5 × 5 convolutional filters for the 28 × 28 MNIST input images. The outputs of the convolutional filters are fed to the ReLU activation function and then down-sampled using 2 × 2 max pooling. The second convolutional layer has sixteen 8 × 8 convolutional filters, also followed by 2 × 2 max pooling. The outputs of the last max-pooling layer are fed into the fully connected (FC) layers, which have 120 input neurons, 80 hidden neurons, and 10 output neurons. The input and hidden neurons of the FC layers use ReLU activation functions while the output neurons use soft-max activation functions.
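As a reference for the layer dimensions above, the MLP forward pass (785 → 128 → 10, ReLU hidden layer, soft-max output) can be sketched in numpy. The random weights here are placeholders for illustration, not trained values:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stabilised
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# 785 inputs (784 MNIST pixels + 1 bias neuron), 128 hidden, 10 output neurons
W1 = rng.standard_normal((785, 128)) * 0.05  # illustrative random weights
W2 = rng.standard_normal((128, 10)) * 0.05

def mlp_forward(pixels):
    x = np.concatenate([[1.0], pixels])  # prepend the bias neuron
    h = relu(x @ W1)                     # hidden layer: ReLU
    return softmax(h @ W2)               # output layer: soft-max over 10 digits

probs = mlp_forward(rng.random(784))
```

In the hardware implementation, the two matrix products map onto CBRAM crossbar arrays and the `relu` call onto the Mott ReLU devices.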
In the network simulations, the ReLU activation functions on the neuron layers (i.e. the hidden layer of the MLP and the convolutional and fully connected (FC) layers of the LeNet-5) are implemented with the Mott ReLU based on its experimental measurement results. A 1,900 Ω load resistor is connected to the Mott ReLU, and a 5 mA offset current is applied to the Mott ReLU through an additional row of the synaptic array to shift the transition point to 0 mA. The weights are mapped onto the CBRAM arrays using the measured device characteristics. The CBRAM cells used for the simulations exhibit ~40 conductance levels (~5-bit) and a 100 ON/OFF ratio. The network weights, ranging from −1 to 1, are linearly mapped between the minimum (~1 μS) and maximum (~100 μS) conductance of the CBRAM cells. Similarly, the outputs of the ReLU activations (0 to 785) are linearly mapped to the output voltages of the Mott ReLU devices (0 to 200 mV).
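The two linear mappings described above can be written out explicitly. This is a sketch of the mapping arithmetic only; the ~40-level conductance quantization and device non-idealities are omitted:

```python
G_MIN, G_MAX = 1e-6, 100e-6   # CBRAM conductance range, ~1 uS to ~100 uS (S)
A_MAX, V_MAX = 785.0, 200e-3  # activation range and Mott ReLU output-voltage range

def weight_to_conductance(w):
    """Linearly map a network weight in [-1, 1] onto [G_MIN, G_MAX]."""
    assert -1.0 <= w <= 1.0
    return G_MIN + (w + 1.0) / 2.0 * (G_MAX - G_MIN)

def activation_to_voltage(a):
    """Linearly map a ReLU output in [0, A_MAX] onto [0, V_MAX], clipping out-of-range values."""
    return max(0.0, min(a, A_MAX)) / A_MAX * V_MAX
```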
LeNet-5 requires a larger fanout for the FC1 layer. To address this, we incorporated a time multiplexing approach: by enabling subsets of the columns of the synaptic array sequentially with the switch matrix, the number of devices connected to each Mott ReLU can be controlled. Since our architecture already includes a switch matrix, this approach is directly implemented in the performance benchmarking simulations with NeuroSim25. Note that larger-scale DNN models may require additional peripheral circuit blocks, such as buffers, if they have many layers with large fanout. These blocks could be integrated with the synaptic arrays in the future and accounted for in the performance benchmarking of different models.
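One way to read this time multiplexing scheme is as a large-fan-in weighted sum accumulated over sequential column groups. The sketch below assumes that the partial currents from each enabled subset are summed over time; the group size of 32 is an arbitrary illustration, not a value from the paper:

```python
import numpy as np

def time_multiplexed_dot(v, g, group=32):
    """Accumulate a large-fan-in weighted sum by enabling `group` devices per step."""
    total = 0.0
    for start in range(0, len(v), group):
        # only `group` columns are enabled by the switch matrix in each step
        total += np.dot(v[start:start + group], g[start:start + group])
    return total
```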
Convolutional Filtering with Mott ReLU Device Integrated with CBRAM Array
To implement convolutional filtering for image edge detection using the Mott ReLU and CBRAM array, the PCB is controlled by a switch matrix (HP E5250A), and biasing and measurement are performed by a semiconductor parameter analyser (Agilent 4155C). The 4 × 4 lateral and vertical filters are programmed into the columns of the crossbar array by unrolling each filter into a 16 × 1 vector. For each filter, the positive and negative weights are represented using two columns of the crossbar array to form a differential pair (i.e. G = G+ − G−). The input image (180 × 270) is quantized to 16 levels and converted into a train of 4 binary voltage pulses (250 mV for '1' and 0 mV for '0'). For the column representing negative weights, a negative voltage pulse train is applied as input to form a differential pair with the column representing positive weights (i.e. I = I+ − I−). During the convolution operation, a filter slides over the input image, and the weighted sum currents from the pair are combined and fed into a Mott ReLU device. For the Mott ReLU devices, 1.1 V is applied to the VO2 gap, the load resistors are set to 3.3 kΩ, and a 7 mA offset current is applied to the heater.
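The input encoding and differential-pair readout described above can be sketched as follows. `encode_pixel` and `differential_current` are hypothetical helper names, and the pulse bit order is assumed MSB-first (the paper does not specify it):

```python
import numpy as np

V_PULSE = 0.25  # 250 mV for '1', 0 mV for '0'

def encode_pixel(level):
    """Encode a 4-bit pixel (0-15) as a train of 4 binary voltage pulses, MSB first (assumed)."""
    assert 0 <= level <= 15
    return [V_PULSE if (level >> b) & 1 else 0.0 for b in (3, 2, 1, 0)]

def differential_current(v_in, g_pos, g_neg):
    """Combine the currents of a column pair: I = I+ - I- (negative column driven by -V)."""
    v = np.asarray(v_in, dtype=float)
    return float(np.dot(v, g_pos) - np.dot(v, g_neg))
```

The combined current returned by `differential_current` is what the hardware feeds into a Mott ReLU device at each convolution step.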
Supplementary Material
Acknowledgments
This work was supported by Office of Naval Research (N000142012405 and N00014162531), Samsung Electronics, NSF (ECCS-1752241, ECCS-2024776, and ECCS-1734940), NIH (R21 EY029466, R21 EB026180, and DP2 EB030992), and Qualcomm Fellowship. The experimental aspects of this work were supported as part of the Quantum Materials for Energy Efficient Neuromorphic Computing (Q-MEEN-C) Energy Frontier Research Center (EFRC), funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences Under Award #DE-SC0019273. The fabrication of the devices was performed at the San Diego Nanotechnology Infrastructure (SDNI) of UCSD, supported by the National Science Foundation (ECCS-1542148).
Footnotes
Competing financial interests
The authors declare no competing financial interests.
Data availability
The data that support the plots and other results of this paper are available from the corresponding author upon request.
References
- 1. Wong HSP, Lee H-Y, Yu S, Chen Y-S, Wu Y, Chen P-S, Lee B, Chen FT, Tsai M-J. Metal–Oxide RRAM. Proceedings of the IEEE 2012, 100(6): 1951–1970.
- 2. Zidan MA, Strachan JP, Lu WD. The future of electronics based on memristive systems. Nature Electronics 2018, 1(1): 22–29.
- 3. Kang D-H, Kim J-H, Oh S, Park H-Y, Dugasani SR, Kang B-S, Choi C, Choi R, Lee S, Park SH, Heo K, Park J-H. A Neuromorphic Device Implemented on a Salmon-DNA Electrolyte and its Application to Artificial Neural Networks. Advanced Science 2019, 6(17): 1901265.
- 4. Ge R, Wu X, Kim M, Shi J, Sonde S, Tao L, Zhang Y, Lee JC, Akinwande D. Atomristor: Nonvolatile Resistance Switching in Atomic Sheets of Transition Metal Dichalcogenides. Nano Lett 2018, 18(1): 434–441.
- 5. van de Burgt Y, Lubberman E, Fuller EJ, Keene ST, Faria GC, Agarwal S, Marinella MJ, Alec Talin A, Salleo A. A non-volatile organic electrochemical device as a low-voltage artificial synapse for neuromorphic computing. Nature Materials 2017, 16: 414.
- 6. Zhao X, Liu S, Niu J, Liao L, Liu Q, Xiao X, Lv H, Long S, Banerjee W, Li W, Si S, Liu M. Confining Cation Injection to Enhance CBRAM Performance by Nanopore Graphene Layer. Small 2017, 13(35): 1603948.
- 7. Chakrabarti B, Lastras-Montano MA, Adam G, Prezioso M, Hoskins B, Payvand M, Madhavan A, Ghofrani A, Theogarajan L, Cheng KT, Strukov DB. A multiply-add engine with monolithically integrated 3D memristor crossbar/CMOS hybrid circuit. Sci Rep 2017, 7: 42429.
- 8. Kim S, Choi B, Yoon J, Lee Y, Kim HD, Kang MH, Choi SJ. Binarized Neural Network with Silicon Nanosheet Synaptic Transistors for Supervised Pattern Classification. Sci Rep 2019, 9(1): 11705.
- 9. Oh S, Huang Z, Shi Y, Kuzum D. The Impact of Resistance Drift of Phase Change Memory (PCM) Synaptic Devices on Artificial Neural Network Performance. IEEE Electron Device Letters 2019, 40(8): 1325–1328.
- 10. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. pp. 770–778.
- 11. Kataeva I, Ohtsuka S, Nili H, Kim H, Isobe Y, Yako K, Strukov D. Towards the Development of Analog Neuromorphic Chip Prototype with 2.4M Integrated Memristors. 2019 IEEE International Symposium on Circuits and Systems (ISCAS); 26–29 May 2019. pp. 1–5.
- 12. Gao B, Bi Y, Chen H-Y, Liu R, Huang P, Chen B, Liu L, Liu X, Yu S, Wong HSP, Kang J. Ultra-Low-Energy Three-Dimensional Oxide-Based Electronic Synapses for Implementation of Robust High-Accuracy Neuromorphic Computation Systems. ACS Nano 2014, 8(7): 6998–7004.
- 13. Yang T-J, Sze V. Design Considerations for Efficient Deep Neural Networks on Processing-in-Memory Accelerators. 2019 IEEE International Electron Devices Meeting (IEDM); 7–11 Dec. 2019. pp. 22.1.1–22.1.4.
- 14. Krestinskaya O, Salama KN, James AP. Learning in Memristive Neural Network Architectures Using Analog Backpropagation Circuits. IEEE Transactions on Circuits and Systems I: Regular Papers 2019, 66(2): 719–732.
- 15. Giordano M, Cristiano G, Ishibashi K, Ambrogio S, Tsai H, Burr GW, Narayanan P. Analog-to-Digital Conversion With Reconfigurable Function Mapping for Neural Networks Activation Function Acceleration. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2019, 9(2): 367–376.
- 16. Ambrogio S, Narayanan P, Tsai H, Shelby RM, Boybat I, di Nolfo C, Sidler S, Giordano M, Bodini M, Farinha NCP, Killeen B, Cheng C, Jaoudi Y, Burr GW. Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 2018, 558(7708): 60–67.
- 17. Stefanovich G, Pergament A, Stefanovich D. Electrical switching and Mott transition in VO2. Journal of Physics: Condensed Matter 2000, 12(41): 8837.
- 18. Qazilbash MM, Brehm M, Chae BG, Ho PC, Andreev GO, Kim BJ, Yun SJ, Balatsky AV, Maple MB, Keilmann F, Kim HT, Basov DN. Mott transition in VO2 revealed by infrared spectroscopy and nano-imaging. Science 2007, 318(5857): 1750–1753.
- 19. Del Valle J, Salev P, Tesler F, Vargas NM, Kalcheim Y, Wang P, Trastoy J, Lee MH, Kassabian G, Ramirez JG, Rozenberg MJ, Schuller IK. Subthreshold firing in Mott nanodevices. Nature 2019, 569(7756): 388–392.
- 20. Madan H, Jerry M, Pogrebnyakov A, Mayer T, Datta S. Quantitative Mapping of Phase Coexistence in Mott-Peierls Insulator during Electronic and Thermally Driven Phase Transition. ACS Nano 2015, 9(2): 2009–2017.
- 21. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 1998, 86(11): 2278–2324.
- 22. Radu IP, Govoreanu B, Mertens S, Shi X, Cantoro M, Schaekers M, Jurczak M, De Gendt S, Stesmans A, Kittl JA, Heyns M, Martens K. Switching mechanism in two-terminal vanadium dioxide devices. Nanotechnology 2015, 26(16): 165202.
- 23. Del Valle J, Salev P, Kalcheim Y, Schuller IK. A caloritronics-based Mott neuristor. Sci Rep 2020, 10(1): 4292.
- 24. Shi Y, Nguyen L, Oh S, Liu X, Koushan F, Jameson JR, Kuzum D. Neuroinspired unsupervised learning and pruning with subquantum CBRAM arrays. Nature Communications 2018, 9(1): 5312.
- 25. Chen P-Y, Peng X, Yu S. NeuroSim: A Circuit-Level Macro Model for Benchmarking Neuro-Inspired Architectures in Online Learning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2018, 37(12): 3067–3080.
- 26. Shrivakshan G, Chandrasekar C. A comparison of various edge detection techniques used in image processing. International Journal of Computer Science Issues (IJCSI) 2012, 9(5): 269.
- 27. Zhao W, Cao Y. Predictive technology model for nano-CMOS design exploration. ACM Journal on Emerging Technologies in Computing Systems (JETC) 2007, 3(1): 1.
- 28. Zhao W, Cao Y. New generation of predictive technology model for sub-45 nm early design exploration. IEEE Transactions on Electron Devices 2006, 53(11): 2816–2823.
