# Supplementary Information

## 4K-Memristor Analog-Grade Passive Crossbar Circuit



H. Kim, M. R. Mahmoodi, H. Nili, and D. B. Strukov

Supplementary Figure 1. Half-select disturbance. A typical half-biasing scheme in (a) passive ("0T1R") and (b) active ("1T1R") crossbar circuits that is employed for applying write voltages to the selected memristor. In 0T1R arrays, a fraction of the external voltage, which is applied to the crossbar array to write the selected device, is dropped across the half-selected devices that share the same horizontal and vertical electrodes with the selected device. On the other hand, in 1T1R circuits, the select transistors and the additional lines controlling them allow applying a nonzero voltage across the selected device only. (c) Schematic drawings of a representative $I-V$ curve and device-to-device distributions for set and reset switching voltages (bottom right and top left insets, correspondingly), which highlight the issue of half-select disturbance in passive crossbar circuits. Specifically, writing the selected device with the switching threshold at the higher end of the distribution can disturb the half-selected devices at the lower end of the distribution if the distribution is wide enough. Adapted from Ref. 1.
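To illustrate the half-biasing scheme of panel a, the following minimal Python sketch computes the voltage dropped across every crosspoint of a passive array. It assumes the common V/2 convention (selected row at +V/2, selected column at -V/2, all other lines grounded), which the caption itself does not spell out. The function name and arguments are hypothetical.

```python
def half_bias_voltages(n_rows, n_cols, sel_row, sel_col, v_write):
    """Return the voltage across each crosspoint under an assumed V/2 scheme.

    Assumption (not stated in the caption): the selected row is driven to
    +v_write/2, the selected column to -v_write/2, and all other lines are
    grounded. Line resistance is neglected.
    """
    row_v = [v_write / 2 if r == sel_row else 0.0 for r in range(n_rows)]
    col_v = [-v_write / 2 if c == sel_col else 0.0 for c in range(n_cols)]
    # Voltage across device (r, c) is the row potential minus the column potential.
    return [[row_v[r] - col_v[c] for c in range(n_cols)] for r in range(n_rows)]


def count_half_selected(voltages, v_write):
    """Count devices that see exactly half of the write voltage."""
    return sum(1 for row in voltages for v in row if v == v_write / 2)
```

Under this assumption, the selected device sees the full write voltage, the devices sharing its row or column see half of it, and all remaining devices see zero voltage.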



Supplementary Figure 2. Sidewall residue challenge. (a) Top-view and (b) cross-sectional scanning electron microscopy (SEM) images of the crossbar circuit fabricated without planarization steps. The inset of panel a shows a zoomed-in image of a crosspoint area. When the top electrodes (TEs) are patterned by etching without the planarization step, a sidewall residue along the bottom electrodes (BEs) results in the shorting of all TEs.



Supplementary Figure 3. Chemical-mechanical polishing calibration. Chemical-mechanical polishing with an 80 rpm plate rotation rate and a 50 ml/min slurry flow rate under two different back-pressure conditions: (a) 25 psi and (b) 35 psi. The latter conditions result in a higher-quality surface and are utilized in device fabrication.



Supplementary Figure 4. As-fabricated crossbar results. (a) The conductance map measured at 0.4 V. Median conductance is ~45 nS. (b) *I-V* characteristics for the 36 virgin-state (i.e., before forming) devices of the 6×6 subarray located in the center of the crossbar circuit. The blue curve corresponds to the device highlighted in Fig. 2a of the main text.



Supplementary Figure 5. Additional crossbar circuit characterization data. (a) Switching threshold voltage map. Blue and red data points correspond to set and reset voltages, respectively. (b) Correlations in switching voltages. The data are post-processed from Fig. 2e of the main text. (c) A retention test is performed after the 1M-cycle endurance test shown in Fig. 2d of the main text. Conductance is measured at 0.1 V at 2 s intervals while continuously baking the crossbar circuit at 100°C.



Supplementary Figure 6. Additional data for the classifier experiment. (a) Software-based training results for the single-layer perceptron classifier. (b) The cumulative distribution of absolute pre-activation error, i.e., $|I_{\text{ideal}} - I_{\text{measured}}|/(I_{\text{ideal}})_{\text{max}}$, for several studied tuning precisions. The data are computed based on experimentally measured currents and their desired values computed in software for $10^4$ test patterns. (c) The comparison of measured and ideal pre-activation distributions for the case of $1\%$ relative tuning precision for $10^4$ test patterns.
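The error metric of panel b can be sketched in a few lines of Python. The function names are hypothetical, and taking $(I_{\text{ideal}})_{\text{max}}$ as the largest magnitude among the ideal currents is an assumption; the caption does not say whether the maximum is taken per output or over the whole test set.

```python
def normalized_preactivation_errors(i_ideal, i_measured):
    """Return |I_ideal - I_measured| / (I_ideal)_max for each sample.

    Assumption: (I_ideal)_max is the largest magnitude of all ideal currents
    passed in, i.e. a single normalization constant for the whole set.
    """
    if len(i_ideal) != len(i_measured):
        raise ValueError("ideal and measured currents must have the same length")
    i_max = max(abs(i) for i in i_ideal)
    return [abs(a - b) / i_max for a, b in zip(i_ideal, i_measured)]


def empirical_cdf(values):
    """Return (value, cumulative fraction) pairs sorted by value."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (k + 1) / n) for k, v in enumerate(ordered)]
```

Plotting the pairs returned by `empirical_cdf` against the error values gives a curve of the kind shown in panel b.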



Supplementary Figure 7. Modeling half-select disturbance. (a-d) Details of the utilized measurement protocol for modeling (a) set and (b) reset transitions and (c, d) results for conductance changes for three studied cases of initial conductance $G_0$. Each line with connected dots corresponds to the evolution of the conductance change, normalized to the specific tuned initial value $G_0$ and averaged across 500 devices, upon application of voltage pulses with a specific amplitude and exponentially increasing duration. (e-i) A phenomenological model for dynamic behavior. The results of fitting dynamic equations to the experimental (e) set and (f) reset data, averaged over 500 devices, and (g) the corresponding model parameters. (h) The distribution of parameter $\alpha$ fitted to reproduce the experimentally observed device-to-device variations in Fig. 2f, and (i) the variations in the switching threshold predicted by the model for 4096 devices in the modeled 64×64 crossbar circuit. See Supplementary Note 1 for more details. All conductances are specified at 0.1 V.



Supplementary Figure 8. Modeling MLP classifier. (a) General scheme for the modeled differential MLP network. (b) The evolution of classification accuracy and cross-entropy loss (inset) during ex-situ training. (c, d) Histograms of (c) the ideal conductances in the 1<sup>st</sup> positive ($G^{1+}$), 1<sup>st</sup> negative ($G^{1-}$), 2<sup>nd</sup> positive ($G^{2+}$), and 2<sup>nd</sup> negative ($G^{2-}$) weight layers, and (d) the ideal pre-activation currents for the hidden (L1) and the output (L2) neurons. (e, f) Maps of ideal conductances for (e) the 1<sup>st</sup> layer and (f) the 2<sup>nd</sup> layer of the network.



Supplementary Figure 9. Device uniformity impact on MLP accuracy. (a-l) Modeling results when using (a-c) baseline, (d-f) $1^{\text{st}}$, (g-i) $2^{\text{nd}}$, and (j-l) $3^{\text{rd}}$ tuning approaches. (a, d, g, j) The improvements in classification accuracy with more rounds of tuning. (b, e, i, k) Cumulative distribution of the absolute tuning error at the end of the $10^{\text{th}}$ tuning round. (c, f, h, l) Classification accuracy as a function of device variations at the end of the $10^{\text{th}}$ tuning round. The box plot shows the statistics over 10 different cases of initial conductances. The thick red lines correspond to the demonstrated technology, i.e., $\alpha \approx 26\%$. For simplicity, memristors' static I-V nonlinearities and noise are neglected, and ideal peripheral circuits are assumed in simulations. In panel a, the accuracy saturates after a few rounds because of the significant half-select disturbance when re-tuning higher switching threshold devices. In panel d, the utilized maximum values for set / reset thresholds are NA, 2, 0, 1.65, 1.45, 1.4, 1.35, 1.3, 1.2, 1.1 / NA, 0, -1.65, -1.45, -1.35, -1.3, -1.2, -1.1 for tuning rounds #1, #2, …, #10, respectively. In panel j, the highest accuracy is 97.29%.



Supplementary Figure 10. Experimental setup. The photo shows the setup's main parts, namely the packaged crossbar circuit mounted on a custom printed circuit board, personal computer controller, Agilent switch matrix, and Agilent B1500 semiconductor device analyzer.



Supplementary Figure 11. Crossbar circuit area scaling with device current. Areas of the 64×64 crossbar circuit and its peripheral analog muxes for two topologies implemented in a 65 nm process as a function of leakage and forming/write currents. (For simplicity, the areas of the sensing circuitry, decoders, and level-shifters are excluded from this figure to avoid performing more comprehensive modeling of scaling down $G_{\text{on}}$ and finding the optimal design of other peripheral components.) The cell area for 0T1R technology is $250 \times 250$ nm<sup>2</sup>, which was determined from the layout in the considered CMOS process. Thick-oxide 3.3 V transistors are used to implement select transistors and forming/switching current passing muxes in peripheral circuitry. Their sizing is chosen so as not to exceed the 0.7 V voltage drop and hence to limit the maximum input voltage to 4 V. For 0T1R crossbar circuits, an analog switch is designed to pass the forming current to the selected device and the leakage currents through the other off-state devices in the crossbar. The leakage currents are modeled assuming negligible line resistance and floating forming configuration [29]. For 1T1R crossbar circuits, an analog switch in the periphery is designed to pass only the forming current, though there is less headroom because part of the voltage is also dropped on the selector. The memory and peripheral circuits are assumed to overlap for the 0T1R implementation. Note that the forming current of 250 $\mu$A and $G_{\text{off}}$ = 1.056 µS correspond to the memory technology assumptions of the last two columns in Supplementary Table 3, e.g., max[A, 128B] $\approx 820 \text{ }\mu\text{m}^2$ and A+128B $\approx 7400 \text{ }\mu\text{m}^2$ for 0T1R and 1T1R, correspondingly, where A is the area of the $64\times64$ crossbar memory array and B is the area of the analog mux circuitry serving one line.
The simulation results show that for the 1T1R case, the total area is dominated by the cells' select transistor, which is scaled down with lowering forming currents. For the 0T1R case, the total area is due to peripheral muxes, which is reduced at higher $G_{\text{off}}$ when lowering leakage currents, but is mostly limited by forming currents at lower $G_{\text{off}}$.

Supplementary Table 1. Comparison of memristive circuits. The specific focus of the table is on the state-of-the-art non-volatile (filamentary) analog-grade 0T1R metal-oxide devices, while only a few representative works are listed for metal-oxide 1T1R and solid-state-electrolyte 0T1R circuits. Furthermore, the table does not include recent results based on dense commercial "binary" 1T1R technology. Also, note that the common concern for the solid-state-electrolyte type devices (rows #1 to #3) and interfacial switching $WO_x$ devices (rows #4 to #6) is poor state retention.



<sup>0</sup> "Crossbar size" refers to the largest-dimension fabricated integrated crossbar circuit (not necessarily fully-functional), while the "largest working demo" refers to the largest number of devices employed at once in the demo, i.e., without relying on postprocessing / combining the results from separate measurements. <sup>1</sup> Based on the full pitch of the integrated memory cells. <sup>2</sup> Largest set voltages are used if statistical data are not reported. <sup>3</sup> The test conditions may be different. <sup>4</sup> Specified at 0.1 V for the devices with nonlinear static I-V characteristics unless noted otherwise. <sup>5</sup> SA = Stand-alone integrated crossbar circuit, RIE = reactive ion etching, BEOL = Back end of line integrated crossbar circuit on CMOS wafer containing access transistors, FI-BEOL = BEOL with fully integrated CMOS peripheral circuits. <sup>6</sup> Denser single devices are reported, though most experimental results are for 25 $\mu$m<sup>2</sup> devices. <sup>7</sup> Data for the low-resistance state. Significant retention loss at high resistance levels. <sup>8</sup> Based on Fig. 4c. <sup>9</sup> From Fig. 1d. <sup>10</sup> The 40×40 conductance map is based on combining results from 25 separate measurements of 8×8 subarrays. <sup>11</sup> Based on Fig. 4d. <sup>12</sup> Based on Fig. S3 of [5]. Not clear if the data are obtained after tuning all devices or measured immediately after programming each device. <sup>13</sup> Average range of conductance values observed in the crossbar. There is a significant variation between different devices. <sup>14</sup> The effective retention drops with an increase in the utilized conductance range and/or precision of operation; see, e.g., Fig. 6.5 from [30]. <sup>15</sup> 126 6×8 physical subarrays utilized for a logical 108×54 array with the conductances measured after programming each subarray. <sup>16</sup> From Supplementary Note 10. <sup>17</sup> For the top crossbar, while it is 2 V / 50 µA for the bottom one. <sup>18</sup> Effective crossbar dimensions based on 3D-CMOL-like structure (with overlapping electrodes in one direction). <sup>19</sup> Total number of employed devices in one filter based on Fig. 4d. <sup>20</sup> Based on Fig. 2g. Though SEM images of 300-nm-scale devices are shown, all experimental results are based on microscale devices. <sup>21</sup> Based on Fig. 1c of [17]. <sup>22</sup> Based on Fig. 1c of [16, 19]. <sup>23</sup> Based on Fig. S3 of [28].





<sup>1</sup> The total parasitic capacitance of electrodes in 0T1R arrays consists of line-to-line capacitance in the crossbar structure (M5/M4/M3) that includes coupling and fringing capacitors between conductors (see footnote 3 in Supplementary Table 3 for more details).

Supplementary Table 3. Memory assumptions and VMM modeling. Memory cell assumptions and the detailed breakdown of area and power for 64×64 VMM blocks.





<sup>1</sup> For the demonstrated 0T1R crossbar, the electrode width is 250 nm, and the gap size is 500 nm. In the case of scaled 0T1R technology, $100\times100$ nm<sup>2</sup> cell footprint and 150 nm spacing between metal lines are based on design rules for the considered 65 nm process. The assumptions of 1T1R memory are somewhat aggressive when compared to the state-of-the-art demonstrations (Supplementary Table 1). For example, in analog-grade 1T1R device technology [16-19], the cell size is 2,500 $\mu$m<sup>2</sup>, while switching currents and midrange conductance are ~10× higher. ~100 $F^2$ cell area in $F = 22$ nm FinFET technology, which is equivalent to ~0.42 $\mu$m<sup>2</sup> in 65 nm planar CMOS process, and 50 $\mu$S midrange conductance were reported in [24].

<sup>2</sup> Device conductance is assumed to scale linearly with the device footprint [13].

<sup>3</sup> $C_{\text{0T1R}} = (C_{\text{A\_BOT}} w + 2C_{\text{F\_BOT}} + C_{\text{A\_TOP}} w + 2C_{\text{F\_TOP}} + 2C_{\text{Fring}})(w + g)$ and $C_{\text{1T1R}} = C_{\text{0T1R}} + C_{\text{diff}}$, where $w$ is the width of the electrode, $g$ is the gap size between electrodes, $C_{\text{diff}}$ is the diffusion capacitance of the selector in the ohmic regime, and the other parameters are obtained from the process design kit, i.e., $C_{\text{A\_BOT}} = 0.24$ fF/$\mu$m<sup>2</sup>, $C_{\text{A\_TOP}} = 0.24$ fF/$\mu$m<sup>2</sup>, $C_{\text{F\_BOT}} = 1.07 \times 10^{-2}$ fF/$\mu$m, $C_{\text{F\_TOP}} = 1.1 \times 10^{-2}$ fF/$\mu$m, $C_{\text{Fring}} = 6.5 \times 10^{-2}$ fF/$\mu$m.
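As a rough numerical illustration of the expression in footnote 3, the sketch below evaluates it in Python with the listed process parameters. The function names are hypothetical, and the example geometry (100 nm electrode width, 150 nm gap) is taken from the scaled 0T1R assumptions of footnote 1. No value is given in this footnote for $C_{\text{diff}}$, so it is left as a caller-supplied argument.

```python
# Process parameters quoted in footnote 3 (areas in fF/um^2, fringes in fF/um).
C_A_BOT = 0.24
C_A_TOP = 0.24
C_F_BOT = 1.07e-2
C_F_TOP = 1.1e-2
C_FRING = 6.5e-2


def capacitance_0t1r(w_um, g_um):
    """Evaluate the footnote-3 expression for C_0T1R in fF.

    w_um is the electrode width and g_um the gap between electrodes,
    both in micrometers.
    """
    per_length = (C_A_BOT * w_um + 2 * C_F_BOT
                  + C_A_TOP * w_um + 2 * C_F_TOP
                  + 2 * C_FRING)  # fF per micrometer of line
    return per_length * (w_um + g_um)


def capacitance_1t1r(w_um, g_um, c_diff_ff):
    """C_1T1R = C_0T1R + C_diff; c_diff_ff must be supplied by the caller."""
    return capacitance_0t1r(w_um, g_um) + c_diff_ff


# Example geometry: scaled 0T1R cell (100 nm width, 150 nm gap).
c_scaled = capacitance_0t1r(0.10, 0.15)
```

With these inputs the fringing term $2C_{\text{Fring}}$ is the largest contribution to the sum, since the area terms scale with the narrow electrode width.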

<sup>4</sup>The maximum input voltage drop during forming of the device is 4 V, from which 3.3 V is dropped on a memristor and 0.7 V on analog programming muxes. All programming switching is designed using thick-oxide MOSFETs.

<sup>5</sup>The maximum input and output currents were found to be  $\leq 10I_{\text{max,cell}}$  from the detailed kernel mapping to 64×64 VMM blocks for representative neural networks [26].

<sup>6</sup> Level shifters are used to translate the output voltage from 1.2 V decoders to 3.3 V programming switches.

<sup>7</sup> Local sensing circuitry is optimized according to the VMM block parasitics [25,26].

<sup>8</sup>The sizing of analog switches is obtained according to the caption of Supplementary Figure 11.

<sup>9</sup> The total area for analog VMM block is calculated as max[A,  $128(B+C) + 64D + 2E$ ] and A +  $128(B+C) + 64D + 2E$  for 0T1R and 1T1R implementations, correspondingly. The max is due to the assumption of overlap between peripheral circuits and memory.

<sup>10</sup> The total area for mixed-signal VMM is calculated as $\max[A, 128(B+C) + 64D + 2E + 32F]$ and $A + 128(B+C) + 64D + 2E + 32F$ for 0T1R and 1T1R implementations, correspondingly. The factor of 32 is due to using a 4-bit differential DAC circuit.

<sup>11</sup> Figure 5a explains the distributed local/global sensing implementation.

<sup>12</sup> The estimates are for one output channel so that the total for the 64-output VMM circuit is 64 times larger.
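The area bookkeeping of footnotes 9 and 10 can be written as one small Python function. This is only a sketch: the function and argument names are hypothetical, and A to F are treated as opaque per-component areas taken from the table (A being the 64×64 memory array and B the analog mux serving one line, as in the caption of Supplementary Figure 11).

```python
def vmm_block_area(a, b, c, d, e, f=0.0, passive=True):
    """Total VMM block area following footnotes 9 and 10.

    a..f are the component areas A..F in any consistent unit.
    f defaults to 0, which reproduces the analog VMM case (footnote 9);
    a nonzero f adds the 32F DAC term of the mixed-signal case (footnote 10).
    passive=True models 0T1R, where periphery and memory are assumed to
    overlap, so the larger of the two sets the footprint.
    passive=False models 1T1R, where the two areas add.
    """
    periphery = 128 * (b + c) + 64 * d + 2 * e + 32 * f
    return max(a, periphery) if passive else a + periphery
```

For example, `vmm_block_area(a, b, 0, 0, 0)` reduces to the max[A, 128B] expression quoted in the caption of Supplementary Figure 11.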



Supplementary Table 4. Performance estimates for the two studied applications.



<sup>1</sup>Fully-analog 7-layer (1024-16384-4096-4096-1024-256-100) MLP circuit consists of ~105M weights and utilizes architecture similar to the one described in Fig. 5a. 4-bit buffered current steering DACs are assumed in the front-end of the network. The neurons in the last layer are assumed to be loaded with a 1 pF capacitor.

<sup>2</sup> Registers are required for buffering input data in the MLP circuit.

<sup>3</sup> Memory efficiency is reported as a fraction of the area occupied by memory cells. In the case of 0T1R circuits, memory cell arrays are overlapped with peripheral circuits.

<sup>4</sup>We assume 4-bit aCortex architecture [26] utilizing mixed-signal 64×64 VMM blocks with 4-bit buffered current steering DACs. The performance is evaluated for Google's deep recurrent network for language translation (GNMT) benchmark with  $\sim$ 134 M weights.
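The ~105M weight count quoted in footnote 1 follows directly from the listed layer sizes. The short Python check below assumes fully connected layers and counts weights only (no bias terms); both are assumptions, and the names are hypothetical.

```python
# Layer sizes of the fully-analog MLP from footnote 1 (input layer first).
LAYER_SIZES = [1024, 16384, 4096, 4096, 1024, 256, 100]


def count_weights(layer_sizes):
    """Sum n_in * n_out over consecutive fully connected layers (no biases)."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


TOTAL_WEIGHTS = count_weights(LAYER_SIZES)
print(f"{TOTAL_WEIGHTS:,} weights")  # prints "105,145,344 weights"
```

The largest single contribution is the 16384×4096 layer, which alone accounts for about 67M of the weights.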

### Supplementary Note 1: Phenomenological Dynamic Model

The main purpose of the model is to estimate the change in device conductance $\Delta G$ with respect to the initial conductance $G_0$, both measured at a small non-disturbing (read) voltage of 0.1 V, upon application of a write voltage pulse with amplitude $V$ and a fixed duration of 2 ms. The fixed duration is assumed for simplicity, i.e., to avoid explicit dependence of the conductance change on pulse duration in the model. This simplification is also justified because a similar fixed-duration pulse approach is utilized in the tuning algorithms. (In a more advanced algorithm, variable pulse duration could be used for faster convergence [27].) Because of the long memory state retention of the developed metal-oxide memristors, i.e., their strongly nonlinear switching kinetics, obtaining meaningful experimental data for fitting conductance changes at half of the nominal write voltages required applying very long pulses, with durations of up to 2 ms (Supplementary Figure 7a-d). This is the main difference compared to the phenomenological model presented in Ref. 28, which used experimental data for a narrower range of write voltage pulse amplitudes and durations to derive the dynamic model and is hence somewhat inaccurate in predicting conductance changes at smaller, half-bias voltages. The following function is found to fit the experimental data well for both set and reset switching:

$$
\frac{\Delta G}{G_0} \approx \exp\left[\frac{\beta_1}{1+\beta_2(\alpha V)^2}\right] \sinh\left[\beta_3 \frac{\alpha V}{1+\beta_2(\alpha V)^2}\right] \left(\gamma_1 + \gamma_2 \sqrt{G_0} + \gamma_3 G_0\right),\tag{S1}
$$

where $\beta_1$, $\beta_2$, $\beta_3$, $\gamma_1$, $\gamma_2$, and $\gamma_3$ are fitting parameters common to all devices (Supplementary Figure 7g), while $\alpha$ is a unique scaling parameter for each device that represents device-to-device variations in the switching threshold (Supplementary Figure 7h).
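For readers who wish to experiment with Eq. (S1), a direct Python transcription is sketched below. The function name and argument layout are hypothetical. The fitted values of $\beta_i$ and $\gamma_i$ are reported only in Supplementary Figure 7g and are not reproduced here, so they must be supplied by the caller (presumably as separate sets for set and reset switching).

```python
import math


def relative_conductance_change(v, g0, alpha, beta, gamma):
    """Evaluate the right-hand side of Eq. (S1), i.e. the modeled dG/G0.

    v     : write pulse amplitude (V)
    g0    : initial conductance, in the unit used when fitting gamma
    alpha : per-device voltage scaling factor (alpha = 1 is the average device)
    beta  : (beta1, beta2, beta3), caller-supplied fitting parameters
    gamma : (gamma1, gamma2, gamma3), caller-supplied fitting parameters
    """
    b1, b2, b3 = beta
    g1, g2, g3 = gamma
    x = alpha * v                      # alpha rescales the applied voltage
    denom = 1.0 + b2 * x * x
    voltage_term = math.exp(b1 / denom) * math.sinh(b3 * x / denom)
    state_term = g1 + g2 * math.sqrt(g0) + g3 * g0
    return voltage_term * state_term
```

Two properties follow from the form of the function regardless of the parameter values: the predicted change is zero at $V = 0$, and a device with scaling factor $\alpha$ at voltage $V$ behaves like the average device at voltage $\alpha V$.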

Specifically, the model for the average behavior, with fixed $\alpha = 1$, is first found by fitting a surface to the experimental data for the average conductance changes, i.e., the $\{\langle \Delta G/G_0 \rangle, G_0, V\}$ data points (Supplementary Figure 7e,f).

As a reminder, the effective set (reset) switching threshold of the crossbar array is defined as the voltage at which the small-voltage conductance is changed from its extreme value $G_0 = 14$ $\mu$S (75 $\mu$S) by more than 20%, i.e., $|\Delta G|/G_0 = 0.2$, when applying an increasing-amplitude positive (negative) voltage ramp (Fig. 2e). According to the fitted model, $V_{\text{set}}^* = 1$ V and $V_{\text{reset}}^* = -1.4$ V for $\alpha = 1$. The experimentally measured threshold voltages (Fig. 2f) are well approximated with log-normal distributions with parameters $\mu = 0.14$ and $\sigma = 0.25$, and $\mu = 0.29$ and $\sigma = 0.26$ for set and reset switching, respectively. According to the selected fitting function, parameter $\alpha$ is a multiplicative factor for the applied voltages. Hence, when modeling the distribution of set threshold voltages of a crossbar circuit, we first randomly initialize $V_{\text{set}}$ for each crosspoint device by sampling it from the fitted set threshold log-normal distribution and then find the corresponding $\alpha_{\text{set}} = V_{\text{set}}^* / V_{\text{set}}$. A similar approach is used to initialize $\alpha_{\text{reset}}$. An example of the $\alpha$ values generated using this approach and the corresponding threshold voltages predicted by the model are shown in Supplementary Figures 7h and 7i, respectively.
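The sampling procedure described above can be sketched in Python as follows. The function name is hypothetical and several choices are assumptions rather than details given in the text: $\mu$ and $\sigma$ are interpreted as the mean and standard deviation of the natural logarithm of the threshold magnitude in volts, the reset distribution is applied to $|V_{\text{reset}}|$, and set and reset thresholds are drawn independently.

```python
import random

# Thresholds of the average (alpha = 1) device according to the fitted model.
V_SET_STAR = 1.0      # V
V_RESET_STAR = -1.4   # V

# Log-normal parameters quoted in the text (assumed to act on ln|V|).
SET_MU, SET_SIGMA = 0.14, 0.25
RESET_MU, RESET_SIGMA = 0.29, 0.26


def sample_device_alphas(n_devices, seed=0):
    """Draw per-device thresholds and convert them to alpha scaling factors."""
    rng = random.Random(seed)
    devices = []
    for _ in range(n_devices):
        v_set = rng.lognormvariate(SET_MU, SET_SIGMA)
        # Reset thresholds are negative; the distribution is applied to |V|.
        v_reset = -rng.lognormvariate(RESET_MU, RESET_SIGMA)
        devices.append({
            "v_set": v_set,
            "v_reset": v_reset,
            "alpha_set": V_SET_STAR / v_set,
            "alpha_reset": V_RESET_STAR / v_reset,
        })
    return devices


# One alpha pair per crosspoint of a 64x64 crossbar.
crossbar = sample_device_alphas(64 * 64)
```

A device with a threshold above that of the average device gets $\alpha < 1$, so it sees an effectively smaller voltage in Eq. (S1), which is the intended effect of the scaling.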

Finally, since the experimentally observed variations in set and reset threshold voltages (i.e., the relative standard deviations, or coefficients of variation) are very similar, for simplicity, we use the same $\alpha$ when sweeping variations in the modeling studies (Fig. 5 and Supplementary Figure 9).

### Supplementary Note 2: System-Level Performance Estimates

To demonstrate the prospects of 0T1R technology, we model the performance of two representative neuromorphic architectures: aCortex [26], which is an energy-efficiency-optimized multi-purpose architecture for the acceleration of a wide range of neural network inference models, and a fully-analog large-scale (1024-16384-4096-4096-1024-256-100) multilayer perceptron with ~105M parameters, which is especially suitable for high-throughput inference tasks. All peripheral circuits, digital blocks, and circuits for conductance tuning are designed in the 65 nm CMOS process (see Supplementary Tables 2-4 for more details). The performance is evaluated using the results of physical layout and SPICE simulations of the major components. All designs involve 64×64 physical crossbar circuits, while a differential implementation based on two physical crossbar circuits, i.e., similar to the architecture shown in Fig. 5a, is assumed for the 64×64 VMM operation. The details of the simulation methodology for aCortex were presented in [26]. For the MLP, the complete signal path from the network's input to the output is modeled by simulating signal propagation in VMM blocks and taking into account the intra-block parasitic capacitance of the global lines. Furthermore, three technology options are evaluated: the one with parameters close to the demonstrated device, 65-nm 0T1R, and 65-nm 1T1R devices, with the parameters shown in Supplementary Table 2b.

The simulation results for the fully-analog MLP implementation with the demonstrated technology show 5.61 $\mu$J/f energy efficiency and 16.6 Mf/s throughput, with 12.95% of the 4.07 $\rm cm^2$ chip area occupied by the memristors. When scaled down to 65 nm, though the throughput reduces to 11.6 Mf/s, the overall energy efficiency improves because of power scaling in the array and amplifiers. As expected, the area slightly improves to $3.17 \text{ cm}^2$ for 65-nm 0T1R technology, though it becomes substantially larger (increased to 6.39 cm<sup>2</sup>) for 1T1R technology.

The simulation results for aCortex show that the inference time for GNMT benchmark tasks slightly improves when scaling 0T1R technology. Though peripheral circuits become slower in scaled 0T1R circuits due to the reduced midrange device conductance, the upshot is a more compact implementation of VMM blocks, which results in fewer parasitics in the digital circuits and much faster data transfer. Most importantly, the density of the scaled 0T1R aCortex chip is better by a factor of $\sim 18\times$, while throughput and energy efficiency are also substantially higher compared to the 65-nm 1T1R design.

Our estimates show that further scaling down of the technology will increase the gap between passive and active memories even more if the switching voltages and currents remain the same. Alternatively, with appropriate scaling of cell currents, more efficient and compact peripheral circuits can be utilized to improve memory efficiency, especially in the MLP circuit, potentially matching that of the embedded NOR flash memory aCortex [26]. Furthermore, the memory efficiency is expected to improve significantly with periphery sharing and 3D memory integration.

## References

- [1] Strukov, D. B. Tightening grip. Nat. Mater. 17, 293-295 (2018).
- [2] Yeon, H. et al. Alloying conducting channels for reliable neuromorphic computing. Nat. Nanotechnol. 15, 574-579 (2020).
- [3] Kim, K. et al. A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. Nano Lett. 12, 389-395 (2011).
- [4] Shin, J. et al. Hardware acceleration of simulated annealing of spin glass by RRAM crossbar array. in IEEE International Electron Device Meeting (IEDM), 18.63-18.64. (IEEE, 2018).
- [5] Sheridan, P. M. et al. Sparse coding with memristor networks. Nat. Nanotechnol. 12, 784 (2017).
- [6] Moon, J. et al. Temporal data classification and forecasting using a memristor-based reservoir computing system. Nat. Electron. 2, 480-487 (2019).
- [7] Ma, W. et al. Device nonideality effects on image reconstruction using memristor arrays. in IEEE International Electron Device Meeting (IEDM), 16.7.1-16.7.4. (IEEE, 2018).
- [8] Cai, F. et al. A fully integrated reprogrammable memristor–CMOS system for efficient multiply-accumulate operations. Nat. Electron. 2, 290-299 (2019).
- [9] Choi, S. et al. Experimental demonstration of feature extraction and dimensionality reduction using memristor networks. Nano Lett. 17, 3113-3118 (2017).
- [10] Jeong, Y. et al. K-means data clustering with memristor networks. Nano Lett. 18, 4447-4453 (2018).
- [11] Prezioso, M. et al. Modelling and implementation of firing-rate neuromorphic-network classifiers with bilayer Pt/Al<sub>2</sub>O<sub>3</sub>/TiO<sub>2-x</sub>/Pt memristors. in Proc. IEEE International Electron Device Meeting (IEDM), 17.4.1-17.4.4. (IEEE, 2015).
- [12] Prezioso, M. et al. Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521, 61 (2015).
- [13] Merrikh Bayat, F. et al. Implementation of multilayer perceptron network with highly uniform passive memristive crossbar circuits. Nat. Commun. 9, 2331 (2018).
- [14] Adam, G. et al. 3-D memristor crossbars for analog and neuromorphic computing applications. IEEE Trans. Electron Devices 64, 312-318 (2016).
- [15] Lin, P. et al. Three-dimensional memristor circuits as complex neural networks. Nat. Electron. (2020).
- [16] Li, C. et al. Efficient and self-adaptive in-situ learning in multilayer memristor neural networks. Nat. Commun. 9, 2385 (2018).
- [17] Hu, M. et al. Memristor-based analog computation and neural network classification with a dot product engine. Adv. Mat. 30, 1705914 (2018).
- [18] Li, C. et al. Long short-term memory networks in memristor crossbar arrays. Nat. Mach. Intell. 1, 49 (2019).
- [19] Wang, Z. et al. Reinforcement learning with analogue memristor arrays. Nat. Electron. 2, 115 (2019).
- [20] Yao, P. et al. Face classification using electronic synapses. Nat. Commun. 8, 15199 (2017).
- [21] Yao, P. et al. Fully hardware-implemented memristor convolutional neural network. Nature 577, 641-646 (2020).
- [22] Liu, Q. et al. A fully integrated analog ReRAM based 78.4 TOps/W compute-in-memory chip with fully parallel MAC computing. in Proc. IEEE International Solid-State Circuits Conference (ISSCC), 500-501. (IEEE, 2020).
- [23] Zheng, X. et al. Error-resilient analog image storage and compression with analog-valued RRAM arrays: An adaptive joint source-channel coding approach. in Proc. IEEE International Electron Device Meeting (IEDM), 3.5.1-3.5.4. (IEEE, 2018).
- [24] Golonzka, O. et al. Non-volatile RRAM embedded into 22FFL FinFET technology. in Proc. Very Large Scale Integration Symposium (VLSISymp), T230-231. (IEEE, 2019).
- [25] Mahmoodi, M. R. & Strukov, D. An ultra-low energy internally analog, externally digital vector-matrix multiplier based on NOR flash memory technology. in Proc. Design Automation Conference (DAC), 1-6. (ACM/ESDA/IEEE, 2018).
- [26] Bavandpour, M., Mahmoodi, M. R. & Strukov, D. B. aCortex: An energy-efficient multi-purpose mixed-signal inference accelerator. IEEE J. Explor. Solid-State Computat. 6, 98-106 (2020).
- [27] Merrikh Bayat, F. et al. Model-based high-precision tuning of NOR flash memory cells for analog computing applications. in Proc. Device Research Conference (DRC), 1-2. (2017).
- [28] Nili, H. et al. Comprehensive compact phenomenological modeling of integrated metal-oxide memristors. IEEE Trans. Nanotechnol. 19, 344-349 (2020).
- [29] Prezioso, M., Merrikh-Bayat, F., Chakrabarti, B. & Strukov, D.B. RRAM-based hardware implementations of artificial neural networks: Progress update and challenges ahead. in Proc. SPIE'16 Photonics West, art. 974918 (SPIE, 2016).
- [30] Jo, S. H. Nanoscale Memristive Devices for Memory and Logic Applications. University of Michigan, Ph.D. Dissertation, 2010.