Micromachines. 2021 May 12;12(5):551. doi: 10.3390/mi12050551

Investigation of PVT-Aware STT-MRAM Sensing Circuits for Low-VDD Scenario

Zhongjian Bian 1, Xiaofeng Hong 1, Yanan Guo 1, Lirida Naviner 2, Wei Ge 1, Hao Cai 1,2,*
Editor: Peng Huang
PMCID: PMC8151166  PMID: 34066185

Abstract

Spintronics-based embedded magnetic random access memory (eMRAM) is becoming a foundry-validated solution for next-generation nonvolatile memory applications. Hybrid complementary metal-oxide-semiconductor (CMOS)/magnetic tunnel junction (MTJ) integration has been selected as a proper candidate for energy-harvesting, area-constrained and energy-efficient Internet of Things (IoT) systems-on-chip. Multi-VDD (low supply voltage) techniques have been adopted to minimize energy dissipation in MRAM, at the cost of reduced writing/sensing speed and margin. Meanwhile, yield can be severely affected by variations in process parameters. In this work, we conduct a thorough analysis of MRAM sensing margin and yield. We propose a current-mode sensing amplifier (CSA), named the high-sensing margin, high speed and stability sensing amplifier (HMSS-SA), with a reconfigured reference path and pre-charge transistor. Process-voltage-temperature (PVT) aware analysis is performed based on an MTJ compact model and an industrial 28 nm CMOS technology, explicitly considering low voltage (0.7 V), low tunneling magnetoresistance (TMR) (50%) and high temperature (85 °C) as the worst sensing case. A case study briefly examines the sensing circuits applied to in-memory bit-wise computing. Simulation results indicate that the proposed HMSS-SA achieves remarkable performance, with sensing frequencies up to 2.5 GHz. At 0.65 V supply voltage, it achieves 1 GHz operation frequency with only a 0.3% failure rate.

Keywords: MRAM, sensing margin, yield, low TMR sensing, PVT variation, logic-in-memory

1. Introduction

Perpendicular anisotropy-based magnetic tunnel junctions (p-MTJs) have been extensively studied to develop spin-transfer torque magnetic random access memories (STT-MRAMs) [1,2,3]. Hybrid MTJ/CMOS integration has been developed along with device scaling to achieve small feature dimensions and low-power operation. STT-MRAM has been regarded as a potential candidate for next-generation nonvolatile memories [4,5,6,7]. Compared with resistive random access memory (RRAM) and phase-change random access memory (PRAM), MRAM has a low read margin but is suitable for high-density integration and offers high endurance, so its benefits can be greater if all design challenges are addressed to meet the design targets and achieve cost efficiency [8]. MTJs are also CMOS compatible thanks to their integration in the back-end-of-line (BEOL) process. These merits allow MRAM to replace SRAM/flash memory, especially in embedded systems and their applications.

The sensing amplifier (SA), or sensing circuit, is an indispensable building block in spintronics-based circuits [4,5,6,7,9,10,11]. The latest sensing amplifier circuits for MRAM are detailed in [8]. In the low-VDD scenario, the major design concerns of the SA include sensing speed, margin and yield [12]. In general, MRAM sensing performance depends on the hybrid process, the supply voltage and the temperature of the surrounding environment, as well as on aging degradation, and may suffer from read-disturbance and read-decision failure issues [6,7,9,10,11,13,14]. Redundancy and error-correcting code (ECC) techniques are normally used to solve these reliability issues, but they may deteriorate power-performance-area (PPA) metrics. Besides, scaling down the MTJ/CMOS device dimensions gives rise to further MRAM design challenges, mainly insufficient sensing margin and increased sensing error rate.

Two SA modes have been reported in previous work. Voltage-mode SA (VSA) has been employed in previous small-Icell memory designs, i.e., non-volatile memories (NVMs) and low-voltage static random access memories (SRAMs) [12,15,16]. Although VSA tolerates bit-line (BL) and SA offset by using a long BL developing time, the resulting long sensing latency becomes a critical issue, whereas current-mode SA (CSA) achieves higher sensing speed than VSA with reduced cell current [12,17,18,19]. Offset-canceling techniques for CSA are described in detail in [20,21,22].

Based on an industrial 28-nm CMOS process and an MTJ compact model, this study investigates the performance of six CSAs over a wide range of sensing supply voltage (VDD) and temperature, as well as under process fluctuation. We propose a low-voltage sensing circuit to enable low-power MRAM operation. Our contributions are summarized as follows:

  • Six typical current-mode SAs are studied under process-voltage-temperature (PVT) variations at the 28-nm CMOS node. The analysis is based on a uniform transistor sizing rule to enable a fair comparison of the sensing circuits.

  • We propose a novel current-mode SA named the high-sensing margin, high speed and stability (HMSS) SA, which uses a dual reference configuration to enlarge the sensing margin and a modified pre-charge pMOSFET to improve the sensing uniformity of logic ‘0’ and ‘1’.

  • For the low-VDD scenario, MRAM sensing variability, yield and failure are emphasized, together with the design trade-off between energy consumption and layout area.

  • A modified spintronics-based logic-in-memory (LIM) scheme is proposed, in which the HMSS-SA enables high-sensing-margin in-memory bit-wise computing with reduced failure probability.

The remainder of this paper is organized as follows. Previous current-mode SAs and our proposed HMSS-SA are discussed in Section 2. Section 3 presents the simulations and analyzes the sensing power-delay trade-off, process-temperature variations, sensing failure issues and the low-VDD design boundary. In Section 4, a modified logic-in-memory scheme is implemented using HMSS-SA for in-memory bit-wise computing. Section 5 concludes the paper.

2. Preliminary

2.1. STT-MRAM Bit-Cell

The p-MTJs with MgO/CoFeB/heavy metal (e.g., Ta, Hf) structures provide a reasonable tunneling magnetoresistance (TMR) ratio. Using a double MgO/CoFeB interface free layer and a single interface, the p-MTJs also possess a considerable thermal stability factor (Δ) and high switching current density [1,2]. A sufficient write current (Ic0) is required to switch between the parallel (P) and antiparallel (AP) MTJ states.

A typical flash-like MRAM bit-cell consists of one MTJ connected in series with one access transistor, i.e., the 1T-1M structure. The MTJ free layer is connected to the bit-line (BL) of the memory array. The MRAM sensing circuit is built from key blocks such as bit-cells, reference generators and sense amplifiers. The bit-cell resistance along the BL is determined by the P or AP state of the MTJ, and the sensing current is compared with its reference value to decide logic ‘1’ or ‘0’. Table 1 lists the physical parameters of the STT-MTJ used in this work. Several reliability issues affect MRAM bit-cell performance. Magnetic thermal noise acts as an additional three-dimensional magnetic field [23,24]. Moreover, fabrication variability of the MTJ diameter, the thickness of each layer (MgO, free and fixed) and the thermal stability causes performance uncertainty in the bit-cell [25].

Table 1.

Physical properties and design parameters of spin-transfer torque-magnetic tunnel junction (STT-MTJ) and magnetic random access memory (MRAM) bit-cell [1,26].

Parameter | Description | STT-MTJ
TMR | Tunneling magnetoresistance (TMR) ratio | 50–200%
Tox | Magnetic tunnel junction (MTJ) oxide thickness | 0.85 nm
Tsl | MTJ free layer thickness | 1.3 nm
Area | MTJ layout surface | 40 nm × 40 nm
RP/RAP | MTJ resistance | 6.2 kΩ/18.6 kΩ
α | Damping factor | 0.027
Δ | Thermal stability | 72
Ic0 | Write critical current | 50 µA
T | Ambient temperature | 243 K, 300 K, 358 K
Vddsense | Sensing supply voltage | 0.6–1 V
W/L | Transistor dimension | W = 200 nm, L = 30 nm

2.2. Sensing Circuits for STT-MRAM

The following current-mode sensing circuits for STT-MRAM are investigated in this work; their schematics are shown in Figure 1.

Figure 1.


The schematics of sensing circuits. (a) Pre-charge SA (PCSA) [27]. (b) Double switches and transmission gate access transistor (DSTA) SA [28]. (c) Latch offset cancellation (LOC) SA [29]. (d) Dynamic dual-reference (DDRS) SA [30]. (e) Offset-compensated high-speed (OCHS) SA [31]. (f) High-sensing margin, high speed and stability (HMSS) SA.

  • Pre-charge sensing amplifier (PCSA) [27];

  • Offset-compensated high-speed sensing (OCHS) [31];

  • Dynamic dual-reference sensing (DDRS) [30];

  • Latch offset cancellation sensing (LOC) [29];

  • Double switches and transmission gate access transistor sensing (DSTA) [28];

  • High-sensing margin, high speed and stability sensing (HMSS, proposed in this work).

The signals in the listed circuits are as follows: “RE” is the SA enable signal; “Vclamp” is the clamp voltage; “V(sel)” is the column select signal; “V(wl)” is the word line select signal; and “PRE” is the precharge signal. For other signals, please refer to the cited papers.

Table 2 gives a qualitative comparison of the sensing circuits, including the number of sensing paths, the number of references, the P-channel metal–oxide–semiconductor (pMOS) load type and the reference scheme (CM for current-mean, RM for resistance-mean). Only transistor counts and minimum TMR are reported in the comparison, as previous sensing circuits were realized with different CMOS and MTJ processes [27,28,29,30,31,32]. Further quantitative analysis is performed in Section 3. A low-VDD (low sensing current) method is preferred to avoid unexpected spin inversion [33]. Although this method directly benefits power consumption, its drawback is that the sensing margin can be significantly limited, which causes sensing failure and yield degradation.

Table 2.

Qualitative comparison of different sensing circuits.

Publication | IEEE Transactions on Very Large Scale Integration (TVLSI)’12 | TVLSI’14 | TCAS-1’15 | TCAS-1’17 | TVLSI’18 | This Work
Name/Abbreviation | Pre-charge sensing amplifier (PCSA) | Double switches and transmission gate access transistor sensing (DSTA) | Latch offset cancellation (LOC) | Dynamic dual-reference sensing (DDRS) | Offset-compensated high-speed sensing (OCHS) | High-sensing margin, high speed and stability sensing (HMSS)
Technology node (reported) * | 65 nm | 65 nm | 45 nm | 40 nm | 65 nm | 28 nm
Number of paths | 3 | 2 | 2 | 3 | 2 | 3
Number of references | single | single | single | dual | single | dual
Reference scheme ** | CM | RM | RM | CM + RM | RM | CM + RM
Number of transistors | 21 T | 22 T | 19 T | 39 T | 16 T | 33 T
Suggested Vdd | 1.1 V | 1.1 V | 1 V | 1 V | 0.75 V | 0.7 V
Minimum TMR (reported) | 150% | N/A | 150% | 50% | 100% | 50%
Voltage of pMOS load | fixed | fixed | varied | varied | varied | varied
Width of pMOS load | 3.3 µm | N/A | >1 µm | 0.24 µm | 0.6 µm | 0.2 µm

* Previous sensing circuits will be further analyzed with 28-nm complementary metal-oxide-semiconductor (CMOS) technology in Section 3. ** The type of reference scheme: CM for current-mean scheme, RM for resistance-mean scheme.

Conventionally, source degeneration and a balanced one-pair reference scheme were used to improve process variation tolerance during MRAM sensing [27]. In [31], an offset-compensated high-speed sense amplifier (OCHS-SA) was implemented for high speed and high yield with offset voltage cancellation (see Figure 1e). In the pre-charge phase, it generates a voltage difference between the MTJx and MTJref paths through M2 and M1, respectively. Next, the resistance change of M1 and M2 amplifies this voltage difference. Finally, the SE signal enables the latch, which amplifies the voltage difference to produce the output voltage.

A dynamic dual-reference sensing (DDRS) scheme is proposed in [30]. DDRS achieves a high sensing margin at the cost of slow speed and low yield, and it cannot solve the offset voltage problem caused by PVT variation. In DDRS-SA, a voltage difference is first generated between the data cell path (when the stored data is ‘0’) and the RH path. This voltage difference is then amplified through transistors P4 and P5, and finally SA1 amplifies it to generate the output. The sensing margin of OCHS-SA is lower than that of DDRS-SA, and OCHS-SA is unbalanced between reading ‘0’ and reading ‘1’.

In [29], the sensing circuit, latch sense amplifier and write driver are merged into a LOC-SA to reduce the voltage developing time, so that sensing latency is significantly improved. The yield of LOC-SA is also enhanced through its offset cancellation scheme. In [28], a double-switch scheme with both a foot-switch and a head-switch is used to overcome the invalid current problem (sensing dead zone). Last but not least, our proposed HMSS-SA demonstrates high sensing margin, fast speed and high stability [32].

2.3. The Proposed HMSS Sensing Circuit

In order to achieve a high sensing margin in MRAM read operation, a novel sensing circuit implementation named HMSS-SA is proposed, as shown in Figure 1f.

  • Pre-charge phase: PRE is set low, while R1, R2, Vclamp, VSL and VWL are set high. pMOS M2, M3 and M4 are turned on to pre-charge the MTJx, MTJ0 and MTJ1 paths, respectively. The voltage of out1 is higher and that of out0 is lower. If MTJx stores 1 (0), the voltage of outx equals that of out1 (out0).

  • Sensing phase: PRE is set high, R1 and R2 are set low, and the voltage obtained during pre-charge is sensed and amplified. Since outx and out1 (out0) start at the same voltage, their voltages diverge due to the difference between pMOS M3 (M4) and M2 (M5). The circuit reaches a state in which outx is high (low) while out1 and out0 are low (high).

  • Amplified phase: Vclamp, VSL and VWL are set low and SE is set high; the double latch rapidly drives outx up (down) to the standard high (low) voltage, while out1 and out0 are driven down (up) to the standard low (high) voltage.

Figure 2 illustrates the simulated waveform. The proposed HMSS-SA introduces MTJ1 and MTJ0 as dual references to enlarge the sensing margin. When MTJx stores ‘0’ (see Figure 1f), the primary reference is MTJ1; when MTJx stores ‘1’, the primary reference is MTJ0. Therefore, regardless of the value stored in MTJx, the sensing margin of the circuit is always the voltage difference between the MTJ1 and MTJ0 paths. Because of the dual reference, the current in the MTJx path is approximately twice that of a traditional single-reference sense amplifier, so a dual-pMOS method is adopted in both the MTJ1 and MTJ0 paths to match the current in the MTJx path. During pre-charging, pMOS M10–M15 are off and M7–M9 are on, so the power supply pre-charges the MTJx, MTJ0 and MTJ1 paths through M2–M4, respectively. Selecting M2 instead of M5 as the pre-charge pMOS for MTJx greatly improves the uniformity between reading ‘0’ and reading ‘1’. During the sensing phase, pMOS M10–M15 are on and M7–M9 are off, so that the voltage difference between outx and out1 (out0) can be amplified. In the amplified phase, the three paths of MTJx, MTJ0 and MTJ1 are turned off, and M10 and M15 are off. The double SA then further amplifies the voltage difference between outx and out1 (out0).

Figure 2.


Simulated waveform of proposed HMSS sensing circuit.

Table 2 summarizes and compares recently published sensing circuits according to their reported performance [27,28,29,30,31,32]. In the following sections, these sensing circuits are evaluated with 28-nm CMOS technology.

2.4. Logic-in-MRAM Application

The combination of MRAM and logic computing is a highly energy-efficient approach. Since the data are already stored in the MTJ devices of the proposed circuits, the supply voltage can be cut off immediately when the circuit enters standby mode, without transferring data to external nonvolatile storage devices. This property greatly reduces power dissipation [34,35,36].

3. PVT-Aware Analysis of Low-VDD MRAM Sensing Operation

3.1. Sensing Margin Estimation

The sensing margin, $\mathrm{SM} = |V_{REF} - V_{DATA}|$, increases as $|R_{REF} - R_{DATA}|$ increases. In a dual-reference SA, when the data is ‘1’ (‘0’), the actual reference is MTJ0 (MTJ1). Compared with a mean-resistance reference SA, $|R_{REF} - R_{DATA}|$ is doubled, so the sensing margin can be greatly enlarged.
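As a quick numerical check of the two-fold claim, the Python sketch below uses the nominal RP and RAP values from Table 1 and compares the resistance gap |RREF − RDATA| seen by a mean-resistance reference with the gap seen by a dual reference. It is an idealized illustration only: it ignores variation as well as the access-transistor and bit-line resistance in series with the MTJ, and the function names are hypothetical.

```python
# Nominal MTJ resistances from Table 1, in kilo-ohms.
R_P = 6.2
R_AP = 18.6


def mean_reference_gap(r_p: float, r_ap: float) -> float:
    """|R_REF - R_DATA| for a mean-resistance reference (identical for P and AP data)."""
    r_ref = 0.5 * (r_p + r_ap)
    return abs(r_ref - r_p)


def dual_reference_gap(r_p: float, r_ap: float) -> float:
    """|R_REF - R_DATA| when the effective reference is the MTJ in the opposite state."""
    return abs(r_ap - r_p)


mean_gap = mean_reference_gap(R_P, R_AP)
dual_gap = dual_reference_gap(R_P, R_AP)
# prints: mean: 6.2 kOhm, dual: 12.4 kOhm, ratio: 2.00
print(f"mean: {mean_gap:.1f} kOhm, dual: {dual_gap:.1f} kOhm, ratio: {dual_gap / mean_gap:.2f}")
```

In this idealized picture the ratio is exactly 2 for any RP < RAP, because the mean reference sits halfway between the two states while the dual reference places the opposite state directly against the data.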

Let σ be the maximum PVT-induced deviation of the output voltage caused by the variation of the load transistor VTH (σ is the absolute value of the maximum deviation). The SM without load pMOS VTH variation is referred to as the ideal SM. Equations (1) and (2) give the minimum sensing margin, including the deviation σ, when reading ‘0’ and reading ‘1’.

When reading logic ‘0’, $\mathrm{SM}_{\min}^{(\mathrm{w/o})}$ in Equation (1) is the minimum SM of designs without cross-precharge, such as PCSA and LOC, which deviates by 2σ from the nominal sensing margin. For designs with partial offset cancellation, such as OCHS and HMSS, the minimum SM ($\mathrm{SM}_{\min}$) deviates by only σ from the nominal margin. Equation (2) gives the corresponding conditions when reading logic ‘1’, without and with cross-precharge.

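$$
\begin{aligned}
\mathrm{SM}_{\min}^{(\mathrm{w/o})} &= \left|(V_{REF}-\sigma)-(V_{DATA}+\sigma)\right| = V_{REF}-V_{DATA}-2\sigma \\
\mathrm{SM}_{\min} &= \left|V_{REF}-(V_{DATA}+\sigma)\right| = V_{REF}-V_{DATA}-\sigma
\end{aligned}
\qquad \text{(MTJ P-state sensing)} \tag{1}
$$

$$
\begin{aligned}
\mathrm{SM}_{\min}^{(\mathrm{w/o})} &= \left|(V_{REF}+\sigma)-(V_{DATA}-\sigma)\right| = V_{DATA}-V_{REF}-2\sigma \\
\mathrm{SM}_{\min} &= \left|V_{REF}-(V_{DATA}-\sigma)\right| = V_{DATA}-V_{REF}-\sigma
\end{aligned}
\qquad \text{(MTJ AP-state sensing)} \tag{2}
$$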
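Equations (1) and (2) can be evaluated directly. The Python sketch below returns the right-hand sides of both equations as a signed value, so a negative result flags a read that fails in the worst case. The function name is hypothetical, and the voltages and σ in the usage lines are made-up numbers chosen only to exercise the formulas, not simulation results from this work.

```python
def sm_min(v_ref: float, v_data: float, sigma: float,
           state: str, cross_precharge: bool) -> float:
    """Worst-case sensing margin following Equations (1) and (2).

    state="P"  -> reading logic '0' (V_DATA below V_REF), Equation (1)
    state="AP" -> reading logic '1' (V_DATA above V_REF), Equation (2)
    cross_precharge=False -> reference and data both shift by sigma (2*sigma loss)
    cross_precharge=True  -> only the data side shifts (sigma loss)
    A negative return value means the worst-case margin has collapsed.
    """
    if state == "P":
        ref = v_ref if cross_precharge else v_ref - sigma   # reference pulled down
        data = v_data + sigma                               # data pushed up
        return ref - data
    if state == "AP":
        ref = v_ref if cross_precharge else v_ref + sigma   # reference pushed up
        data = v_data - sigma                               # data pulled down
        return data - ref
    raise ValueError("state must be 'P' or 'AP'")


# Hypothetical operating point: V_REF = 0.40 V, sigma = 10 mV.
for state, v_data in (("P", 0.35), ("AP", 0.45)):
    for cross in (False, True):
        margin = sm_min(0.40, v_data, 0.01, state, cross)
        print(state, "with" if cross else "w/o", f"{margin:.3f} V")
```

With these numbers the margin is 30 mV without cross-precharge and 40 mV with it, for both states, which is the σ versus 2σ difference the text describes.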

3.2. Energy-Delay Performance Evaluation

The analysis is performed with an experimentally validated p-MTJ compact model to investigate the performance of the sensing circuits [37,38]. A 200 nm/30 nm (width/length) transistor dimension, chosen through a sweep analysis for performance optimization, is used to design the sensing circuits. Regarding process variations, the mean and standard deviation of the parameters are estimated through Monte Carlo (MC) simulations. The sensing failure probability is analyzed under global process variation and local mismatch of the 28-nm transistors and the 40-nm-diameter STT-MTJ. The evaluations are performed in the Cadence analog design environment with 1000-run MC analysis. 1-sigma CMOS transistor variability is considered, whereas a Gaussian distribution within the range of 0.9 to 1.1 is applied to the STT-MTJ.
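The Monte Carlo procedure can be illustrated with a small behavioral model. The Python sketch below is not the authors' Cadence flow. It assumes a fixed read voltage across the MTJ, interprets the STT-MTJ spread as a Gaussian multiplicative factor clipped to [0.9, 1.1], uses a mean-current reference built from two independently varied reference cells, and lumps CMOS mismatch into a Gaussian comparator offset. The function names and the values of v_read, rel_sigma and offset_sigma are hypothetical.

```python
import random

# Nominal MTJ resistances from Table 1, in ohms.
R_P_NOM = 6.2e3
R_AP_NOM = 18.6e3


def mtj_factor(rng: random.Random, rel_sigma: float) -> float:
    """Multiplicative MTJ variation: Gaussian around 1.0, clipped to [0.9, 1.1] (assumption)."""
    return min(1.1, max(0.9, rng.gauss(1.0, rel_sigma)))


def mc_failure_rate(runs: int = 1000, v_read: float = 0.1, rel_sigma: float = 0.05,
                    offset_sigma: float = 0.0, seed: int = 0) -> float:
    """Fraction of failed reads over `runs` samples (one P read and one AP read per sample)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(runs):
        # Data cells: one P-state and one AP-state MTJ, each with its own variation.
        i_p = v_read / (R_P_NOM * mtj_factor(rng, rel_sigma))
        i_ap = v_read / (R_AP_NOM * mtj_factor(rng, rel_sigma))
        # Mean-current reference built from two independently varied reference MTJs.
        i_ref = 0.5 * (v_read / (R_P_NOM * mtj_factor(rng, rel_sigma))
                       + v_read / (R_AP_NOM * mtj_factor(rng, rel_sigma)))
        # Input-referred comparator offset in amperes (stand-in for CMOS mismatch).
        if i_p + rng.gauss(0.0, offset_sigma) <= i_ref:   # P must sense as I_DATA > I_REF
            failures += 1
        if i_ap + rng.gauss(0.0, offset_sigma) >= i_ref:  # AP must sense as I_DATA < I_REF
            failures += 1
    return failures / (2 * runs)


print(f"failure rate: {mc_failure_rate(offset_sigma=2e-6):.3%}")
```

With offset_sigma=0.0 the clipped resistance spread alone never flips a read in this toy model, because the read window stays wider than the worst-case reference shift; failures appear only once the lumped offset becomes comparable to the few-microampere current window.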

Figure 3 and Figure 4 show the waveforms depicting the transient behavior of each circuit. Taking AP sensing as an example, the operation can be divided into two phases: the sensing phase before the clock rising edge and the amplify phase after it. Since both the sensing and amplify phases of DDRS need to read the voltage change between Vdata and out1, the DDRS waveform is shown separately from the others. The clamp transistor, with Vclamp as its gate voltage, keeps the bit-line voltage and current within a range that does not change the state of the MTJs. Therefore, when the supply voltage is reduced, Vclamp should be reconfigured to obtain a higher read yield without changing the MTJ state. For each sensing method, the sensing latency at different Vdd values is obtained with Vclamp adjusted accordingly.

Figure 3.


Taking antiparallel (AP) sensing as an example, simulated sensing operation waveforms of OCHS, HMSS, LOC, DSTA and PCSA designed with 28-nm complementary metal-oxide-semiconductor (CMOS) technology with (a) nominal sensing Vdd = 1 V and (b) low sensing Vdd = 0.7 V. The pre-charge, sensing and amplified phases are shown sequentially. The dotted line represents the sensing clock signal. The DDRS waveform is shown separately in Figure 4.

Figure 4.


Taking AP sensing as an example, the sensing operation waveform of DDRS (see Figure 1d). As both the sensing and amplify phases need to obtain the voltage change between Vdata and out1, the DDRS waveform is shown separately with (a) sensing Vdd = 1 V and (b) sensing Vdd = 0.7 V.

Figure 3 and Figure 4 illustrate the sensing operation waveforms, including the pre-charge, sensing and amplified phases. The latency of each phase, from its start until its output stabilizes, is evaluated and accumulated, and the accumulated delay at the end of the amplifier stage is taken as the total sensing latency.

Figure 5 compares the sensing latency. LOC has the worst sensing latency over most of the voltage range due to its multi-phase (equalizing, voltage developing, comparison and latching) sensing mechanism: DDRS has the largest delay at 0.7 V, whereas LOC has the largest delay from 0.8 V to 1 V. Meanwhile, as the voltage increases, the delay of DSTA decreases more slowly than that of the others, so DSTA latency is slightly higher than that of all other SAs except LOC from 0.8 V to 1 V. PCSA, OCHS and the proposed HMSS maintain a high sensing speed over the 0.7 V to 1 V range. The delay of DDRS increases sharply at low voltage because the voltage difference obtained through the read circuit serves as the gate voltage (VG) of the n-type metal-oxide-semiconductor (NMOS) pair that controls the amplifier. When an ultra-low VG is applied, the discharge current drops immediately and the latency increases accordingly.
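The argument about DDRS at low voltage can be made concrete with the alpha-power-law MOSFET model (Sakurai and Newton), which is swapped in here purely for illustration and is not the model used in this work. The Python sketch estimates the time the NMOS pair needs to discharge a node by ΔV as C·ΔV/I_D; the threshold voltage, α, the current factor k and the load capacitance are hypothetical values, not 28-nm foundry parameters.

```python
def discharge_delay(v_g: float, v_th: float = 0.35, alpha: float = 1.3,
                    k: float = 1e-4, c_load: float = 5e-15, delta_v: float = 0.3) -> float:
    """Time (s) for a drain current I_D = k * (v_g - v_th) ** alpha to move c_load by delta_v.

    Alpha-power-law drain current (Sakurai-Newton); every default is a hypothetical value.
    Subthreshold conduction is ignored, so the delay is infinite at or below v_th.
    """
    overdrive = v_g - v_th
    if overdrive <= 0.0:
        return float("inf")
    drain_current = k * overdrive ** alpha      # amperes
    return c_load * delta_v / drain_current     # t = C * dV / I


for v_g in (0.40, 0.50, 0.60, 0.70):
    print(f"V_G = {v_g:.2f} V -> t = {discharge_delay(v_g) * 1e12:.0f} ps")
```

In this toy model, lowering VG from 0.7 V to 0.4 V lengthens the discharge time by more than an order of magnitude, because the gate overdrive, not the supply voltage itself, sets the discharge current.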

Figure 5.


The comparison of sensing latency with (a) reading 0 and (b) reading 1.

The dynamic power consumption is evaluated by averaging the power dissipation over the operating phases. Figure 6 compares the dynamic power of ‘P’-state and ‘AP’-state sensing over the voltage range of 0.7 V to 1 V. DDRS has the largest dynamic power and OCHS the lowest. The dynamic power of LOC, PCSA and the proposed HMSS is at an intermediate level, and that of DSTA is slightly higher than that of OCHS. Owing to their triple paths from VDD to ground, DDRS, PCSA and the proposed HMSS consume more dynamic power than the dual-path designs; among these three, HMSS consumes far less than DDRS and about the same as PCSA.
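One way to read “averaging the power dissipation over the operating phases” is as a time-weighted average, i.e., total energy divided by total time. The Python sketch below assumes that interpretation; the function name, phase durations and powers are hypothetical placeholders, not values extracted from Figure 6.

```python
from typing import Sequence, Tuple


def average_dynamic_power(phases: Sequence[Tuple[str, float, float]]) -> float:
    """Time-weighted mean power (W) over (name, duration_s, mean_power_W) phases."""
    total_time = sum(duration for _, duration, _ in phases)
    if total_time <= 0.0:
        raise ValueError("total phase duration must be positive")
    total_energy = sum(duration * power for _, duration, power in phases)  # joules
    return total_energy / total_time


# Hypothetical phase breakdown for one read (durations in s, powers in W).
PHASES = [("pre-charge", 0.1e-9, 30e-6), ("sensing", 0.1e-9, 20e-6), ("amplify", 0.2e-9, 10e-6)]
print(f"{average_dynamic_power(PHASES) * 1e6:.2f} uW")  # prints 17.50 uW
```

Weighting by duration matters when the phases have unequal lengths: a simple arithmetic mean of the three phase powers above would give 20 µW instead of 17.5 µW.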

Figure 6.


The dynamic sensing power consumption of different sensing circuits with (a) reading 0 and (b) reading 1. Due to the triple-path structure, the dynamic power dissipation of our proposed HMSS-SA is greater than that of OCHS but less than that of DDRS.

Figure 7 shows the static power of the six SAs over a wide voltage range. Whether standard-VTH (SVT) or low-VTH (LVT) transistors are used, DDRS-SA has the highest static power dissipation, whereas OCHS-SA has the lowest leakage. The static power of the proposed HMSS is relatively high, and that of the remaining three sense amplifiers is intermediate. Circuit analysis shows that the more pMOS transistors are connected to VDD and the more paths there are to ground, the larger the static power of the circuit. In the low-VDD region, the differences in static power among the SAs become insignificant, except for DDRS-SA designed with LVT transistors.

Figure 7.


Static power consumption versus Vdd scaling down in sensing circuits with (a) low VT transistor and (b) regular VT transistor.

3.3. Low-VDD Sensing

Using an optimized Vclamp and transistor sizing, low-VDD operation can be realized in the different sensing circuits. Vclamp is set to the maximum value that still satisfies the no-write condition (the read must not switch the MTJ) at low supply voltages. Figure 8a–c illustrates the sensing failure probability versus frequency at different Vdd nodes (TMR = 100%). The proposed HMSS operates at up to 2.5 GHz at nominal Vdd. At the 0.65 V Vdd node, it achieves 1 GHz operation with a 0.3% failure rate.

Figure 8.


(a–c) Sensing failure probability versus sensing frequency at different Vdd nodes, TMR = 100%. (d) Successful sensing probability versus MTJ TMR at 1 V Vdd. HMSS achieves an improved sensing rate even when the TMR is as low as 50%.

Figure 8d shows the successful sensing probability for TMR values of 50%, 100%, 150% and 200% at 1 V Vdd. When TMR is greater than 150%, the successful sensing rate reaches 100% in the 1000-run MC analysis. Compared with the other SAs (each with performance optimization), the optimized HMSS has an enhanced sensing probability (4–5.5% improvement) even with TMR at 50%.

3.4. Temperature-Aware Sensing

To reduce sensing failure probability, the reference should ideally be located in the middle of the read window (the mean of IP and IAP). Figure 9 depicts the sensing current versus operating temperature. IAP is the data-path current when reading logic ‘1’, whereas IP is that for logic ‘0’; IREF is the reference-path current. As shown in Figure 9c,f, IREFAP (IREFP) is the average current of the two reference paths when reading logic ‘1’ (logic ‘0’) in the dual-reference scheme. During reading, DDRS-SA has no clear demarcation between the precharge phase and the differential voltage development phase, resulting in a large data-path current when reading ‘1’ in the steady state and a relatively small current when reading ‘0’. Meanwhile, as DDRS uses a dual-reference scheme, the currents of the ‘AP’ reference path and the ‘P’ path also differ.
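The “reference in the middle of the read window” criterion can be quantified by the normalized position of IREF between IAP and IP, together with the smaller of its two distances to the data currents. The Python sketch below illustrates this with hypothetical function names and made-up current values in µA; it is not how Figure 9 was produced.

```python
def reference_position(i_p: float, i_ap: float, i_ref: float) -> float:
    """Normalized location of I_REF inside the read window [I_AP, I_P].

    0.0 means I_REF sits on I_AP, 1.0 means it sits on I_P, and 0.5 is the ideal midpoint.
    """
    if i_p <= i_ap:
        raise ValueError("expected I_P > I_AP (the P state has the lower resistance)")
    return (i_ref - i_ap) / (i_p - i_ap)


def worst_side_margin(i_p: float, i_ap: float, i_ref: float) -> float:
    """Smaller of the two current margins (I_P - I_REF and I_REF - I_AP)."""
    return min(i_p - i_ref, i_ref - i_ap)


# Hypothetical currents (uA): a centered reference versus one drifting toward I_P.
for i_ref in (17.0, 19.0):
    print(i_ref, round(reference_position(20.0, 14.0, i_ref), 3),
          worst_side_margin(20.0, 14.0, i_ref))
```

A reference drifting toward IP, as described for OCHS at high temperature, keeps the full window width but shrinks the margin for reading ‘0’, which is the side that then dominates the failure rate.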

Figure 9.


Sensing current versus temperature analysis in different sensing circuits: (a) PCSA, (b) OCHS, (c) DDRS, (d) LOC, (e) DSTA, (f) HMSS. The sensing margin at low temperature is greater than at high temperature. Note that the sensing currents in all circuits are lower than the MTJ critical current (Ic0 = 50 µA).

The current difference between the AP and P reference paths is around 2 µA to 6 µA, and the reference-path current is taken as the average of the AP and P reference-path currents. In Figure 9a,e, IREF lies approximately in the middle of the sensing window between IP and IAP, showing robust performance at both low and high temperatures. The IREF in OCHS is close to IP at high temperature, indicating sensitivity to temperature changes. The current-temperature relationship of the proposed HMSS is depicted in Figure 9f. When the data is AP, the reference is the P path, so IREFAP equals IP; when the data is P, the reference is the AP path, so IREFP equals IAP. The proposed HMSS remains stable across temperature changes.

3.5. Discussion

Table 3 lists the simulation results of the sensing circuits. The proposed HMSS demonstrates a better sensing margin, faster speed and higher stability. Compared with the other dual-reference sensing amplifier (DDRS, which theoretically has the same sensing margin), it demonstrates improved sensing speed and fewer variability-induced failures, and its successful sensing rate is higher.

Table 3.

Summary of Low-VDD sensing circuit performance *.

Publication | T-VLSI’12 | T-VLSI’14 | TCAS-1’15 | TCAS-1’17 | T-VLSI’18 | Proposed
Name/Abbreviation | PCSA | DSTA | LOC | DDRS | OCHS | HMSS
Dynamic power (µW) | 25.36 | 8.29 | 18.92 | 42.26 | 7.263 | 25.57
Leakage power @ Vdd = 1 V (pW) | 24.35 | 24.16 | 29.07 | 67.14 | 25.31 | 52.75
Sensing latency ** (ns) | 0.19 | 0.33 | 0.39 | 0.28 | 0.2 | 0.2
Sensing current (µA) | 15.3 | 7.95 | 19.9 | 15 | 8.1 | 15.92
Minimum sensing Vdd realized in 28 nm node (V) | 0.65 | 0.65 | 0.7 | 0.55 | 0.65 | 0.65
Failure probability (%), TMR = 100% @ Vdd = 1 V | 0.2 | 1 | 0.3 | 0.5 | 0.3 | 0.3
Failure probability (%), TMR = 100% @ Vdd = 0.7 V | 0.4 | 1.1 | 1.3 | 0.4 | 0.2 | 0.4
Failure probability (%), TMR = 50% @ Vdd = 1 V | 8.3 | 10 | 10.2 | 10.1 | 9.4 | 4.7
Failure probability (%), TMR = 50% @ Vdd = 0.7 V | 8.6 | 10.3 | 14.8 | 10.7 | 9.5 | 5.2
Maximum sensing frequency (MHz), Vdd = 0.7 V, yield > 90% | 1000 | 1000 | 100 | 250 | 500 | 2500
Maximum sensing frequency (MHz), Vdd = 0.7 V, yield > 99% | 500 | 250 | N/A | 100 | 250 | 2500
Sensing current margin *** (µA) @ −40 °C | 1.299 | 1.706 | 1.257 | N/A | 1.592 | 3.611
Sensing current margin *** (µA) @ 25 °C | 1.034 | 1.757 | 0.954 | N/A | 1.65 | 3.088
Sensing current margin *** (µA) @ 85 °C (worst case) | 0.807 | 2.271 | 0.841 | N/A | 1.58 | 2.547

* All sensing circuits are implemented with 28-nm CMOS technology. The 200 nm/30 nm transistor dimension is mainly used, and performance is optimized by adjusting the dimensions of several transistors. ** Sensing latency is the time from the beginning of the sensing phase until the output voltage stabilizes. *** Current difference between IAP and IP (IAP/IP: current flowing through RAP/RP). Due to the feedback path in DDRS, its current margin is not recorded.
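The maximum-frequency rows of Table 3 report the highest sensing frequency that still meets a yield target. A minimal Python sketch of that selection is shown below, assuming the MC results are available as (frequency, failure probability) pairs. It walks the sweep in ascending frequency and stops at the first point that misses the target, so an isolated passing point above a failing one is not credited. The function name and the sweep values are hypothetical, not the data behind Table 3.

```python
from typing import Iterable, Optional, Tuple


def max_frequency_at_yield(sweep: Iterable[Tuple[float, float]],
                           min_yield: float) -> Optional[float]:
    """Highest frequency reached before the yield first drops to min_yield or below.

    sweep: (frequency_MHz, failure_probability) pairs, in any order.
    Returns None when even the lowest frequency misses the target (an "N/A" entry).
    """
    best = None
    for freq, p_fail in sorted(sweep):
        if 1.0 - p_fail > min_yield:
            best = freq
        else:
            break  # do not credit frequencies above the first failing point
    return best


# Hypothetical MC sweep at one Vdd: (MHz, failure probability).
SWEEP = [(100, 0.001), (250, 0.004), (500, 0.02), (1000, 0.08), (2500, 0.3)]
print(max_frequency_at_yield(SWEEP, 0.90), max_frequency_at_yield(SWEEP, 0.99))
```

For this made-up sweep the function returns 1000 MHz for the 90% target and 250 MHz for the 99% target.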

Compared with OCHS-SA, which has the lowest dynamic power consumption, HMSS has about twice the sensing margin, which means that when the MTJ fabrication process is immature, it contributes much more to reading stability. Secondly, with 0.7 V Vdd and the same transistor configuration, the success rates of HMSS for reading ‘0’ and reading ‘1’ are about 95.4% and 96.4%, respectively, whereas the success rate of the OCHS sense amplifier for reading ‘1’ is only 91.8%, and OCHS also shows a serious disparity when reading ‘0’.

A nominal/high VDD can effectively guarantee the sensing margin and speed. However, a high sensing VDD may induce read disturbance, so for the current sensing scheme the clamp transistor with Vclamp must be carefully designed. For low-VDD MRAM sensing, the trade-offs among yield, speed, power and area are considered and optimized in sequence in this work. In fact, whether MRAM is implemented with low or high VDD, a high successful sensing probability must be guaranteed to alleviate the workload of the ECC blocks.

Some design details, e.g., the reference scheme, must also be emphasized. If the local reference scheme were applied to previous SAs to track bit-cell variations (as in the proposed SA), their power consumption would be larger than that of this work. In this work, the SA implementation has not been evaluated at the higher system/chip level. Considering the entire MRAM macro, the additional power consumption and layout area are a small portion compared with the error-correcting code blocks and redundancy blocks.

4. Low-VDD Sensing: A Case Study in Logic-in-Memory

4.1. The Modified Logic-in-Memory

A promising approach to energy-efficient spintronic circuit design is to use MTJs simultaneously for storage and for logic operation/computation. Spintronics-based bit-wise processing-in-memory (PIM), computing-in-memory (CIM) and logic-in-memory (LIM) mainly rely on CMOS circuit-level implementation for logic operations, which can achieve massive parallelism, high bandwidth and high density while minimizing power and cost [9,39,40,41]. Typical spintronics-based designs such as Pinotubo [39], STT-CIM [40] and NV-LIM [9] require SA modification, as well as additional reference circuits, to support logic operations. Distinct from these CMOS-based solutions, a true spintronic PIM scheme within a RAM array, referred to as computational RAM (CRAM), is proposed in [41]. Among these schemes, NV-LIM is a prototype-validated method that uses an additional pass-transistor-logic network within the MTJ nonvolatile data sensing paths [9].

Regardless of the in-memory computing scheme, the SA circuit is an indispensable building block in spintronics-based circuits. To further demonstrate HMSS-SA performance in the LIM scenario, a modified NV-LIM block diagram is shown in Figure 10, including the bit-wise operations AND, OR and XOR as well as a full adder.

Figure 10.


HMSS-SA with improved sensing margin applied to the modified logic-in-memory scenario.

PCSA can correctly perform OR and AND operations but not XOR, because XOR requires exchanging the reference path and the data path. Since PCSA uses a CM reference that takes the average current of two reference paths, its reference path and data path cannot be exchanged in the normal way.

OCHS, DSTA and LOC can be directly combined with the LIM described in Figure 10. The OR, AND and XOR operations are implemented by the LIM as follows (B is the data stored in MRAM, and A is the data that has already been read and applied to the control transistor): (1) OR: when A = ‘0’, B is read normally as the result; when A = ‘1’, the data path portion of the LIM is turned off and the output is always ‘1’. (2) AND: when A = ‘1’, B is read normally as the result; when A = ‘0’, the reference path portion of the LIM is turned off and the output is always ‘0’. (3) XOR: when A = ‘0’, B is read normally as the result; when A = ‘1’, the data path of the LIM is exchanged with the reference path, and the value read out (the complement of B) is the result.
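The gating rules for OR, AND and XOR can be captured in a small behavioral model, which is convenient for checking that each configuration reproduces the intended truth table. In this Python sketch, turning the data path off forces the output to ‘1’ (as stated for OR) and, following the Boolean definition of AND, turning the reference path off forces the output to ‘0’; the actual output polarity of a given SA is circuit-specific, and read errors are not modeled. The function names are hypothetical.

```python
def sense(stored_bit: int, data_path_on: bool = True,
          ref_path_on: bool = True, swapped: bool = False) -> int:
    """Idealized SA decision (no read errors) under the LIM path gating."""
    if not data_path_on:
        return 1   # data path off: output forced to '1' (OR case)
    if not ref_path_on:
        return 0   # reference path off: output forced to '0' (AND case, assumed)
    return 1 - stored_bit if swapped else stored_bit   # swapped paths read the complement


def lim_op(op: str, a: int, b: int) -> int:
    """Bit-wise LIM operation: A gates the paths, B is the bit stored in the MTJ."""
    if op == "OR":
        return sense(b, data_path_on=(a == 0))
    if op == "AND":
        return sense(b, ref_path_on=(a == 1))
    if op == "XOR":
        return sense(b, swapped=(a == 1))
    raise ValueError(f"unsupported operation: {op}")


for op in ("OR", "AND", "XOR"):
    print(op, [lim_op(op, a, b) for a in (0, 1) for b in (0, 1)])
```

Only the OR (A = ‘0’), AND (A = ‘1’) and all XOR cases actually depend on reading B, which is why XOR is the operation most exposed to sensing failures.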

DDRS and HMSS are implemented with a dual-reference scheme, so a modified LIM is required for two reasons: (1) in this work a single MTJ stores one bit of data, whereas previous literature uses two MTJs per bit; (2) the SA in the original LIM has two paths, whereas DDRS and HMSS use three paths in this work.

4.2. Failure Probability of Modified LIM

Table 4 compares the bit-wise computation failure rates of different sensing circuits in the modified LIM structure at Vdd = 1.0 V and 0.7 V. Notice that the failure probability of the OR/AND operations in the modified LIM is lower than that of the sensing circuits alone. The reason is that when the data path (in the OR operation) or the reference path (in the AND operation) is turned off, the output is forced to a fixed value regardless of B, so correct sensing (100% success) is guaranteed in those cases.

Table 4.

The failure probability of LIM bit-wise operation with low TMR=50%.

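| Vdd | Operation | PCSA | DSTA | LOC | DDRS | OCHS | HMSS |
|---|---|---|---|---|---|---|---|
| 1.0 V | OR/AND | 4.1% | 5.0% | 5.1% | 4.7% | 4.6% | 2.3% |
| 1.0 V | XOR | N/A | 10.1% | 10.2% | 9.9% | 9.35% | 6.85% |
| 0.7 V | OR/AND | 4.3% | 5.15% | 7.4% | 5.35% | 4.75% | 2.6% |
| 0.7 V | XOR | N/A | 10.3% | 14.8% | 10.1% | 9.5% | 6.95% |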
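The argument above, that forced outputs lower the OR/AND failure rate, can be made explicit with a simplified probabilistic model. The sketch below is an illustrative assumption, not the authors' analysis: it assumes that each normal or path-exchanged read fails independently with the same hypothetical probability `p_read`, that a forced output (data or reference path turned off) never fails, and that operand A is ‘1’ with probability `p_a1` (0.5 by default). Under these assumptions OR and AND each fail with probability `p_read`/2, while XOR fails with probability `p_read`, because XOR needs an actual sensing operation for both values of A.

```python
def lim_failure_rates(p_read: float, p_a1: float = 0.5) -> dict[str, float]:
    """Expected failure rates of LIM bit-wise operations under a simplified model.

    Assumptions (hypothetical, for illustration only):
      * p_read: probability that one sensing operation (normal, or with the data
        and reference paths exchanged) returns the wrong value;
      * a forced output (data path or reference path turned off) never fails;
      * p_a1: probability that operand A is '1'.
    """
    if not (0.0 <= p_read <= 1.0 and 0.0 <= p_a1 <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return {
        "OR": (1.0 - p_a1) * p_read,  # only A = '0' needs sensing; A = '1' is forced
        "AND": p_a1 * p_read,         # only A = '1' needs sensing; A = '0' is forced
        "XOR": p_read,                # both A = '0' and A = '1' need sensing
    }


if __name__ == "__main__":
    # Illustrative input: XOR failure rate of DSTA at Vdd = 0.7 V (Table 4).
    rates = lim_failure_rates(0.103)
    for op, p in rates.items():
        print(f"{op}: {100 * p:.2f}%")
```

For example, taking `p_read` equal to the 10.3% XOR failure rate of DSTA at Vdd = 0.7 V gives an OR/AND estimate of 5.15%, which matches the corresponding entry of Table 4; the XOR rate is also close to twice the OR/AND rate for LOC and OCHS. The model does not explain every column, however: for HMSS the XOR rate is roughly 2.7 to 3 times its OR/AND rate.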

5. Conclusions

In this work, previous MRAM sensing circuits were investigated in 28-nm CMOS technology with process-voltage-temperature aware considerations. A novel sensing circuit named HMSS was proposed for low-VDD, high-yield MRAM design. The proposed circuit uses current-mode sensing, a dual-reference scheme and a modified pre-charge pMOSFET to enhance the sensing margin. The simulation results show that HMSS achieved high sensing speed at the 1 V nominal Vdd and a low failure probability (0.4% with TMR = 100%) at a low Vdd of 0.7 V. Process variations, a wide temperature range and Vdd scaling were investigated to ensure highly reliable sensing operation. Compared with previous works, HMSS achieved an improved sensing success rate even when the TMR was as low as 50%. A modified logic-in-memory circuit was implemented with reduced failure probability. The presented results give useful insights into MRAM sensing circuits at the 28-nm node, and provide design guidelines for logic-in-memory spintronic circuits and architectures.

Author Contributions

Conceptualization, Y.G.; Data curation, Z.B. and X.H.; Investigation, H.C.; Methodology, Z.B. and X.H.; Software, Y.G.; Supervision, L.N. and W.G.; Validation, Z.B.; Visualization, W.G.; Writing—original draft, Z.B. and H.C.; Writing—review & editing, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (Grant No. 2018YFB2202102) and the Fundamental Research Funds for the Central Universities (2242021k30031).

Conflicts of Interest

The funders had no role in the design of the study.

Footnotes

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Ikeda S., Miura K., Yamamoto H., Mizunuma K., Gan H.D., Endo M., Kanai S., Hayakawa J., Matsukura F., Ohno H. A perpendicular-anisotropy CoFeB-MgO magnetic tunnel junction. Nat. Mater. 2010;9:721–724. doi: 10.1038/nmat2804.
2. Wang M., Cai W., Cao K., Zhou J., Wrona J., Peng S., Yang H., Wei J., Kang W., Zhang Y., et al. Current-induced magnetization switching in atom-thick tungsten engineered perpendicular magnetic tunnel junctions with large tunnel magnetoresistance. Nat. Commun. 2018;9:671. doi: 10.1038/s41467-018-03140-z.
3. Cai H., Liu B., Chen J., Naviner L., Zhou Y., Wang Z., Yang J. A survey of in-spin transfer torque MRAM computing. Sci. China Inf. Sci. 2021;64:160402.
4. Noguchi H., Ikegami K., Kushida K., Abe K., Itai S., Takaya S., Shimomura N., Ito J., Kawasumi A., Hara H., et al. A 3.3ns-access-time 71.2 µW/MHz 1Mb embedded STT-MRAM using physically eliminated read-disturb scheme and normally-off memory architecture; Proceedings of the 2015 IEEE International Solid-State Circuits Conference-(ISSCC) Digest of Technical Papers; San Francisco, CA, USA. 22–26 February 2015; pp. 1–3.
5. Rho K., Tsuchida K., Kim D., Shirai Y., Bae J., Inaba T., Noro H., Moon H., Chung S., Sunouchi K., et al. A 4 Gb LPDDR2 STT-MRAM with compact 9F2 1T1MTJ cell and hierarchical bitline architecture; Proceedings of the 2017 IEEE International Solid-State Circuits Conference (ISSCC); San Francisco, CA, USA. 5–9 February 2017; pp. 396–397.
6. Ohsawa T., Ikeda S., Hanyu T., Ohno H., Endoh T. Trend of tunnel magnetoresistance and variation in threshold voltage for keeping data load robustness of metal–oxide–semiconductor/magnetic tunnel junction hybrid latches. J. Appl. Phys. 2014;115:17C728. doi: 10.1063/1.4867129.
7. Fong X., Kim Y., Choday S.H., Roy K. Failure Mitigation Techniques for 1T-1MTJ Spin-Transfer Torque MRAM Bit-cells. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2014;22:384–395. doi: 10.1109/TVLSI.2013.2239671.
8. Na T., Kang S.H., Jung S.O. STT-MRAM Sensing: A Review. IEEE Trans. Circuits Syst. II Express Briefs. 2021;68:12–18. doi: 10.1109/TCSII.2020.3040425.
9. Natsui M., Suzuki D., Sakimura N., Nebashi R., Tsuji Y., Morioka A., Sugibayashi T., Miura S., Honjo H., Kinoshita K., et al. Nonvolatile Logic-in-Memory LSI Using Cycle-Based Power Gating and its Application to Motion-Vector Prediction. IEEE J. Solid State Circuits. 2015;50:476–489. doi: 10.1109/JSSC.2014.2362853.
10. Cai H., Wang Y., Zhao W., de Barros Naviner L. Multiplexing Sense Amplifier Based Magnetic Flip-Flop in 28 nm FDSOI Technology. IEEE Trans. Nanotechnol. 2015;14:761–767. doi: 10.1109/TNANO.2015.2438017.
11. Cai H., Wang Y., de Barros Naviner L., Zhao W. Robust Ultra-Low Power Non-Volatile Logic-in-Memory Circuits in FD-SOI Technology. IEEE Trans. Circuits Syst. I Regul. Pap. 2016;64:847–857. doi: 10.1109/TCSI.2016.2621344.
12. Chang M.F., Shen S.J., Liu C.C., Wu C.W., Lin Y.F., King Y.C., Lin C.J., Liao H.J., Chih Y.D., Yamauchi H. An Offset-Tolerant Fast-Random-Read Current-Sampling-Based Sense Amplifier for Small-Cell-Current Nonvolatile Memory. IEEE J. Solid State Circuits. 2013;48:864–877. doi: 10.1109/JSSC.2012.2235013.
13. Wu B., Cheng Y., Yang J., Todri-Sanial A., Zhao W. Temperature Impact Analysis and Access Reliability Enhancement for 1T1MTJ STT-RAM. IEEE Trans. Reliab. 2016;65:1755–1768. doi: 10.1109/TR.2016.2608910.
14. Lin I.C., Law Y.K., Xie Y. Mitigating BTI-Induced Degradation in STT-MRAM Sensing Schemes. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2018;26:50–62. doi: 10.1109/TVLSI.2017.2764520.
15. Kim T., Liu J., Kim C.H. A Voltage Scalable 0.26 V, 64 kb 8T SRAM with Vmin Lowering Techniques and Deep Sleep Mode. IEEE J. Solid State Circuits. 2009;44:1785–1795. doi: 10.1109/JSSC.2009.2020201.
16. Chang I.J., Kim J., Park S.P., Roy K. A 32 kb 10 T Subthreshold SRAM Array with Bit-Interleaving and Differential Read Scheme in 90 nm CMOS; Proceedings of the 2008 IEEE International Solid-State Circuits Conference-Digest of Technical Papers; San Francisco, CA, USA. 3–7 February 2008; pp. 388–622.
17. Trinh Q.K., Ruocco S., Alioto M. Dynamic Reference Voltage Sensing Scheme for Read Margin Improvement in STT-MRAMs. IEEE Trans. Circuits Syst. I Regul. Pap. 2018;65:1269–1278. doi: 10.1109/TCSI.2017.2749522.
18. Lee A., Lee H., Ebrahimi F., Lam B., Chen W.H., Chang M.F., Amiri P.K., Wang K.L. A Dual-Data Line Read Scheme for High-Speed Low-Energy Resistive Nonvolatile Memories. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2018;26:272–279. doi: 10.1109/TVLSI.2017.2766150.
19. Zhang H., Kang W., Zhang Y., Chang M., Zhao W. A Full-Sensing-Margin Dual-Reference Sensing Scheme for Deeply-Scaled STT-RAM. IEEE Access. 2018;6:64250–64260. doi: 10.1109/ACCESS.2018.2878012.
20. Na T., Song B., Choi S., Kim J.P., Kang S.H., Jung S.O. Offset-Canceling Single-Ended Sensing Scheme with One-Bit-Line Precharge Architecture for Resistive Nonvolatile Memory in 65-nm CMOS. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2019;27:2548–2555. doi: 10.1109/TVLSI.2019.2925931.
21. Na T., Song B., Kim J.P., Kang S.H., Jung S.O. Offset-Canceling Current-Sampling Sense Amplifier for Resistive Nonvolatile Memory in 65 nm CMOS. IEEE J. Solid State Circuits. 2017;52:496–504. doi: 10.1109/JSSC.2016.2612235.
22. Na T., Kim J., Song B., Kim J.P., Kang S.H., Jung S.O. An Offset-Tolerant Dual-Reference-Voltage Sensing Scheme for Deep Submicrometer STT-RAM. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2016;24:1361–1370. doi: 10.1109/TVLSI.2015.2453192.
23. Lei Z.Q., Li G.J., Egelhoff W.F., Lai P.T., Pong P.W.T. Review of Noise Sources in Magnetic Tunnel Junction Sensors. IEEE Trans. Magn. 2011;47:602–612. doi: 10.1109/TMAG.2010.2100814.
24. Torunbalci M.M., Upadhyaya P., Bhave S.A., Camsari K.Y. Modular Compact Modeling of MTJ Devices. IEEE Trans. Electron Devices. 2018;65:4628–4634. doi: 10.1109/TED.2018.2863538.
25. Wang S., Lee H., Ebrahimi F., Amiri P.K., Wang K.L., Gupta P. Comparative Evaluation of Spin-Transfer-Torque and Magnetoelectric Random Access Memory. IEEE J. Emerg. Sel. Top. Circuits Syst. 2016;6:134–145. doi: 10.1109/JETCAS.2016.2547681.
26. Zhang Y., Zhao W., Lakys Y., Klein J., Kim J., Ravelosona D., Chappert C. Compact Modeling of Perpendicular-Anisotropy CoFeB/MgO Magnetic Tunnel Junctions. IEEE Trans. Electron Devices. 2012;59:819–826. doi: 10.1109/TED.2011.2178416.
27. Kim J., Ryu K., Kang S.H., Jung S. A Novel Sensing Circuit for Deep Submicron Spin Transfer Torque MRAM (STT-MRAM). IEEE Trans. Very Large Scale Integr. VLSI Syst. 2012;20:181–186. doi: 10.1109/TVLSI.2010.2088143.
28. Na T., Woo S., Kim J., Jeong H., Jung S. Comparative Study of Various Latch-Type Sense Amplifiers. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2014;22:425–429. doi: 10.1109/TVLSI.2013.2239320.
29. Song B., Na T., Kim J., Kim J.P., Kang S.H., Jung S. Latch Offset Cancellation Sense Amplifier for Deep Submicrometer STT-RAM. IEEE Trans. Circuits Syst. I Regul. Pap. 2015;62:1776–1784. doi: 10.1109/TCSI.2015.2427931.
30. Kang W., Pang T., Lv W., Zhao W. Dynamic Dual-Reference Sensing Scheme for Deep Submicrometer STT-MRAM. IEEE Trans. Circuits Syst. I Regul. Pap. 2017;64:122–132. doi: 10.1109/TCSI.2016.2606438.
31. Bagheriye L., Toofan S., Saeidi R., Moradi F. Offset-Compensated High-Speed Sense Amplifier for STT-MRAMs. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2018;26:1051–1058. doi: 10.1109/TVLSI.2018.2808140.
32. Han M., Cai H., Yang J., Naviner L., Wang Y., Zhao W. Stability and Variability Emphasized STT-MRAM Sensing Circuit With Performance Enhancement; Proceedings of the 2018 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS); Chengdu, China. 26–30 October 2018; pp. 386–389.
33. Huda S., Sheikholeslami A. A Novel STT-MRAM Cell with Disturbance-Free Read Operation. IEEE Trans. Circuits Syst. I Regul. Pap. 2013;60:1534–1547. doi: 10.1109/TCSI.2012.2220458.
34. Yang X., Zhang Z., Zhu W., Yu S., Liu L., Wu N. Deterministic conversion rule for CNNs to efficient spiking convolutional neural networks. Sci. China Inf. Sci. 2020;63:1–19. doi: 10.1007/s11432-019-1468-0.
35. Zhang W., Gao B., Yao P., Tang J., Qian H., Wu H. Array-level boosting method with spatial extended allocation to improve the accuracy of memristor based computing-in-memory chips. Sci. China Inf. Sci. 2021;64:1–9. doi: 10.1007/s11432-020-3198-9.
36. Matsunaga S., Hayakawa J., Ikeda S., Miura K., Endoh T., Ohno H., Hanyu T. MTJ-based nonvolatile logic-in-memory circuit, future prospects and issues; Proceedings of the 2009 Design, Automation & Test in Europe Conference & Exhibition; Nice, France. 20–24 April 2009; pp. 433–435.
37. Wang Y., Cai H., de Barros Naviner L.A., Zhang Y., Zhao X., Deng E., Klein J.-O., Zhao W. Compact Model of Dielectric Breakdown in Spin-Transfer Torque Magnetic Tunnel Junction. IEEE Trans. Electron Devices. 2016;63:1762–1767. doi: 10.1109/TED.2016.2533438.
38. MTJ Compact Model. 2020. Available online: http://www.spinlib.com/ (accessed on 3 February 2020).
39. Li S., Xu C., Zou Q., Zhao J., Lu Y., Xie Y. Pinatubo: A processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories; Proceedings of the 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC); Austin, TX, USA. 5 June 2016; pp. 1–6.
40. Jain S., Ranjan A., Roy K., Raghunathan A. Computing in Memory With Spin-Transfer Torque Magnetic RAM. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2018;26:470–483. doi: 10.1109/TVLSI.2017.2776954.
41. Chowdhury Z., Harms J.D., Khatamifard S.K., Zabihi M., Lv Y., Lyle A.P., Sapatnekar S.S., Karpuzcu U.R., Wang J. Efficient In-Memory Processing Using Spintronics. IEEE Comput. Archit. Lett. 2018;17:42–46. doi: 10.1109/LCA.2017.2751042.

Articles from Micromachines are provided here courtesy of Multidisciplinary Digital Publishing Institute (MDPI)
