# Journal of Electron Devices Vol. 23, Num. 1, 2016, pp 1927-1933


www.jeldev.org

© JED [ISSN: 1682-3427] (print) © JED [ISSN: 1682-3427] (online)



# Novel Low Power Design Techniques of Read-Out Path Circuit in a Register File

M. Nagarjuna, D. Mahesh Babu, S. Rajendar Department of Electronics & Communication Engineering, Vardhaman College of Engineering (Autonomous), Hyderabad, Telangana, India.

#### **Abstract**

For a high-speed register file in a microprocessor, the read-out path is designed with wide fan-in dynamic multiplexers implemented in the dynamic logic style. As technology scales down, increased leakage currents and reduced noise margins significantly degrade the robustness of dynamic circuits. In addition, a wide fan-in dynamic circuit with high switching activity introduces a significant power overhead, which limits its use. In this paper, novel design techniques are proposed that improve the noise tolerance beyond the level of the conventional dynamic logic gate. The overall improved characteristics of the proposed techniques are demonstrated using circuit simulations. Power and delay of the proposed design techniques are evaluated in a 45-nm technology; the circuits are designed in the Cadence Virtuoso environment and simulated with Cadence Spectre.

Keywords: register file, dynamic multiplexer, dynamic logic circuit, switching activity, low-power design

# I. INTRODUCTION

A dynamic logic gate with high fan-in is the most critical component of the register file in modern microprocessors [1], [2]. It requires less area and presents a reduced load capacitance, and hence achieves higher speed than a static logic circuit. A static logic circuit needs long stacks of PMOS and NMOS transistors, which adds circuit complexity and increases delay and area.

Dynamic logic uses two phases, namely precharge and evaluation, to implement a complex circuit with a single evaluation network. However, a dynamic circuit has the drawbacks of high power consumption and reduced noise margin due to charge sharing and charge leakage at its internal nodes.

Charge sharing can be compensated by adding a keeper transistor. Even so, a dynamic circuit consumes more power than a static CMOS circuit because of the redundant switching at the dynamic and output nodes [3].

Different circuits have been proposed to deal with this issue. The footer voltage feed forward domino (FVFD) technique [4] and the static switching pulse domino (SSPD) technique have also been applied to the design of dynamic multiplexers [5]. Both techniques improve the noise tolerance and reduce the switching power by limiting the voltage swing on the large bit line capacitance through the concept of dual dynamic nodes. The drawback of the FVFD and SSPD techniques is that they consume more power and require more area than the single-keeper domino technique.

This paper is organized as follows. Process variation of a conventional dynamic multiplexer is presented in Section II. Previous work is reviewed in Section III, and the proposed structures are described in Section IV. The average power dissipation and delay of the dynamic multiplexers are compared in Section V. Finally, some conclusions are drawn in Section VI.

# II. PROCESS VARIATION OF A CONVENTIONAL DYNAMIC MULTIPLEXER

In high-speed register files, the read-out path is a wide fan-in dynamic multiplexer. These dynamic multiplexers use dynamic logic gates, which offer low latency because the pull-up network does not require a stack of PMOS transistors. In the pull-down network (PDN) of this logic style, the inputs are applied to NMOS devices.

To perform a bit line read operation on a register file with 2<sup>n</sup> registers, a dynamic multiplexer structure with 2<sup>n</sup> parallel inputs is required. Dynamic logic operates by charging and discharging the dynamic node capacitance depending on the state of the logic inputs. A major limitation of dynamic logic is its poor noise margin.
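The precharge and evaluation behaviour of such a multiplexer can be illustrated with a small behavioural model. The Python sketch below is for illustration only: the function name `dynamic_mux_read`, the one-hot read word lines and the output inverter are assumptions, and leakage, noise and the keeper are not modelled.

```python
def dynamic_mux_read(registers, rwl):
    """Behavioural model of one bit-line read through a wide fan-in dynamic mux.

    registers: stored bits (0/1), one per register (2**n entries).
    rwl:       read-word-line bits, assumed one-hot or all zero.
    Returns (dyn, out): the dynamic node level and the inverted output.
    """
    if len(registers) != len(rwl):
        raise ValueError("one read word line is needed per register")

    # Precharge phase: the dynamic node capacitance is charged high.
    dyn = 1

    # Evaluation phase: any conducting pull-down branch (selected word line
    # AND a stored 1) discharges the dynamic node.
    if any(sel and bit for sel, bit in zip(rwl, registers)):
        dyn = 0

    # Assumed domino-style output inverter restores the data polarity.
    out = 1 - dyn
    return dyn, out


n = 3                                  # 2**n = 8 registers
stored = [0, 1, 1, 0, 0, 0, 1, 0]      # illustrative register contents
select = [0] * (2 ** n)
select[2] = 1                          # read register 2, which holds a 1

result_hit = dynamic_mux_read(stored, select)          # node discharges
result_idle = dynamic_mux_read(stored, [0] * 2 ** n)   # node stays high
```

With no word line selected the dynamic node keeps its precharged value, which is the state the keeper has to protect against leakage.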

Traditionally, a PMOS keeper is employed to overcome this issue; it compensates for the leakage current drawn from the dynamic node by the pull-down network [6]. As can be observed in Fig. 1, if all inputs are zero the dynamic node remains high.



Fig. 1. Dynamic OR gate with traditional keeper.

In this case, the PMOS keeper transistor M2 stays ON to compensate for any leakage current drawn out of the dynamic node, so a strong keeper M2 is desirable to increase the noise margin. When at least one input is high, the dynamic node DYN should discharge to low; in this case a weak keeper is preferred to speed up the switching transition.

These requirements give rise to a trade-off between the performance and the noise margin of dynamic circuits in sub-100-nm technologies. Different techniques have been proposed to relax this trade-off [7]-[9].

# III. PREVIOUS WORK

## A. Footer Voltage Feed Forward Domino (FVFD) Technique

The footer voltage feed forward domino (FVFD) technique includes a first evaluation unit and a second evaluation unit. The first evaluation unit precharges a first dynamic node and discharges a footer node (FOOT) in the first phase of the clock signal. In the second phase of the clock signal, it evaluates the input signals to determine the logic level of the first dynamic node, as shown in Fig. 2.

The second evaluation unit precharges the second dynamic node DYN2, which is combined with the first dynamic node DYN1 through a NAND gate, in the first phase of the clock signal. In the second phase, it determines the logic level of DYN2 in response to the logic level of the footer node.



Fig. 2. FVFD technique applied to the dynamic multiplexer in an RF bitline read-out path.

The output is taken from the first and second dynamic nodes: the output signal has a logic level determined by the first voltage on the first dynamic node and the second voltage on the second dynamic node. The primary dynamic node, which has a large capacitance, experiences a limited voltage swing during both phases, whereas the second dynamic node, DYN2, undergoes a complete rail-to-rail swing because it is separated from the pull-down network.

## B. Static Switching Pulse Domino (SSPD) Technique

The SSPD technique is similar to SP-domino with static input and output characteristics [10], applied to a dynamic multiplexer. Compared to SP-domino, SSPD employs two separate transistors, M1 and M2, which never turn ON simultaneously.

A clocked transistor M4 separates the main dynamic node from the second dynamic node. SSPD allows independent tuning of the rise and fall delays, as it employs a conditional pulse generator (CPG), shown in Fig. 3. When the CPG generates a pulse, M1 turns ON only if the dynamic node was discharged or held low in the last evaluation cycle, and the keeper M2 turns OFF.



Fig. 3. SSPD technique applied to the dynamic multiplexer in an RF bitline read-out circuit.

If the dynamic node was not discharged in the previous cycle, M1 is OFF, so when the pull-down network is ON it faces contention from the keeper transistor. The CPG internally generates two additional clock phases, CCLKd and CCLKi, whose behaviour depends on the clock signal (CLK) and the dynamic node.

The two clock phases are used by the block CG in the CPG to produce the pulse signal CP. The drawback of this technique is that it requires a complex conditional pulse generator. The logical expressions of the two clock phases and of the pulse signal CP can be written as follows:

$$CCLK_d = \overline{CLK + DYN_2}$$

$$CCLK_i = \overline{CLK + DYN_2}$$

$$CP = \overline{CLK \cdot CCLK_i + DYN_2 \cdot CCLK_d}$$

# IV. PROPOSED STRUCTURE

## A. Static Switching Pulse Domino using Delay Clock

The proposed circuit has reduced complexity and static input and output characteristics. Its circuit diagram is shown in Fig. 4, and the voltage characteristics at different nodes are shown in Fig. 5. The combination of M1 and M2 acts as the pull-up network, and M3 acts as the keeper transistor. M2 and M4 are driven by the clock (CLK), and M1 is driven by the non-inverted delayed clock (CLKd).

The dynamic node is conditionally precharged to a high voltage only when both transistors of the pull-up network are ON and the inputs are OFF, which happens at the start of each clock high for a short duration Td. This technique allows the precharge pulses to propagate to the dynamic node while preventing them from reaching the output node.



Fig. 4. Proposed technique applied to the dynamic multiplexer in an RF bitline read-out circuit.

At the start of a clock high, for a short period of time Td, the gate of M2 is high and the delayed clock CLKd is still low, so M1 holds its previous state. In this case, when RWL is high and the pull-down network conducts, the dynamic node DYN discharges through the pull-down network, as M5 also turns ON for a short period of time. After the delay Td, both CLK and CLKd are high and M1 turns OFF. Thus, no further contention current flows into the pull-down network and the dynamic node remains at logic low.

In the other case, when the input is low at the start of a clock high, CLK is high and the delayed clock CLKd is low. M1 and M2 are ON simultaneously and charge the dynamic node to logic high. For the rest of the clock high, both CLK and CLKd are high and M1 turns OFF, so the dynamic node remains at logic high, as shown in Fig. 5.

During clock low, CLKd goes low after some delay, which turns M1 ON. However, M2 is OFF, which blocks the current flow through the pull-up path, so the dynamic node DYN stays low when the input signal is high and stays high when the input signal is low. The simulation result of the proposed circuit confirms this operation.



Fig. 5. Simulation waveform of the proposed technique.
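The conditional precharge window created by CLK and CLKd can be sketched at a behavioural level. The Python model below is a simplified illustration, not the simulated circuit: time is discretised into samples, `td` is the clock delay in samples, and the helper names `delayed`, `precharge_window`, `simulate_dyn` and `transitions` are assumptions. Leakage, contention current and transistor sizing are ignored.

```python
def delayed(signal, td):
    """Return `signal` delayed by `td` samples; the first samples repeat signal[0]."""
    return [signal[0]] * td + signal[:len(signal) - td]


def precharge_window(clk, td):
    """True while CLK is high and the delayed clock CLKd is still low."""
    clkd = delayed(clk, td)
    return [c == 1 and d == 0 for c, d in zip(clk, clkd)]


def simulate_dyn(clk, data, td, dyn0=1):
    """Trace of the dynamic node for a clock and a pull-down input sequence."""
    dyn = dyn0
    trace = []
    for c, w, x in zip(clk, precharge_window(clk, td), data):
        if c == 1 and x == 1:
            dyn = 0      # pull-down network conducts: node discharges
        elif w:
            dyn = 1      # pull-up pair ON with inputs low: conditional precharge
        # otherwise no path conducts and the node holds its value
        trace.append(dyn)
    return trace


def transitions(trace):
    """Number of level changes in a node trace."""
    return sum(a != b for a, b in zip(trace, trace[1:]))


clk = [0, 0, 1, 1, 1, 1] * 2                    # two clock cycles
window = precharge_window(clk, td=2)
mixed = simulate_dyn(clk, [1] * 6 + [0] * 6, td=2)   # input high, then low
held = simulate_dyn(clk, [1] * 12, td=2)             # input high in both cycles
```

In this model the node switches only when the input changes between cycles: with the input held high for both cycles the trace contains a single transition, which reflects the static switching behaviour the technique aims for.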

## B. Footer Voltage Feed Forward Domino using Body Bias Technique

The FVFD technique with body bias is applied to the dynamic multiplexer in a read-out path [11]. Body biasing connects the transistor body to a bias network driven by an external source, rather than to power or ground.

The design usually includes a charge pump circuit to generate a reverse body bias voltage and/or a voltage divider to generate a forward body bias voltage. Reverse body bias, which applies a negative body-to-source voltage to an n-channel transistor, raises the threshold voltage and makes the transistor both slower and less leaky.

Forward body bias, on the other hand, applies a positive body-to-source voltage to an n-channel transistor; it lowers the threshold voltage and thereby makes the transistor faster. The proposed technique is shown in Fig. 6, and the voltage characteristics are shown in Fig. 7.
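The direction of the threshold shift can be checked with the textbook body-effect expression VT = VT0 + γ(√(2φF + VSB) − √(2φF)). The Python sketch below uses that first-order model with illustrative parameter values (`vt0`, `gamma`, `two_phi_f`) that are assumptions and are not taken from the 45-nm process used in this work; short-channel devices deviate from this expression.

```python
import math


def threshold_voltage(v_sb, vt0=0.40, gamma=0.40, two_phi_f=0.80):
    """First-order NMOS body-effect model (all parameter values are assumed).

    v_sb is the source-to-body voltage: positive for reverse body bias,
    negative for forward body bias.
    """
    if two_phi_f + v_sb < 0:
        raise ValueError("model is only valid for 2*phi_F + V_SB >= 0")
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))


vt_zero = threshold_voltage(0.0)       # body tied to source
vt_reverse = threshold_voltage(+0.3)   # V_BS = -0.3 V: threshold rises
vt_forward = threshold_voltage(-0.3)   # V_BS = +0.3 V: threshold falls

print(f"zero bias    VT = {vt_zero:.3f} V")
print(f"reverse bias VT = {vt_reverse:.3f} V")
print(f"forward bias VT = {vt_forward:.3f} V")
```

The ordering of the three values, forward below zero-bias below reverse, matches the speed and leakage trade-off described above.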



Fig. 6. Footer Voltage Feed Forward Domino using Body Bias Technique.

A DC bias voltage of 2 V is connected to the bodies of transistors M1, M2 and M3, as shown in the circuit diagram. This body bias reduces the power consumption. The proposed circuit operates as follows. When a read word line (RWL) goes high during a clock high, the pull-up transistor M1 turns OFF and the charge stored on the first dynamic node DYN1 is distributed between DYN1 and FOOT.

During this process, a slow path evaluates the output through a high-skewed NAND gate, while the fast feed-forward path uses the voltage developed on FOOT. This voltage turns ON M5, and the second dynamic node DYN2, which has a low parasitic capacitance, is discharged to ground through M5. The clocked transistor M4 cuts off the short-circuit path through M3 and M5 that would otherwise exist at the start of the precharge phase. The transistor M7 is added to prevent charge from building up on the FOOT node.

When RWL goes low during a clock high, there is no direct path for the charge to flow through FOOT, and M5 turns OFF. The slow path through DYN1 and DYN2, which are connected to the NAND gate, goes low and drives the output high. The simulation result of the circuit operation is shown in Fig. 7.



Fig. 7. Simulation waveform of the Footer Voltage Feed Forward Domino using Body Bias Technique.
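The redistribution of charge between DYN1 and FOOT sets the limited swing on the large dynamic node. A rough estimate follows from ideal charge sharing between two capacitors, as in the Python sketch below. The capacitance values are hypothetical and chosen only for illustration; FOOT is assumed to start fully discharged and DYN1 fully precharged, and the keeper, leakage and the NAND gate load are ignored.

```python
def charge_share(vdd, c_dyn, c_foot):
    """Ideal charge sharing between a node at vdd and a node at 0 V.

    Returns (v_final, swing): the common final voltage and the drop seen
    by the initially precharged node.
    """
    v_final = vdd * c_dyn / (c_dyn + c_foot)
    return v_final, vdd - v_final


def restore_energy(vdd, c_node, swing):
    """Energy drawn from the supply to recharge c_node by `swing` up to vdd."""
    return c_node * vdd * swing


VDD = 1.0          # supply voltage used in the simulations (V)
C_DYN1 = 20e-15    # hypothetical bit-line (DYN1) capacitance (F)
C_FOOT = 5e-15     # hypothetical footer node capacitance (F)

v_final, swing = charge_share(VDD, C_DYN1, C_FOOT)
e_limited = restore_energy(VDD, C_DYN1, swing)   # limited-swing precharge
e_full = restore_energy(VDD, C_DYN1, VDD)        # rail-to-rail precharge

print(f"DYN1 settles at {v_final:.2f} V (swing {swing:.2f} V)")
print(f"precharge energy: {e_limited * 1e15:.1f} fJ vs {e_full * 1e15:.1f} fJ")
```

Under these assumptions the precharge energy on DYN1 scales with C_FOOT / (C_DYN1 + C_FOOT), so a footer capacitance that is small relative to the bit line keeps the swing and the switching energy low.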

# V. RESULTS AND DISCUSSIONS

The conventional, FVFD, SSPD and proposed techniques are simulated with Cadence Spectre using a 45-nm standard CMOS process technology. Tables 1 and 2 summarize the power dissipation, delay and power-delay product (PDP) values of the 16-bit and 32-bit conventional, FVFD, SSPD and proposed techniques. From these tables, it can be seen that the energy dissipation (PDP) of the proposed techniques is reduced by roughly 45% to 56% relative to the corresponding FVFD and SSPD techniques.

Table 1. Comparison of energy dissipation of 16-bit existing and proposed techniques at a supply voltage of 1 V and an operating temperature of 27 °C

| Technique     | Power (µW) | Delay (ns) | PDP (fJ) |
|---------------|------------|------------|----------|
| Conventional  | 10.675     | 29.93      | 319.50   |
| FVFD          | 6.914      | 29.31      | 202.65   |
| SSPD          | 5.291      | 0.410      | 2.16     |
| Proposed FVFD | 4.934      | 20.27      | 100.01   |
| Proposed SSPD | 3.982      | 0.280      | 1.11     |

Table 2. Comparison of energy dissipation of 32-bit existing and proposed techniques at a supply voltage of 1 V and an operating temperature of 27 °C

| Technique     | Power (µW) | Delay (ns) | PDP (fJ) |
|---------------|------------|------------|----------|
| Conventional  | 10.806     | 29.34      | 317.04   |
| FVFD          | 7.897      | 30.28      | 239.12   |
| SSPD          | 7.194      | 0.472      | 3.40     |
| Proposed FVFD | 5.263      | 24.93      | 131.20   |
| Proposed SSPD | 5.231      | 0.287      | 1.50     |
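The tabulated figures can be cross-checked with a few lines of Python. The sketch below recomputes the PDP from the listed power and delay values and derives the power reduction of each proposed technique relative to its baseline. Averaging the four reductions with equal weight is an assumption; the exact averaging method behind the reported figure is not stated.

```python
# Power (uW) and delay (ns) as listed in Tables 1 and 2.
results = {
    "16-bit": {
        "FVFD": (6.914, 29.31),
        "SSPD": (5.291, 0.410),
        "Proposed FVFD": (4.934, 20.27),
        "Proposed SSPD": (3.982, 0.280),
    },
    "32-bit": {
        "FVFD": (7.897, 30.28),
        "SSPD": (7.194, 0.472),
        "Proposed FVFD": (5.263, 24.93),
        "Proposed SSPD": (5.231, 0.287),
    },
}


def pdp_fj(power_uw, delay_ns):
    """Power-delay product: uW * ns = 1e-15 J, i.e. femtojoules."""
    return power_uw * delay_ns


def power_reduction_percent(baseline_uw, proposed_uw):
    """Relative power saving of the proposed circuit over its baseline."""
    return 100.0 * (baseline_uw - proposed_uw) / baseline_uw


reductions = []
for width, rows in results.items():
    for base in ("FVFD", "SSPD"):
        saving = power_reduction_percent(rows[base][0], rows["Proposed " + base][0])
        reductions.append(saving)
        print(f"{width} {base}: {saving:.1f}% lower power, "
              f"PDP {pdp_fj(*rows['Proposed ' + base]):.2f} fJ")

average_reduction = sum(reductions) / len(reductions)
print(f"average power reduction: {average_reduction:.1f}%")
```

The equal-weight average comes out at about 28.5%, which is consistent with the roughly 29% average power reduction reported for the proposed circuits.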



Fig. 8. Comparison of power (µW) in 16-bit and 32-bit FVFD and proposed FVFD techniques in 45-nm technology.



Fig. 9. Comparison of power (µW) in 16-bit and 32-bit SSPD and proposed SSPD techniques in 45-nm technology.



Fig. 10. Comparison of delay (ns) in 16-bit and 32-bit FVFD and proposed FVFD techniques in 45-nm technology.



Fig. 11. Comparison of delay (ns) in 16-bit and 32-bit SSPD and proposed SSPD techniques in 45-nm technology.



Fig. 12. Comparison of PDP (fJ) in 16-bit and 32-bit FVFD and proposed FVFD techniques in 45-nm technology.



Fig. 13. Comparison of PDP (fJ) in 16-bit and 32-bit SSPD and proposed SSPD techniques in 45-nm technology.

# VI. CONCLUSION

In this paper, energy-efficient dynamic multiplexers are proposed. The conventional, FVFD and SSPD techniques, together with the proposed FVFD with body bias and the proposed SSPD with delayed clock, are simulated using Cadence Spectre in 45-nm technology. The average power reduction of the proposed circuits relative to the existing FVFD and SSPD techniques in 45-nm technology is found to be 29%.

# ACKNOWLEDGMENT

The authors would like to acknowledge the support received from the Department of Electronics & Communication Engineering, Vardhaman College of Engineering, for the software used in this work and for their valuable comments and suggestions.

# REFERENCES

- [1] W. Hwang, R. V. Joshi, and W. H. Henkels, "A 500-MHz, 32-word × 64-bit, eight-port self-resetting CMOS register file," *IEEE J. Solid-State Circuits*, vol. 34, no. 1, pp. 56–67, Jan. 1999.
- [2] R. K. Krishnamurthy, A. Alvandpour, G. Balamurugan, N. Shanbhag, K. Soumyanath, and S. Y. Borkar, "A 130-nm 6-GHz 256 × 32 bit leakage-tolerant register file," *IEEE J. Solid-State Circuits*, vol. 37, no. 5, pp. 624–632, May 2002.
- [3] D. Li and P. Mazumder, "On circuit techniques to improve noise immunity of CMOS dynamic logic," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 9, pp. 910–925, Sept. 2004.
- [4] R. Singh, G.-M. Hong, and S. Kim, "Bitline techniques with dual dynamic nodes for low-power register files," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 4, Apr. 2013.
- [5] C. J. Akl and M. A. Bayoumi, "Single-phase SP domino: A limited switching dynamic circuit technique for low-power wide fan-in logic gates," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 55, no. 2, pp. 141–145, Feb. 2008.
- [6] H. F. Dadgour and K. Banerjee, "A novel variation-tolerant keeper architecture for high-performance, low-power wide fan-in dynamic OR gates," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 18, no. 11, pp. 1567–1577, Nov. 2010.
- [7] M. W. Allam, M. H. Anis, and M. I. Elmasry, "High-speed dynamic logic styles for scaled-down CMOS and MTCMOS technologies," in *Proc. Int. Symp. Low Power Electron. Des. (ISLPED)*, 2000, pp. 155–160.
- [8] A. Alvandpour, R. K. Krishnamurthy, K. Soumyanath, and S. Y. Borkar, "A sub-130-nm conditional keeper technique," *IEEE J. Solid-State Circuits*, vol. 37, no. 5, pp. 633–638, May 2002.
- [9] H. Mahmoodi-Meimand and K. Roy, "Diode-footed domino: A leakage-tolerant, high fan-in dynamic circuit design," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 51, no. 3, pp. 495–503, Mar. 2004.
- [10] A. K. Pandey, R. A. Mishra, and R. K. Nagaria, "Static switching dynamic buffer circuit," Department of Electronics and Communication, MNNIT, Allahabad 211004, India.
- [11] V. Kursun and E. G. Friedman, "Domino logic with variable threshold voltage keeper," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 11, no. 6, Dec. 2003.