# A 3us Wake-up Time Nonvolatile Processor Based on Ferroelectric Flip-Flops

Yiqun Wang\*, Yongpan Liu\*, Shuangchen Li\*, Daming Zhang\*, Bo Zhao\*, Mei-Fang Chiang<sup>†</sup>, Yanxin Yan<sup>†</sup>, Baiko Sai<sup>†</sup>, Huazhong Yang\*

\* Tsinghua National Laboratory for Information Science and Technology, EE Dept., Tsinghua University, Beijing, 100084, China. {ypliu,yanghz}@tsinghua.edu.cn

<sup>†</sup> LSI Development Headquarters, Rohm Co., Ltd., Yokohama, 222-8575, Japan. {baiko.sai,meifang.chiang}@dsn.rohm.co.jp

Abstract: Nonvolatile processors offer a number of desirable properties, including instant on/off, zero standby power and resilience to power failures. This paper presents a fabricated nonvolatile processor based on ferroelectric flip-flops. These flip-flops are used in a distributed fashion and are able to maintain system states indefinitely without any power supply. An efficient controller is employed to achieve parallel reads and writes to the flip-flops. A reconfigurable voltage detection system signals the processor to back up its state automatically under power failures. Measurement results show that this nonvolatile processor can operate continuously even under power failures occurring at 20 kHz. It can back up system states within $7 \mu s$ and restore them within $3 \mu s$. Such capabilities will provide a new level of support to chip-level fine-grained power management and energy harvesting applications.

## I. INTRODUCTION

In embedded or mobile applications, low power processors can be shut down to eliminate leakage power. However, the states in volatile registers and local memories will be lost or have to be backed up into centralized nonvolatile memories (NVM) [1], [2]. The backup process introduces nontrivial power and performance overheads when writing or reading register states into or from NVM during the sleep or wake-up stage.

To reduce the power and performance overheads of centralized NVM architectures, the concept of a distributed nonvolatile processor (NVP) has been proposed [3]–[6]. An NVP introduces a set of distributed nonvolatile flip-flops (NVFF) that back up the contents of regular registers. Because the system state is stored in local NVFFs, it can be stored and recalled in parallel within several cycles. In theory, an NVP can back up the data in every NVFF in parallel, so it has the potential to reduce the sleep and wake-up time to several nanoseconds, which provides the capability of "instant" sleep and wake-up. Furthermore, the local data transfer reduces the energy consumed when switching between power-on and power-off, which gives the NVP high resilience to power failures and high flexibility for power management. Therefore, an NVP can help to support more efficient fine-grained power management and energy harvesting systems.

Recently, Rohm implemented the first nonvolatile counter based on ferroelectric flip-flops [3]. Wang et al. [4] introduced a nonvolatile processor (NVP) with ferroelectric flip-flops using a compare-and-write policy. Afterwards, Yu et al. [5] proposed a floating-gate-based NVP. Furthermore, reference [6] presented a compare-and-compress architecture to reduce the NVP's area. To the best of our knowledge, no silicon-proven NVP has been reported before. Specifically, we make the following contributions in this paper:

- 1) We present the first fabricated nonvolatile processor with zero standby power, $7\mu s$ sleep time and $3\mu s$ wake-up time. The low switching overhead of the NVP is attributed to the careful design of the FF controller and the voltage detection system.
- 2) We demonstrate that the NVP can achieve a 30-100x speedup in the wake-up/sleep time and over 70x energy savings in the data backup and recall operations when compared with an existing industry processor. Meanwhile, the NVP exhibits comparable performance and power consumption in normal operation.

The rest of this paper is organized as follows. Section II discusses the overall architecture of the nonvolatile processor and the voltage detection unit. We then describe the circuits of ferroelectric flip-flops (FeFF) and the nonvolatile controller in Section III. The chip evaluation results are presented in Section IV. We conclude the paper in Section V.

## II. ARCHITECTURE

This section presents the overall NVP architecture and illustrates a configurable voltage detection system, which signals the NVP to back up the system state automatically under power failures.

#### A. Overall NVP Architecture

Fig. 1 shows the block diagram of the entire NVP. It contains an MCS51 instruction set compatible core, which consists of an MCU controller, an 8-bit arithmetic logic unit (ALU), several timers and counters, a serial interface, a 128-Byte register file and an 8K-Byte static random access memory (SRAM). There are two peripherals, the SPI and I2C modules, connected to the core via a Wishbone bus. A "JTAG" module is added to support online debugging via scan chains. The "Mode" module is used to control the operating mode of the processor, including the volatile mode, the nonvolatile mode and the debug mode.

In this chip, we replace all flip-flops in the core and the 128-Byte register file with nonvolatile ferroelectric devices. They are denoted as the nonvolatile logic domain in Fig. 1. The SRAM is used as a secondary data memory and does not store the system state, so we leave it unchanged to save area. However, it can be replaced with ferroelectric RAM (FeRAM) in future work. Similarly, we don't need to keep the states of the remote peripherals SPI and I2C, so they are in the volatile logic domain. The flip-flop controller (FFC) generates control signals for both the FeFFs and the volatile flip-flops (VFFs) after receiving the sleep or wake-up signal. We will describe its mechanism in Section III.

Fig. 1. Overall architecture of nonvolatile processor

This work was supported in part by the NSFC under grant 60976032, National Science and Technology Major Project under contract 2010ZX03006-003-01, International Cooperation from ROHM Inc. and High-Tech Research and Development (863) Program under contract 2009AA01Z130.

#### B. Configurable Voltage Detection System

In this subsection we discuss how to generate the sleep and wake-up signals that tell the NVP to back up or restore its system state when power failures happen. A configurable voltage detection system (CVDS) is proposed to generate those two signals. The CVDS architecture is shown in Fig. 2(a); it contains two configurable units. The first one is a switched capacitor array attached to the power line. This configurable capacitor, denoted as $C_{PL}$, provides the data backup energy to the NVP by keeping the voltage above the operating threshold after the power is cut off. The other one, denoted as $C_{VD}$, is another switched capacitor array used in the voltage detection circuit. The voltage detection circuit detects both power failure and power recovery. It generates the "Sleep/Wake-up" signal (0 denotes sleep, 1 denotes wake-up) at a certain time ($T_{plh}$) after "VDD" passes the operating threshold. The value of $T_{plh}$ is chosen by trading off system reliability against backup speed. The control words for the switched capacitor arrays are given by external input switches, which are set by users. Fig. 2(b) shows the waveforms during the sleep and wake-up actions and the timing diagram as influenced by $C_{PL}$ and $C_{VD}$. Measurement results on the CVDS architecture are given in Section IV-B.
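The role of $C_{PL}$ can be illustrated with a minimal sketch. It assumes the chip behaves like a fixed resistive load, so that "VDD" decays exponentially once the supply is cut, and it checks whether the capacitor holds the voltage above a minimum level long enough to cover the detection delay plus the backup itself. The load resistance, voltage levels and detection delay below are hypothetical values chosen only for illustration; only the $7\mu s$ sleep time comes from the measurements reported in this paper.

```python
import math


def holdup_time(c_pl, r_load, v_start, v_min):
    """Seconds for an RC discharge from v_start to fall to v_min."""
    return r_load * c_pl * math.log(v_start / v_min)


def backup_fits(c_pl, r_load, v_start, v_min, t_plh, t_sleep):
    """True if VDD stays above v_min until detection (t_plh) and backup (t_sleep) finish."""
    return holdup_time(c_pl, r_load, v_start, v_min) >= t_plh + t_sleep


# Illustrative values (assumptions, not taken from the chip measurements).
V_START = 1.5    # supply voltage when power is cut, in volts
V_MIN = 0.9      # lowest voltage at which the backup is assumed to work
R_LOAD = 100.0   # equivalent load resistance in ohms
T_PLH = 5e-6     # detection delay in seconds
T_SLEEP = 7e-6   # chip-level sleep time reported in this paper

if __name__ == "__main__":
    for c_pl in (100e-9, 1e-6):
        t_hold = holdup_time(c_pl, R_LOAD, V_START, V_MIN)
        ok = backup_fits(c_pl, R_LOAD, V_START, V_MIN, T_PLH, T_SLEEP)
        print(f"C_PL = {c_pl * 1e9:.0f} nF: hold-up {t_hold * 1e6:.1f} us, backup fits: {ok}")
```

In this simplified model a larger $C_{PL}$ lengthens the hold-up time, which is why the capacitor cannot be made arbitrarily small, while a longer $T_{plh}$ consumes more of that budget.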

## III. CIRCUIT DESCRIPTION

We describe the two major circuit structures that distinguish the NVP from a conventional processor: the ferroelectric flip-flop and the flip-flop controller.

#### A. Ferroelectric Flip-flop

Flip-flops affect the NVP's performance, power and reliability, so they should be designed carefully. The FeFF used in this chip is shown in Fig. 3(a) [6]. It adopts a hybrid CMOS and ferroelectric technology, consisting of a standard master-slave D flip-flop (DFF) and backup ferroelectric capacitors (FeCap). The detailed working mechanism of this FeFF is described in [4]. In this circuit, the differential architecture helps to improve the reliability and performance. The circuits are simulated with HSPICE, and the waveforms of this circuit during a store and restore operation are shown in Fig. 3(b).

Fig. 3. Modeling of ferroelectric flip-flop

#### B. Flip-flop Controller

The flip-flop controller (FFC) generates the sequential control signals for the FeFFs during sleep and wake-up actions. Fig. 4(a) shows the block diagram of the FFC. It is composed of a timing block and a signal-generating finite state machine (FSM). The timing block is self-timed by an inverter chain, and its three timers provide overflow signals (Tov1-Tov3) to the FSM. The FSM generates the control signals ("RW", "PL", "PCH") based on Tov1-Tov3 to meet the sequencing requirements shown in Fig. 4(c). The input "Sleep/Wake-up" tells the FSM whether it should execute the sleep sequence or the wake-up sequence. The output "CG" is the clock-gating signal to both FeFFs and DFFs. Fig. 4(b) shows the interconnection between the FFC and the flip-flops. The "CG" signal gates the clock of both the FeFFs and the volatile FFs during the store and recall actions, as shown in Fig. 4(c). Gating the clock holds the FeFF data steady during a store, which prevents write uncertainty, and keeps the system stable during a recall, which guarantees precise recovery.
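The sequencing idea can be sketched as a small state machine that steps through three phases, one per timer overflow, and keeps "CG" asserted for the whole sequence. This is only an illustration of the structure described above: the per-phase levels of "RW", "PL" and "PCH", the idle levels and the class and function names are hypothetical placeholders, since the actual timing is defined by Fig. 4(c).

```python
# Hypothetical per-phase output levels; the real values are set by the
# timing chart in Fig. 4(c), which is not reproduced here.
SLEEP_PHASES = (
    {"PCH": 0, "RW": 1, "PL": 1},
    {"PCH": 0, "RW": 1, "PL": 0},
    {"PCH": 1, "RW": 0, "PL": 0},
)
WAKEUP_PHASES = (
    {"PCH": 1, "RW": 0, "PL": 0},
    {"PCH": 0, "RW": 0, "PL": 1},
    {"PCH": 0, "RW": 0, "PL": 0},
)
IDLE = {"CG": 0, "PCH": 0, "RW": 0, "PL": 0}  # assumed idle levels


class FlipFlopController:
    """Three-phase sequencer advanced by the overflow signals Tov1..Tov3."""

    def __init__(self):
        self.phases = None  # None means no sequence is running
        self.index = 0

    def start(self, sleep_wakeup):
        """Begin a sequence: 0 selects sleep, 1 selects wake-up."""
        self.phases = SLEEP_PHASES if sleep_wakeup == 0 else WAKEUP_PHASES
        self.index = 0

    def overflow(self):
        """A timer overflow ends the current phase; the last one ends the sequence."""
        if self.phases is None:
            return
        self.index += 1
        if self.index == len(self.phases):
            self.phases = None

    def outputs(self):
        """Current output levels; CG stays asserted while a sequence runs."""
        if self.phases is None:
            return dict(IDLE)
        return {"CG": 1, **self.phases[self.index]}


def run_sequence(sleep_wakeup):
    """Return the outputs seen in each phase, followed by the idle outputs."""
    ffc = FlipFlopController()
    ffc.start(sleep_wakeup)
    trace = []
    while ffc.phases is not None:
        trace.append(ffc.outputs())
        ffc.overflow()
    trace.append(ffc.outputs())
    return trace


if __name__ == "__main__":
    for step in run_sequence(0):
        print(step)
```

Driving the phase changes from overflow events rather than from the system clock mirrors the self-timed design: the clock is gated during the sequence, so it cannot be used to pace the controller.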

## IV. MEASUREMENTS

#### A. Test Setup and Performance Overview

The nonvolatile processor, named **THU1010N**, has been fabricated using ROHM's $0.13\mu m$ CMOS-ferroelectric hybrid process. Fig. 5(a) shows its photomicrograph. We evaluate the processor with a suite of embedded benchmarks for sensor networks, such as FFT, FIR and the Zigbee MAC protocol. We measured the maximum operating frequency, the average power consumption and the sleep/wake-up metrics with controlled power supplies. The general parameters of THU1010N, as well as some comparison results, are listed in Table I.

Fig. 2. Architecture and controlling timing chart of configurable voltage detection system

Fig. 4. Block diagram of flip-flop controller



Fig. 5. Micrograph and general design statistics of THU1010N

Table I compares the NVP chip with a popular industrial processor, the "MSP430" [1], and an emerging FeRAM-based processor, the "MSP430FR series" [2]. The sleep and wake-up times of the "MSP430" are the switching times between LPM4.5 mode and active mode (more details in [1], [2]). The results show that the NVP chip achieves comparable operating parameters in normal mode. However, it has tremendous advantages in sleep/wake-up time, showing a 100-1000x speedup in the sleep time and a 30-100x speedup in the wake-up time. At the chip level, we only need a few microseconds to switch from the power-down mode to the active mode. Therefore, we can conclude that the distributed nonvolatile architecture of the NVP will provide much better performance than the existing centralized nonvolatile storage. It is quite promising for energy harvesting and power management applications.

TABLE I Overall Properties of Nonvolatile Processor Chip and Comparison Results

| Category           | Microprocessor Type     | THU1010N                  | TI-MSP430-5series<br>with Flash | TI-MSP430<br>with FRAM |
|--------------------|-------------------------|---------------------------|---------------------------------|------------------------|
| General Statistics | I/O Pin Number          | 100                       | 64 - 80                         | 24 - 40                |
|                    | Capacity of Memory      | 1607-bit FeFF<br>8KB SRAM | 2-16KB SRAM<br>32-128KB FLASH   | 16KB FRAM<br>1KB SRAM  |
|                    | Non-volatilization level | Register level           | Memory level                    | Memory level           |
| Basic Properties   | Maximum Clock Frequency | 25MHz                     | 25MHz                           | 24MHz                  |
|                    | VDD for Core            | 0.9V-1.5V                 | 1.8V-3.6V                       | 2-3.6V                 |
|                    | Active Power            | $160 \mu W$ @1MHz         | $450 \mu W$ @1MHz               | $200 \mu W$ @1MHz      |
|                    | Standby Power           | 0                         | $0.18 \mu A$                    | $0.32 \mu A$           |
| Sleep/Wake-up Time | Sleep time              | $7 \mu s$                 | $6 ms$                          | $212 \mu s$            |
|                    | Wake-up time            | $3 \mu s$                 | $3 ms$                          | $310 \mu s$            |

#### B. Measurements of Sleep and Wake-up Time

This section evaluates the sleep and wake-up time of the processor in detail. The first part shows the chip-level performance, which is critical for chip-level power gating. The second part demonstrates the system-level performance, which is related to the voltage detection system.

To measure the chip-level sleep and wake-up time, we use a pattern generator to provide both the "VDD" and "Sleep/Wake-up" signals. In the experiments, the NVP executes a counting program under a square-wave power supply. By shortening the pulse width of "VDD" and the time interval between "VDD" and "Sleep/Wake-up", we can measure the minimal chip-level sleep and wake-up time. The results show that the sleep and wake-up times are $7\mu s$ and $3\mu s$ under a 1.5V power supply. Moreover, we find that the core voltage affects the sleep and wake-up speed. Fig. 6 shows the sleep and wake-up time under different supply voltages. As we can see, a lower voltage leads to a slower speed, because the delays of both the FeFF and FFC circuits become larger. However, the sleep and wake-up times still remain below $20\mu s$ and $10\mu s$ under a 0.8V supply voltage. Therefore, the physical limitation comes from the devices, such as the FeFF and FFC circuits. Compared with the several milliseconds of backup time in the centralized structure, the microsecond-scale switching overheads provide many more power-saving opportunities for fine-grained on-chip power management.

Fig. 6. NVP sleep and wake-up time under different voltage supply

To demonstrate the system-level sleep and wake-up time, we consider the overheads from the power supply as well as the voltage detection. In this experiment, we provide a square-wave power supply to the board, and the "Sleep/Wake-up" signal is generated automatically by the voltage detection circuit. Fig. 5(b) shows that the NVP can count continuously under a square-wave "VDD" of 20 kHz. This implies that the NVP can recover the whole system state, continue to compute and then back up the system state within only $25\mu s$.
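As a quick check on the $25\mu s$ figure, the period of a 20 kHz square wave and its powered half-period work out as follows. The 50% duty cycle is an assumption made here for the calculation; the text does not state it.

$$T = \frac{1}{f} = \frac{1}{20\,\mathrm{kHz}} = 50\,\mu s, \qquad T_{on} = \frac{T}{2} = 25\,\mu s$$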

Below, we evaluate the impact of the capacitors $C_{PL}$ and $C_{VD}$ under different power conditions. Fig. 7 shows the minimum power-on time under different values of $C_{PL}$ and $C_{VD}$. The power-on time should be larger than the sum of the power stabilization time, the voltage detection time and the chip's intrinsic sleep and wake-up time. $C_{PL}$ affects the "VDD" power stabilization via its RC constant, so the minimum required power-on time decreases as $C_{PL}$ gets smaller. However, if $C_{PL}$ is smaller than 470nF, it cannot provide sufficient energy for data backup, and the system enters the false region. $C_{VD}$ affects the voltage detection time via its RC constant, so the required power-on time also decreases as $C_{VD}$ gets smaller. $C_{VD}$ cannot be smaller than 10pF, owing to power and clock stabilization concerns. As we can see, the minimum required power-on time (> $100\mu s$) is much larger than the chip's intrinsic sleep and wake-up time (< $10\mu s$), because the dominant factor at the system level is the speed at which the power and clock stabilize, rather than the speed of the on-chip circuits.
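The power-on constraint described above can be written compactly. The symbols $T_{on}$, $T_{stab}$, $T_{det}$, $T_{sleep}$ and $T_{wake}$ are shorthand introduced here for illustration and do not come from the figures; only $C_{PL}$ and $C_{VD}$ are the paper's own notation.

$$T_{on} \ge T_{stab}(C_{PL}) + T_{det}(C_{VD}) + T_{sleep} + T_{wake}$$

Read this way, the first two terms grow with the respective RC constants and dominate the bound, while the last two are the chip-level times measured earlier in this subsection.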

#### C. Measurements of Sleep and Wake-up Energy

The energy consumption of each sleep and wake-up operation is also a key property for power management. We compare the NVP with two popular data backup methods: 1) off-chip Flash memory; 2) on-chip Flash memory. Table II shows the energy consumed to store and recall 1607 bits of data (the number of FeFFs in the NVP) with these three methods. Compared with the other two methods, the proposed distributed architecture can decrease the energy consumption by 19000 times in the data store and 74 times in the data recall. These results show that much higher energy efficiency can be obtained by using nonvolatile registers instead of centralized nonvolatile memory.
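The quoted reduction factors can be recomputed directly from the Table II entries. The short sketch below does so; the dictionary and function names are introduced here for illustration. The 19000x and 74x figures correspond to the on-chip Flash column, and the off-chip Flash ratios are larger still.

```python
NJ = 1e-9  # joules per nanojoule
UJ = 1e-6  # joules per microjoule

# Per-operation energies in joules, copied from Table II.
STORE_ENERGY = {
    "FeFF": 23.1 * NJ,
    "off-chip Flash": 2060 * UJ,
    "on-chip Flash": 445 * UJ,
}
RECALL_ENERGY = {
    "FeFF": 8.1 * NJ,
    "off-chip Flash": 1.3 * UJ,
    "on-chip Flash": 0.6 * UJ,
}


def reduction_factor(energy_table, baseline):
    """Ratio of the baseline method's energy to the FeFF energy."""
    return energy_table[baseline] / energy_table["FeFF"]


if __name__ == "__main__":
    for baseline in ("off-chip Flash", "on-chip Flash"):
        store = reduction_factor(STORE_ENERGY, baseline)
        recall = reduction_factor(RECALL_ENERGY, baseline)
        print(f"{baseline}: store {store:.0f}x, recall {recall:.1f}x")
```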

The tremendous advantages in energy consumption show that the NVP is an excellent candidate for applications with frequent power-downs, such as wireless sensor networks (WSN). Assuming that a node is powered down once per minute, a Flash-memory-based node will consume 1.12kJ of energy over a one-year lifetime, whereas the NVP solution consumes only 17mJ.

Fig. 7. The minimal power-on time for system accuracy under different capacitance values

TABLE II Data Store and Reload Energy Consumption of Three Different Data Backup Locations

| Data Backup Location | FeFF (NV-registers) | Off-chip Flash | On-chip Flash |
|----------------------|---------------------|----------------|---------------|
| Data store energy    | 23.1 nJ             | $2060 \mu J$   | $445 \mu J$   |
| Data recall energy   | 8.1 nJ              | $1.3 \mu J$    | $0.6 \mu J$   |
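A yearly estimate of this kind can be approximated from the Table II entries. The sketch below assumes exactly one store and one recall per minute for 365 days; the authors' exact assumptions are not stated, and the names used are introduced here for illustration.

```python
EVENTS_PER_YEAR = 365 * 24 * 60  # one power-down per minute, non-leap year

# Energy of one store plus one recall in joules, from Table II.
CYCLE_ENERGY_J = {
    "FeFF": 23.1e-9 + 8.1e-9,
    "off-chip Flash": 2060e-6 + 1.3e-6,
    "on-chip Flash": 445e-6 + 0.6e-6,
}


def annual_energy_j(location, events=EVENTS_PER_YEAR):
    """Backup-plus-recall energy over one year for the given storage location."""
    return CYCLE_ENERGY_J[location] * events


if __name__ == "__main__":
    for location in CYCLE_ENERGY_J:
        print(f"{location}: {annual_energy_j(location):.4g} J per year")
```

Under these assumptions the off-chip Flash total is on the order of a kilojoule per year, while the FeFF total is on the order of tens of millijoules.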

## V. CONCLUSIONS

A nonvolatile processor applies nonvolatile technology at the register level, which brings huge improvements in backup speed and energy efficiency. We fabricated a ferroelectric nonvolatile processor, and the measured results show that it achieves a $7\mu s$ sleep time and a $3\mu s$ wake-up time at the chip level, as well as nanojoule-level energy consumption during those backup operations. These metrics surpass all the existing solutions by 100-10000 times. These features are quite promising for fine-grained power management and energy harvesting applications. In the future, more work will be done to reduce the sleep/wake-up time to several nanoseconds, which is the physical limit of the device. Moreover, the voltage detection circuit can be optimized to shorten the response time for unexpected power interruptions.

## VI. ACKNOWLEDGEMENT

We thank Prof. Xiaobo Sharon Hu for many helpful suggestions.

## REFERENCES

- [1] TI, "Datasheet of MSP430F522x mixed signal microcontrollers," 2009.
- [2] TI, "Datasheet of MSP430FR573x mixed signal microcontrollers," 2011.
- [3] Rohm Co., Ltd., "Rohm Demonstrates Nonvolatile CPU," Website: http://techon.nikkeibp.co.jp/english/NEWS\_EN/20071004/140206/.
- [4] J. Wang, Y. Liu, H. Yang, and H. Wang, "A compare-and-write ferroelectric nonvolatile flip-flop for energy-harvesting applications," in *Green Circuits and Systems (ICGCS), 2010 International Conference on*. IEEE, pp. 646–650.
- [5] W. Yu, S. Rajwade, S. Wang, B. Lian, G. Suh, and E. Kan, "A non-volatile microcontroller with integrated floating-gate transistors," in *Proceedings of the 5th Workshop on Dependable and Secure Nanocomputing*. ACM Press, 2011, pp. 1–4.
- [6] Y. Wang, Y. Liu, Y. Liu, D. Zhang, S. Li, B. Sai, M.-F. Chiang, and H. Yang, "A compression-based area-efficient recovery architecture for nonvolatile processors," in *Design, Automation Test in Europe Conference Exhibition (DATE), 2012*, March 2012, pp. 1519–1524.