# Multiple Scan Chains for Power Minimization During Test Application in Sequential Circuits

Nicola Nicolici and Bashir M. Al-Hashimi Electronic Systems Design Group Department of Electronics and Computer Science University of Southampton Southampton SO17 1BJ, UK {nn99r,bmah}@ecs.soton.ac.uk

#### Abstract

This paper presents a new technique for power minimization during test application in sequential circuits using multiple scan chains. The technique is based on a new design for test (DFT) architecture and a novel test application strategy which reduces spurious transitions in the circuit under test. To facilitate the reduction of spurious transitions, the proposed DFT architecture is based on classifying scan latches into compatible, incompatible and independent scan latches. Based on their classification scan latches are partitioned into multiple scan chains and a single extra test vector associated with each scan chain is computed. A new test application strategy which applies the extra test vector to primary inputs while shifting out test responses for each scan chain, minimizes power dissipation by eliminating the spurious transitions which occur in the combinational part of the circuit. The newly introduced multiple scan chain-based technique which relies on extra test vectors and multiple scan chains does not introduce performance degradation and minimizes clock tree power dissipation with minimal impact on both test area and test data overhead. Unlike previous approaches which are test set dependent and hence are not able to handle large circuits due to the complexity of the design space, this paper shows that with low test area and test data overhead substantial savings in power dissipation during test application are achieved in very low computational time for both small and large tests. For example, in the case of benchmark circuit s15850 it takes < 600s in computational time and < 1% in test area and test data overhead to achieve over 80% savings in power dissipation.

## **1** Introduction

Minimization of power dissipation in very large scale integrated (VLSI) circuits is important to improve reliability and reduce packaging costs [1]. This indicates that future successful portable applications will depend not only on low-power design methods but also on new design for testability (DFT) techniques targeting low-power VLSI circuits. Numerous techniques for power minimization during the normal (functional) mode [2] have been proposed. It is also important to examine power dissipation during the testing mode [3,4], mainly for the following two reasons. Firstly, it was outlined in [1] that power dissipated during test application is substantially higher than power dissipated during functional operation, which can decrease the reliability of the circuit under test due to higher temperature and current density. Secondly, the excessive power/ground noise caused by the high rate of current flowing in power and ground lines can erroneously change the logic state of circuit lines, causing some good dies to fail the test [5] and leading to yield loss. While minimizing power dissipation in full scan sequential circuits is the focus of this paper, a comprehensive review of the sources of higher power dissipation during test application and of low power testing techniques is given in sections 1.1 and 1.2 in order to provide a meaningful understanding of the proposed approach. Motivations and objectives of the proposed work are presented in section 1.3.

#### 1.1 Sources of higher power dissipation during test application

Depending on the level of abstraction and circuit type, high power dissipation during test application is due to the following problems:

- i. Systems which comprise modern memory systems and multichip modules (MCMs) employ power-conscious architectural decisions where blocks are not simultaneously activated under functional operation [6]. Hence, inactive blocks do not contribute to power dissipation during the functional operation. However, when the system is in the test mode of operation, concurrent execution of tests in many blocks will result in substantially higher power dissipation when compared to functional operation.
- ii. Low power combinational circuits are synthesized by algorithms [2] which seek to optimize the signal or transition probability of circuit nodes using only the spatial dependencies inside the circuit assuming the transition probabilities of primary inputs to be given. However, the complex spatiotemporal correlations which occur at the primary inputs must be considered [2]. This is of further importance during test application since correlation between consecutive test vectors generated by an automatic test pattern generator (ATPG) is very low, because a test vector is generated for a given target fault without any consideration of the previous test vector in the test sequence. The low correlation between consecutive test vectors during test application leads to substantially higher power dissipation when compared to functional operation.
- iii. Low power sequential circuits are synthesized by state assignment algorithms which use state transition probabilities [2]. The state transition probabilities are computed assuming input probability distribution and state transition graph which is valid during functional operation. These two assumptions are not valid during the test mode of operation when scan DFT technique is employed. While shifting out test responses, the scan latches are assigned uncorrelated values that destroy the correlation between successive states. Furthermore, in the case of data path circuits with large number of states that are synthesized for low power using the correlations between data transfers [2], in the test mode scan registers are assigned uncorrelated values which are never reached during functional operation leading to substantially higher power dissipation.

#### **1.2** Previous work on low power testing

This section gives a comprehensive review of recently proposed solutions for solving **problems** (i) - (iii) of section 1.1.

**Problem (i)**: To overcome the problem of high power dissipation during test application at the system level, numerous power-constrained test scheduling algorithms have been proposed under a built-in self-test (BIST) environment [1,6–11]. The approach in [1] schedules the tests under power constraints by grouping and ordering based on floorplan information. A further exploration of the solution space of the scheduling problem is provided in [6], where a resource graph formulation for the test problem is given and tests are scheduled concurrently without exceeding their power ratings during test application. To avoid the identification of all the cliques in a graph and the covering table minimization problem applied in [6], which are well known NP-hard problems, the solution proposed in [7] uses the left edge algorithm and a tree growing technique as a heuristic for the block test scheduling problem. Several solutions for scheduling tests under power and area constraints [8–11] have recently been proposed. However, all the previous approaches assume a BIST environment, which accepts high test area overhead and test application time in exchange for lower power dissipation during testing.

**Problem (ii)**: A new ATPG tool [5] was proposed to overcome the low correlation between consecutive test vectors during test application in combinational circuits. Despite achieving the objectives of safe and inexpensive testing of low power circuits, the approach in [5] increased the test application time. A different approach for minimizing power dissipation during test application in combinational circuits (problem ii) is based on test vector ordering [12–15]. Test vector ordering is done in a post-ATPG phase with no overhead in test application time, since test vectors are reordered such that the correlation between consecutive test vectors matches the assumed transition probabilities of primary inputs used for switching activity computation during low power logic synthesis. However, the computational time in [12] is very high due to the complexity of the test vector ordering problem, which is reduced to finding a minimum cost Hamiltonian path in a complete, undirected, and weighted graph. The high computational time is overcome by the techniques proposed in [13–15], where test vector ordering assumes high correlation between switching activity in the circuit under test and the Hamming distance [13, 14] or transition density [15] at circuit primary inputs. For combinational circuits employing BIST, several techniques for minimizing power dissipation have been proposed recently [16–23]. In [16] the use of a dual speed linear feedback shift register (LFSR) lowers the transition density at the circuit inputs, leading to minimized power dissipation. Optimal weight sets for input signal distribution are determined in order to minimize average power [17], while the peak power is reduced by finding the best initial conditions in the cellular automata (CA) cells used for pattern generation [18].
It has been proved in [19] that all primitive polynomial LFSRs of the same size produce the same power dissipation in the circuit under test, which suggests using the LFSR with the smallest number of XOR gates since it yields the lowest power dissipation by itself. A mixed solution based on reseeding LFSRs and test vector inhibiting to filter a few non-detecting subsequences of a pseudorandom test sequence has been proposed in [20]. An enhancement of the test vector inhibiting technique has been proposed in [21], where all the non-detecting subsequences are filtered. A different approach for filtering non-detecting vectors, inspired by the precomputation architecture, is presented in [22]. An improvement in area overhead associated with filtering non-detecting vectors without penalty in fault coverage or test length has been achieved using non-linear hybrid cellular automata [23]. Regardless of the type of test pattern generator, BIST architectures significantly differ from one another in terms of power dissipation as outlined in [24]. Thus, circuit partitioning for low power BIST and test session planning have an important influence on power dissipation as shown in [25]. Regularity of multiplier modules and the linear sized test set required to achieve high fault coverage lead to efficient low power BIST implementations for data paths [26]. Although the techniques proposed for minimizing power dissipation during test application in combinational circuits achieve good results, different approaches are required for sequential circuits where both DFT methodology and test application strategy have a strong impact on power dissipation.

**Problem (iii)**: To minimize power dissipation in non-scan sequential circuits during test application, a test pattern generation methodology for low power dissipation has been proposed in [27]. The methodology is based on three independent steps comprising redundant test pattern generation, power dissipation measurement and optimal test sequence selection. The methodology, which is based on genetic algorithms, achieves considerable savings in power dissipation; however, it cannot be applied to scan sequential circuits where shifting power dissipation is the major contributor to total power dissipation. To minimize shifting power dissipation in scan sequential circuits, test vector inhibiting techniques proposed for combinational circuits are extended to scan sequential circuits [28]. In [29] the test vector inhibiting technique is extended: the modules and modes with the highest power dissipation are identified, and gating logic to reduce power dissipation is introduced. Despite substantial savings in power dissipation, vector detection and gating logic introduce not only significant area overhead but also considerable performance degradation for modified scan cell design. In [30] a new scan BIST structure has been proposed based on the experimental observation that a very high fault coverage can be obtained by a small number of clusters of test vectors. Although not targeted specifically at low power dissipation during test application, the approach in [30] yields high fault coverage with correlated scan patterns, which will also lead to lower power dissipation. A similar approach is employed in the low transition random test pattern generator (LT-RTPG) proposed in [31], where neighbouring bits of the test vectors are assigned identical values in most test vectors. A simple and fast procedure to compact scan vectors as much as possible without exceeding power dissipation has been proposed in [32].
All the previous scan-based BIST techniques [28-32] introduce test area overhead and/or further performance degradation when compared to scan DFT methodology. A different technique [12] based on test vector and scan latch ordering minimizes power dissipation in full scan sequential circuits without any overhead in test area or performance degradation. Further benefit of the post-ATPG technique proposed in [12] is that minimization of power dissipation during test application is achieved without any decrease in fault coverage and/or increase in test application time. However, the technique is test set dependent and cannot significantly reduce power dissipation despite a large computational time required to explore the large design space. Furthermore, for circuits with large number of scan latches the technique proposed in [12] is infeasible since computational time required to compute the cost function of each solution in the large design space, is unacceptably large. A further enhancement of the technique proposed in [12] can be achieved by defining novel test application strategies since the value of primary inputs is irrelevant while shifting out test responses. Hence, an improvement to scan latch and test vector ordering based on primary input freezing has been proposed in [33]. The approach does not introduce area overhead or further performance degradation, however it requires high computational times for large circuits. A different approach to achieve power savings is the use of extra primary input test vectors and hence supplementary volume of test data [34, 35]. The technique proposed in [34] exploits the redundant information that occurs during scan shifting, test application and response capture to minimize switching activity in the circuit under test. Despite achieving considerable power savings the technique requires long test application time and large volume of test data. 
The volume of test data is reduced in [35] where a D-algorithm like pattern generator [36] is developed to generate a single control pattern to mask the circuit activity while shifting out response. The input control technique proposed in [35] can further be combined with previously proposed scan latch and test vector ordering [12] to achieve, however, modest savings in power dissipation. Moreover, both approaches based on extra test vectors [34, 35] require high computational time and hence are infeasible for large sequential circuits.

#### **1.3** Motivation and objectives

The aim of this paper is to reduce power dissipation in scan sequential circuits (problem iii). Despite their benefits in lowering power dissipation during test application, the previously described techniques [12, 28–35] are inefficient due to one or more of the following problems:

- a. test area overhead associated with detection logic [28, 29] required to find non-essential vectors (i.e. vectors which do not contribute to an increase in fault coverage).
- b. performance degradation associated with modified scan cell design [29].
- c. large test application time required to achieve significant power savings [29-32, 34].
- d. clock tree power dissipation is tackled by clock gating only for non-essential test vectors [29].
- e. high number of extra test vectors [34] emerges as a problem to testers which need to change to support the large volume of test data [37].
- f. computational time may be prohibitively large hindering the exploration for large sequential circuits [12, 33–35].

The previous techniques [12, 28–35] proposed separate solutions for solving one of the problems (a) - (f) at the expense of the other problems. For example, while test vector inhibiting techniques [28, 29] achieve good savings in power dissipation, considerable area overhead for detection logic is introduced (problem a) or further performance degradation is incurred (problem b). On the other hand, techniques based on adjacent patterns [30–32] require considerable test application time (problem c). Furthermore, clock tree power dissipation (problem d), which can be up to one third of total power dissipation [38], is tackled only in [29], where the clock is gated only for non-essential test vectors. This implies that for essential vectors there are no savings in clock tree power dissipation. The technique proposed in [34] necessitates an increase of $(m \times p)/(m + p)$ in the volume of test data, where m is the number of scan latches and p is the number of primary inputs. While volume of test data (problem e) was not a concern in the past for small to medium sized circuits, it is recently emerging as a problem for testers which need to change to support the large volume of test data [37]. The technique proposed in [35] overcomes the problem of large test data volume by computing a single extra vector. However, it yields modest savings in power dissipation due to its inability to fully mask the activity in the combinational part of the circuit. Furthermore, to achieve good fault coverage both techniques based on extra vectors [34, 35] require longer test sequences and hence both higher test application time (problem c) and computational time (problem f). Finally, techniques which operate in a post-ATPG phase [12, 33] using compact test sets for high fault coverage require huge computational time (problem f) since they are strongly test set dependent and require probabilistic optimization.

The aim of this paper is to introduce a new technique for power minimization during test application in full scan sequential circuits based on a novel DFT architecture which eliminates all the above mentioned problems (a) - (f). The proposed DFT architecture is based on partitioning scan latches into multiple scan chains which reduces the clock tree power dissipation and does not have performance penalty. A new test application strategy for the proposed DFT architecture which applies a single extra test vector while shifting out test responses for each scan chain is presented. The multiple scan chain-based approach for power minimization which is test set independent, is applicable to both non-compact and compact test sets leading to low test application time. This paper shows that with low test area and test data overhead high savings in power dissipation during test application in large full scan sequential circuits are achieved in low computational time.

## **2** Background and Definitions

In the following, a brief review of the standard test terminology and power dissipation concepts used throughout the paper is presented.

The *controlling* value for a gate is a single input value that uniquely determines the output to a known value independent of the other inputs to the gate. For example, the controlling value for an OR gate is 1, and for an AND gate is 0. If the value of an input is the complement of the controlling value, then the input has a *noncontrolling* value. A path is a set of connected gates and wires. A path is defined by a single input wire and a single output wire per gate. A signal is an *on-input* if it is on the target path. A signal is an *off-input (side input)* if it is an input to a gate which is on a target path but is not an on-input. If two faults can be detected by a single test vector, they are called *compatible faults*. Consequently, two faults are called *incompatible faults* if they cannot be detected by a single test vector. A test vector from a given test set is called an *essential* test vector if it detects at least one fault that is not detected by any other test vector in the test set. A test vector is *non-essential* with respect to a given test set if all the faults detected by it are also detected by other test vectors in the given test set. A *test set dependent* approach for power minimization is dependent on the size and type of the test set.
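The notion of a controlling value can be checked mechanically. The following is a minimal Python sketch, not part of the original method: the gate table and the helper names `evaluate` and `output_is_fixed` are illustrative assumptions. The AND and OR entries come from the text, while the NAND and NOR entries follow by inversion.

```python
from itertools import product

# (controlling input value, output forced by it) for elementary gates.
CONTROLLING = {
    "AND": (0, 0),
    "NAND": (0, 1),
    "OR": (1, 1),
    "NOR": (1, 0),
}

def evaluate(gate, inputs):
    """Evaluate an elementary gate on a tuple of 0/1 input values."""
    if gate in ("AND", "NAND"):
        out = int(all(inputs))
    else:
        out = int(any(inputs))
    return 1 - out if gate in ("NAND", "NOR") else out

def output_is_fixed(gate, value, n_other=2):
    """True if holding one input at `value` fixes the gate output for
    every combination of the remaining `n_other` inputs."""
    outs = {
        evaluate(gate, (value,) + rest)
        for rest in product((0, 1), repeat=n_other)
    }
    return len(outs) == 1

for gate, (c, forced) in CONTROLLING.items():
    # The controlling value fixes the output; its complement does not.
    print(gate, output_is_fixed(gate, c), output_is_fixed(gate, 1 - c))
```

An input held at the controlling value blocks transitions arriving on the other inputs of the same gate, which is the property the extra test vectors of section 3.2 rely on.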

Power dissipation in digital CMOS circuits is divided into static and dynamic power. The static power is considered negligible when compared to the dynamic power in digital CMOS circuits [39]. If a gate is part of a synchronous digital circuit controlled by a global clock, it follows that the dynamic power dissipation $P_d$ is calculated using:

$$P_d = 0.5 \times C_{load} \times (V_{DD}^2/T_{cyc}) \times N_G \tag{1}$$

where  $C_{load}$  is the load capacitance,  $V_{DD}$  is the supply voltage,  $T_{cyc}$  is the global clock cycle, and  $N_G$  is the total number of gate output transitions  $(0 \rightarrow 1 \text{ or } 1 \rightarrow 0)$ . Since supply voltage  $V_{DD}$  and global clock cycle  $T_{cyc}$  are design constraints, they are not under designer control. Thus, *node transition count* 

$$NTC = \sum_{\text{for all gates } G} N_G \times C_{load} \tag{2}$$

is reported as quantitative measure for power dissipation throughout the paper. It has been assumed that load capacitance for each combinational gate is equal to the number of fan-outs. The node transition count in scan latches  $N_{SL}$  is considered as in [12], where it was shown that for input changes  $0 \rightarrow 0$  and  $1 \rightarrow 1$ ,  $N_{SL_{min}} = 2$ , whilst for input changes  $0 \rightarrow 1$  and  $1 \rightarrow 0$ ,  $N_{SL_{max}} = 6$ .
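As an illustration of equation (2), the sketch below computes NTC from per-gate output waveforms, taking the load capacitance of each gate as its fan-out count as stated above. The function names, the waveform representation and the sample values are hypothetical. Summing the scan latch contribution once per clock is also an assumption made for this sketch.

```python
def node_transition_count(fanout, waveforms):
    """NTC = sum over gates of N_G * C_load, with C_load = fan-out count.

    fanout:    dict gate name -> number of fan-outs
    waveforms: dict gate name -> list of output values, one per clock cycle
    """
    total = 0
    for gate, values in waveforms.items():
        # N_G: number of 0->1 or 1->0 changes between consecutive cycles.
        transitions = sum(a != b for a, b in zip(values, values[1:]))
        total += transitions * fanout[gate]
    return total

def scan_latch_ntc(input_values, n_min=2, n_max=6):
    """Scan latch contribution: n_min when the latch input holds its value
    (0->0 or 1->1), n_max when it toggles (0->1 or 1->0)."""
    return sum(
        n_max if a != b else n_min
        for a, b in zip(input_values, input_values[1:])
    )

# Hypothetical two-gate example.
fanout = {"g0": 2, "g1": 1}
waveforms = {"g0": [0, 1, 1, 0], "g1": [1, 1, 1, 0]}
print(node_transition_count(fanout, waveforms))  # g0: 2*2, g1: 1*1
print(scan_latch_ntc([0, 0, 1, 1]))              # 2 + 6 + 2
```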

## **3** Power Minimization in Full Scan Sequential Circuits Based on Multiple Scan Chains

In this section a new technique for power minimization in full scan sequential circuits based on multiple scan chains is introduced. Section 3.1 overviews the proposed design for testability (DFT) architecture for power minimization. Section 3.2 defines compatible, incompatible and independent scan latches and explains through examples their importance for partitioning scan latches into multiple scan chains, which is described in section 4. Although a previous approach [34] used the term "independent", it classified primary inputs as independent and not scan latches, as is the case in section 3.2. Therefore, there is no similarity between the previous approach [34] and the proposed classification beyond the coincidence of terminology. Finally, section 3.3 gives an important theoretical result showing the advantage of the proposed DFT architecture from the clock tree power dissipation standpoint.

#### 3.1 Proposed Design for Testability Architecture Using Multiple Scan Chains

The proposed DFT architecture using multiple scan chains $SC_0 \dots SC_{k-1}$ is illustrated in Figure 1. The scan input ScanIn is routed to all scan chains while the scan output ScanOut is selected from the output of each scan chain. Scan chains $SC_0 \dots SC_{k-1}$ are operated using nonoverlapping clock signals $CLK_0 \dots CLK_{k-1}$. The nonoverlapping clock signals are obtained by gating the system clock CLK with a scan control register whose number of latches equals the number of scan chains. While shifting out test responses through scan chain $SC_i$, only bit position *i* of the scan control register is set to 1 while the other positions are set to 0. This is easily implemented by shifting the value of 1 through the scan control register using the extra scan clock SCLK. Before starting the first scan cycle, the initial vector 10...00 is set up in the scan control register using the scan input ScanIn. Thereafter, for each scan cycle, the 10...00 value is propagated circularly through the scan control register as shown in Figure 1. It should be noted that when the circuit under test is in the test mode all the faults in the extra logic are observable through the ScanOut line using the test data which is shifted through the k scan chains and the control data shifted through the scan control register. Therefore the extra test hardware does not cause any decrease in fault coverage. During the normal operation of the circuit $CLK_0 \dots CLK_{k-1}$ are active at the same time, since when the normal/test signal N/T is 1 the output of the extra OR gates is 1 and the system clock CLK is not gated by the scan control register. To provide a brief overview of the test application strategy for the proposed DFT architecture: while shifting out test responses present in scan latches from scan chain $SC_i$, primary inputs are set to the extra test vector $EV_i$, which eliminates the spurious transitions (Definition 1 from section 3.2) that originate from scan latches in scan chain $SC_i$.
Note that the proposed DFT architecture does not introduce performance degradation since extra test hardware is not inserted on critical paths. Further, the extra test hardware required by the scan control register and selection logic can be specified at the gate level and synthesized with the rest of the circuit which makes the proposed DFT architecture easily embeddable in the existing VLSI design flow. The algorithm for partitioning scan latches into multiple scan chains is described in section 4.1, while the new test application strategy using multiple scan chains and extra test vectors is described later in section 4.2. Before describing generation of multiple scan chains, scan latches need to be classified into three broad classes as described in the following section.
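The behaviour of the scan control register can be sketched in a few lines of Python. This is a behavioural illustration under stated assumptions, not the gate-level design of Figure 1. The assumptions are that each gated clock is formed as CLK AND (N/T OR register bit), with the AND stage inferred from the description of the extra OR gates, and that each call to `rotate` stands for one SCLK pulse. All function names are hypothetical.

```python
def rotate(register):
    """Circularly shift the one-hot token by one position (one SCLK pulse)."""
    return register[-1:] + register[:-1]

def gated_clocks(clk, normal_test, register):
    """Assumed gating: CLK_i = CLK AND (N/T OR register[i])."""
    return [clk & (normal_test | bit) for bit in register]

def active_chain_sequence(k, steps):
    """Index of the scan chain whose clock is enabled at each step,
    starting from the initial vector 10...00."""
    register = [1] + [0] * (k - 1)
    active = []
    for _ in range(steps):
        active.append(register.index(1))
        register = rotate(register)
    return active

print(active_chain_sequence(3, 5))      # token visits chains in turn and wraps
print(gated_clocks(1, 0, [0, 1, 0]))    # test mode: only one chain is clocked
print(gated_clocks(1, 1, [0, 1, 0]))    # normal mode: all chains are clocked
```

In test mode exactly one entry of the gated clock list is active at a time, which is the property that keeps the inactive scan chains and their clock tree branches quiet.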



Figure 1: Proposed DFT architecture based on multiple scan chains.

#### **3.2** Compatible, Incompatible and Independent Scan Latches

In order to partition scan latches into multiple scan chains, they need to be classified into three broad classes: compatible, incompatible and independent scan latches. It should be noted that scan latch classification is not done explicitly by enumeration or exhaustive search, but it is done implicitly by the partitioning algorithm as explained later in Figure 7 of section 4.1. The proposed classification is also important for computing extra test vectors associated with each scan chain that eliminate spurious transitions which are defined as follows.

**Definition 1** A spurious transition during test application in scan sequential circuits is a transition which occurs in the combinational part of the circuit under test while shifting out the test response and shifting in the present state part of the next test vector. These transitions do not have any influence on test efficiency since the values at the input and output of the combinational part are not useful test data.

Having defined the spurious transitions, now the compatible and incompatible scan latches are introduced.

**Definition 2** Two scan latches  $S_i$  and  $S_j$  are *compatible* if all primary inputs  $x_k$  are assigned values  $c_k$  that eliminate the spurious transitions which originate from both  $S_i$  and  $S_j$ . The values  $c_k$  of primary inputs  $x_k$  constitute the *extra test vector* which eliminates spurious transitions originating from both  $S_i$  and  $S_j$ .

Note that the sole purpose of extra test vectors is to reduce the spurious transitions during test application; they have no effect on fault coverage, which is determined by the original test set. The application of extra test vectors defines a novel test application strategy for power minimization which is detailed in section 4.2. Further, since a single extra test vector is used for each scan chain regardless of the values loaded in scan latches, the volume of extra test data depends only on the number of scan chains and not on the number of scan latches and/or the size of the original test set.

**Definition 3** Two scan latches  $S_i$  and  $S_j$  are *incompatible* if at least one primary input  $x_k$  that is assigned value  $i_k$  to eliminate the spurious transitions which originate from  $S_i$  will propagate the transitions which originate from  $S_j$ . Two incompatible scan latches cannot be assigned to the same scan chain since there is no extra test vector that can eliminate spurious transitions which originate from both of them.



Figure 2: Example circuit illustrating compatible and incompatible scan latches.

The following example illustrates compatible and incompatible scan latches.

**Example 1** Consider the simple circuit of Figure 2. The $\{x_0, x_1, x_2\}$ are primary inputs, $\{S_0, S_1, S_2, S_3, S_4, S_5\}$ are scan latches, $\{y_0, y_1, y_2, y_3, y_4, y_5\}$ are present state lines, and $\{z_0, z_1, z_2, z_3, z_4, z_5\}$ are circuit outputs. To eliminate spurious transitions at gate $z_0$ while shifting out test responses through scan latch $S_0$, primary input $x_0$ must be assigned the controlling value 0 of gate $z_0$. Similarly, to eliminate spurious transitions that originate from scan latch $S_1$, primary input $x_0$ must be assigned the controlling value 1 of gate $z_1$. Different values must be assigned to $x_0$ to eliminate spurious transitions which originate from scan latches $S_0$ and $S_1$. Therefore scan latches $S_0$ and $S_1$ are incompatible and are assigned to different scan chains $SC_0 = \{S_0\}$ and $SC_1 = \{S_1\}$. On the other hand, by assigning $x_1$ to the controlling value 0 of gates $z_2$ and $z_3$ the spurious transitions which originate from both scan latches $S_2$ and $S_3$ are eliminated. Thus, by introducing $S_2$ and $S_3$ into $SC_0$ and applying for example extra test vector $x_0x_1x_2 = \{000\}$ while shifting out test responses from $SC_0 = \{S_0, S_2, S_3\}$ no spurious transitions will occur at gates $z_0$, $z_2$ and $z_3$. Similarly, scan latches $S_4$ and $S_5$ are compatible since assigning 1 to the primary input $x_2$ eliminates spurious transitions at gates $z_4$ and $z_5$. By introducing $S_4$ and $S_5$ into $SC_1$ and applying extra test vector $x_0x_1x_2 = \{111\}$ while shifting out test responses from $SC_1 = \{S_1, S_4, S_5\}$ no spurious transitions will occur at gates $z_1$, $z_4$ and $z_5$. It should be noted that there is a strict interrelation between extra test vector value $x_0x_1x_2 = \{000\}$ and scan chain $SC_0 = \{S_0, S_2, S_3\}$, and between $x_0x_1x_2 = \{111\}$ and scan chain $SC_1 = \{S_1, S_4, S_5\}$. While for the sake of simplicity the extra test vectors $x_0x_1x_2 = \{000\}$ and $x_0x_1x_2 = \{111\}$ have been described explicitly in this particular example, the extra test vectors and hence the multiple scan chains are derived implicitly using a reduced circuit, specified fault list and ATPG tool as described later in the algorithms of section 4.1. Finally, note that output signals $z_3$ of scan chain $SC_0$ and $z_5$ of $SC_1$ are fed into the selection logic of the proposed DFT architecture from Figure 1.
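The compatibility reasoning of Example 1 can be replayed with a small Python sketch. It is an explicit illustration only: the paper derives scan chains implicitly with an ATPG tool on a reduced circuit, whereas here each scan latch is given a single hand-written requirement on the primary inputs, taken from the example. The table `FIG2` and the helper names are hypothetical, and latches with several alternative requirements are not modelled.

```python
# Primary-input values that block spurious transitions from each scan latch,
# as stated in Example 1 (inputs not listed are don't-cares).
FIG2 = {
    "S0": {"x0": 0}, "S1": {"x0": 1},
    "S2": {"x1": 0}, "S3": {"x1": 0},
    "S4": {"x2": 1}, "S5": {"x2": 1},
}

def merge_cubes(cubes):
    """Merge partial input assignments; return None on any conflict."""
    merged = {}
    for cube in cubes:
        for pi, value in cube.items():
            if merged.setdefault(pi, value) != value:
                return None
    return merged

def extra_vector(chain, requirements, inputs=("x0", "x1", "x2")):
    """Extra test vector cube for a scan chain ('X' = don't-care),
    or None if the chain contains incompatible scan latches."""
    cube = merge_cubes(requirements[latch] for latch in chain)
    if cube is None:
        return None
    return "".join(str(cube[pi]) if pi in cube else "X" for pi in inputs)

print(extra_vector(["S0", "S2", "S3"], FIG2))  # cube containing 000
print(extra_vector(["S1", "S4", "S5"], FIG2))  # cube containing 111
print(extra_vector(["S0", "S1"], FIG2))        # None: incompatible latches
```

The vectors 000 and 111 used in the example are particular completions of the two cubes, with the don't-care positions filled arbitrarily.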

The previous example has assumed a simple circuit where *all* the spurious transitions are eliminated by partitioning scan latches in two scan chains  $SC_0$  and  $SC_1$ . However, some of the spurious transitions cannot be eliminated as described in the following example.



Figure 3: Example circuit illustrating spurious transitions which cannot be eliminated.

**Example 2** Consider the circuit shown in Figure 3. The spurious transitions which originate in scan latches $S_0$ and $S_1$ cannot be eliminated at gate $t_0$ since both inputs are present state lines. However, by assigning $x_0$ and/or $x_1$ to the controlling value 0 of gate $t_1$ the spurious transitions will be eliminated at gate $t_1$. Scan latches $S_0$ and $S_1$ are compatible since the same primary input values eliminate the spurious transitions of gate $t_1$.

Example 2 has illustrated that some of the spurious transitions cannot be eliminated since all the gate inputs depend on present state lines. Computing primary input values that eliminate spurious transitions (extra test vectors introduced in Definition 2) can be viewed as an ATPG problem to a *reduced circuit* with a *specified fault list* which are detailed in the algorithms presented in section 4.1. The following example briefly illustrates the generation of the *reduced circuit* required to compute extra test vectors.



Figure 4: Reduced circuit of the example circuit from Figure 3 illustrating the steps required to compute extra test vectors.

**Example 3** For the circuit shown in Figure 3 the reduced circuit is generated as follows. Initially the signal  $t_1$  at the input of gate  $z_0$  is identified to eliminate spurious transitions that originate from scan latches  $S_0$  and  $S_1$ . Then scan latches  $S_0$  and  $S_1$ , and the AND gate  $t_0$  are excluded from the reduced circuit as shown in Figure 4. Furthermore, gate  $z_0$  is modified to a buffer (signals  $t_1$  and  $z_0$  are identical). The targeted fault in the reduced circuit is  $t_1$  sa-1, whose test vectors eliminate the spurious transitions at gate  $z_0$  in the original circuit. Finally, the extra test vectors (Definition 2) that eliminate the spurious transitions during test application are computed as  $x_0x_1 = \{0X, X0\}$ .
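The justification step of Example 3 can be illustrated with a brute-force search. The following is a minimal sketch, not the ATPG procedure used in this paper. It assumes that gate $t_1$ of Figure 3 is a two-input AND gate driven by $x_0$ and $x_1$ (consistent with its controlling value 0), and the helper name `freezing_vectors` is hypothetical.

```python
from itertools import product

def freezing_vectors(gate, num_inputs, target_value):
    """Enumerate primary-input vectors that force `gate` to `target_value`."""
    return [bits for bits in product((0, 1), repeat=num_inputs)
            if gate(*bits) == target_value]

# Assumption: t1 = AND(x0, x1); the actual gate type is given only in Figure 3.
def t1(x0, x1):
    return x0 & x1

# Detecting t1 sa-1 in the reduced circuit requires t1 = 0.
vectors = freezing_vectors(t1, 2, 0)
print(vectors)  # [(0, 0), (0, 1), (1, 0)], i.e. the cubes 0X and X0
```

Exhaustive enumeration is only practical for a handful of inputs; it is used here purely to make the cubes $\{0X, X0\}$ concrete.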

A particular case of spurious transitions which cannot be eliminated using a single extra test vector are those that originate in self-incompatible scan latches and are defined as follows.

**Definition 4** A scan latch  $S_i$  is *self-incompatible* if at least one primary input  $x_k$  that is assigned value  $i_k$  to eliminate the spurious transitions which originate from  $S_i$  on one fanout path will propagate the transitions which originate from  $S_i$  on a different fanout path.

The question that now arises is whether the spurious transitions which originate from self-incompatible scan latches can be eliminated. To provide an answer, consider the following example.

**Example 4** Consider the circuit of Figure 5 where  $\{x_0, x_1\}$  are primary inputs,  $S_0$  is a scan latch,  $y_0$  is the present state line, and  $\{t_0, t_1, t_2\}$  are circuit lines. To eliminate spurious transitions at gate  $t_0$  while shifting out test responses through scan latch  $S_0$ , primary input  $x_0$  must be assigned the controlling value 1 of gate  $t_0$ . However, to eliminate spurious transitions at gate  $t_1$ , primary input  $x_0$  must be assigned the controlling value 0 of gate  $t_1$ . Different values must be assigned to  $x_0$  to eliminate spurious transitions which originate from the same scan latch  $S_0$  and hence scan latch  $S_0$  is



Figure 5: Example circuit illustrating self-incompatible scan latches.

self-incompatible. However if primary input  $x_1$  is assigned the controlling value 0 of gate  $t_2$  the spurious transitions which originate in  $S_0$  and propagate on path  $\{S_0, t_1, t_2\}$  will be eliminated. Therefore by assigning extra test vector  $x_0x_1 = \{10\}$  spurious transitions propagating on both paths  $\{S_0, t_0\}$  and  $\{S_0, t_1, t_2\}$  will be eliminated. This leads to the conclusion that most of the spurious transitions originating in self-incompatible scan latches can be eliminated by examining the fanout paths of self-incompatible scan latches and assigning a single extra test vector while shifting out the test responses. However, the single extra test vector is at the expense of a small number of spurious transitions that cannot be eliminated as in the case of transitions on line  $t_1$  in the simple circuit of Figure 5.

The previous example has shown that following a careful examination of fanout branches of self-incompatible scan latches, most of the spurious transitions originating in self-incompatible scan latches can be eliminated using a single value for the extra test vector. Finally, independent scan latches are introduced.

**Definition 5** A scan latch  $S_i$  is *independent* if no gate on any path which originates from  $S_i$  has a side input which can be justified by primary inputs.

The independent scan latches are grouped in the extra scan chain (ESC), for which no extra test vector can be computed and hence the spurious transitions cannot be eliminated. The following example illustrates independent scan latches.

**Example 5** Consider the circuit shown in Figure 6. Output  $z_0$  depends only on scan latches  $S_0$  and  $S_1$ , and the next state  $Y_4$  of scan latch  $S_4$  depends only on scan latches  $S_0$ ,  $S_1$ ,  $S_2$  and  $S_3$ . There are no side inputs of gates  $t_0$  and  $t_1$  that can be justified by primary inputs such that spurious transitions originated from  $S_0$ ,  $S_1$ ,  $S_2$  and  $S_3$  are eliminated. Therefore scan latches  $S_0$ ,  $S_1$ ,  $S_2$  and  $S_3$  are independent.



Figure 6: Example circuit illustrating independent scan latches.

#### **3.3** Power dissipated by the buffered clock tree

Previous research has established that power dissipated in the clock tree is typically one third of the total power dissipation [38], and hence it is necessary to minimize power dissipated in the clock tree not only during functional operation but also during test application. Unlike previous approaches which do not consider power dissipated by the buffered clock tree [12, 28, 30–35] or gate the clock tree *only* for non-essential test vectors from a large test set [29], the proposed DFT architecture using multiple scan chains (Figure 1 from section 3.1) reduces clock tree power for *all* the test vectors of a very small test set where each test vector is essential (i.e. detects at least one fault). This is explained by the following theorem, which gives an upper bound on power reduction.

**Theorem 1** Consider *k* scan chains in the design for testability architecture of Figure 1. Then the power reduction of the buffered clock tree over the standard full scan architecture is upper bounded by (k-1)/k.

**Proof:** Let  $\{m_0, \ldots, m_{k-1}\}$  be the size of each scan chain and  $\sum_{i=0}^{k-1} m_i = m$ , where *m* is the total number of scan latches. Since for large dies the clock power dissipation transitions from square-root dependence on the number of scan latches to a linear dependence [38], power dissipated by each scan chain  $SC_i$  can be approximated to  $\lambda \times m_i$  where  $\lambda$  is dependent on clock frequency, supply voltage and wire lengths. The power dissipated while shifting test responses over an entire scan cycle (*m* clock cycles) for the proposed architecture is  $P_{MSC} = \lambda \times \sum_{i=0}^{k-1} m_i^2$  since over  $m_i$  clock cycles only the buffered clock tree feeding  $SC_i$  is active. On the other hand, power dissipated in the traditional full scan architecture is  $P_{FULL} = \lambda \times m^2 = \lambda \times (\sum_{i=0}^{k-1} m_i) \times (\sum_{i=0}^{k-1} m_i)$ . Therefore the reduction in power dissipation is

$$Red = (P_{FULL} - P_{MSC}) / P_{FULL} = 1 - (\lambda \times \sum_{i=0}^{k-1} m_i^2) / (\lambda \times (\sum_{i=0}^{k-1} m_i) \times (\sum_{i=0}^{k-1} m_i)).$$

Following the Cauchy-Schwarz inequality [40], where

$$(\sum_{i=0}^{k-1} m_i) \times (\sum_{i=0}^{k-1} m_i) \le k \times (\sum_{i=0}^{k-1} m_i^2),$$

the power reduction is upper bounded by  $Red \le 1 - 1/k = (k - 1)/k$ .  $\Box$

The previous theorem shows that power reduction of up to (k-1)/k can be achieved in the buffered clock tree, with maximum reduction achieved when scan chains have an equal number of scan latches. It should be noted that by gating the clock of each scan chain not only average power reduction is achieved but also savings in peak power are guaranteed since while shifting out test responses only a single buffered clock tree is active.
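The bound of Theorem 1 can be checked numerically. The sketch below evaluates the expression for $Red$ from the proof; the chain sizes are hypothetical values chosen for illustration and are not taken from any benchmark circuit.

```python
def clock_power_reduction(chain_sizes):
    """Red = 1 - sum(m_i^2) / (sum(m_i))^2, following the proof of Theorem 1."""
    total = sum(chain_sizes)
    return 1 - sum(m * m for m in chain_sizes) / (total * total)

# Hypothetical partitions of 40 scan latches into k = 4 scan chains.
balanced = clock_power_reduction([10, 10, 10, 10])  # equal chain lengths
skewed = clock_power_reduction([25, 5, 5, 5])       # one dominant chain

print(balanced)  # 0.75, which equals (k - 1) / k for k = 4
print(skewed)    # 0.5625, below the bound
```

The balanced partition reaches the bound $(k-1)/k$, while the skewed partition of the same latches falls short of it, in line with the remark that the maximum is obtained for equal chain lengths.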

## 4 Multiple Scan Chains Generation and New Test Application Strategy

In this section, partitioning of scan latches into multiple scan chains based on their classification, as described in section 3.2, is given. Then, a new test application strategy for power minimization during test application, based on the DFT architecture described in section 3.1, is introduced.

#### 4.1 Partitioning Scan Latches into Multiple Scan Chains

The Multiple Scan Chain Partitioning (*MSC-PARTITIONING*) algorithm identifies compatible scan latches introduced by Definition 2 of section 3.2, groups them in scan chains and computes an extra test vector for each scan chain. Figure 7 gives the flow of the proposed *MSC-PARTITIONING* algorithm, which is divided into five parts identified by boxes marked (a) to (e). To facilitate the elimination of spurious transitions by computing an extra test vector for each scan chain, the initial circuit **C** needs to be transformed into a reduced circuit **C'** (box (a)). A byproduct of the reduction procedure is a specified fault list **L** (box (b)) which is targeted by an automatic test pattern generation (ATPG) process on the reduced circuit **C'** (box (c)). Associated with each fault  $FS_i$  sa-$nc_i$  in the specified fault list **L** is a set of scan latches whose spurious transitions will be eliminated in the original circuit **C** by applying the extra test vector  $EV_i$  which detects  $FS_i$  sa-$nc_i$  in the reduced circuit **C'**. Therefore, based on fault compatibility in the reduced circuit **C'**, scan latch classification in the original circuit **C** is done implicitly, which leads to several partitions of the initial single scan chain (box (d)). However, some scan latches may be self-incompatible (Definition 4), which leads to iterations through the ATPG process with a respecified fault list (box (e)) until no self-incompatible scan latches are left. At the end of the *MSC-PARTITIONING* algorithm the multiple scan chains  $\{SC_0, \ldots, SC_{k-1}, ESC\}$  and the extra test set  $\{EV_0,\ldots,EV_{k-1}\}$  will be used by the novel test application strategy described in section 4.2. In the following, each part of the *MSC-PARTITIONING* algorithm is explained in detail.

Figure 7: Proposed algorithm for partitioning scan latches in multiple scan chains.

- a. In the first part of the *MSC-PARTITIONING* algorithm of Figure 7 the initial circuit **C** is transformed into a reduced circuit **C'** as described in the *CIRCUIT-REDUCTION* algorithm of Figure 8. The algorithm also identifies the *freezing signals*, which are the signals that depend on primary inputs and should be set to the controlling value as side inputs to the gates which eliminate transitions that originate from scan latches, as described in the following parts. Two lists, *eliminated\_gates* and *modified\_gates*, contain the gates which ought to be eliminated and modified, respectively, in the reduced circuit **C'**. Initially *eliminated\_gates* is set to all the scan latches whereas *modified\_gates* is void (lines 1-2). The circuit is traversed in breadth first search order using two lists, *current\_frontier* and *new\_frontier*. While *current\_frontier* is set initially to all the scan latches of **C** (line 3), *new\_frontier* is initially void (line 4). In the inner loop (lines 6-13), for every gate neighbouring the current frontier it is checked whether all of its inputs already belong to *eliminated\_gates* (i.e. depend only on scan latches). If this is the case then the currently evaluated gate is introduced into *eliminated\_gates*, removed from *modified\_gates* (if applicable) and introduced to *new\_frontier*. If at least one input does not belong to *eliminated\_gates* then the currently evaluated gate is introduced to *modified\_gates*. The outer loop (lines 5-16) repeats the inner loop while *current\_frontier* is not void, i.e. until no more gates need to be eliminated. At the end of each iteration of the outer loop *current\_frontier* and *new\_frontier* are updated (lines 14 and 15). Finally, using *eliminated\_gates* and *modified\_gates* the initial circuit **C** is modified to the reduced circuit **C'** (lines 17-20) as follows: gates that belong to *eliminated\_gates* (depend only on scan latches) are excluded; gates that belong to *modified\_gates* (depend on both scan latches and primary inputs) are modified to gates with input signals dependent only on primary inputs (in the case of gates with two inputs of which one is a freezing signal the gate is modified to a buffer); all the freezing signals identified in the first step are set as primary outputs in the reduced circuit. Freezing signals  $\{FS_0, \ldots, FS_{p-1}\}$ , which are the outputs of the gates present in *modified\_gates*, are determined simultaneously with identifying independent scan latches (Definition 5). The independent scan latches are grouped into the extra scan chain (ESC), which consists of scan latches whose spurious transitions cannot be eliminated by computing an extra test vector. The algorithm returns not only the reduced circuit **C'** but also the list of freezing signals that will be used in the following part of the *MSC-PARTITIONING* algorithm of Figure 7.

- b. In the second part a specified fault list **L** is created which will be provided together with the reduced circuit **C'** to an automatic test pattern generation (ATPG) tool. The specified fault list **L** comprises, for each freezing signal  $FS_i$ , the fault stuck-at the non-controlling value ( $FS_i$  sa-$nc_i$ ) of the gate  $g_i$  from the *modified\_gates* list of algorithm *CIRCUIT-REDUCTION* of Figure 8. It is important to note that each fault  $FS_i$  sa-$nc_i$  has an attached list of scan latches  $\{S_{i_0}, \ldots, S_{i_{m-1}}\}$  whose spurious transitions in the initial circuit **C** are eliminated when setting  $FS_i$  to the controlling value. The list of scan latches is required during generation of scan chains in part (d) of the *MSC-PARTITIONING* algorithm.
- c. In the third part, having generated the reduced circuit **C'** and the specified fault list **L**, any state of the art combinational ATPG tool can be used to generate test vectors for the faults from **L** for **C'**. Test vectors for the faults from **L** are the extra test vectors required to eliminate spurious transitions while shifting test responses in the initial circuit **C** as described in part (d). Since the freezing signals are primary outputs in **C'** as described in part (a), **L** contains faults only on primary outputs. This speeds up the ATPG process since only backward justification and no forward propagation is required. Moreover, the specified fault list is significantly smaller than the entire fault set, which further reduces ATPG computational time for computing extra test vectors.

<pre>
ALGORITHM: CIRCUIT-REDUCTION
INPUT:  Circuit C
OUTPUT: Reduced Circuit C'
        Freezing Signals {FS_0, ..., FS_{p-1}}

 1  eliminated_gates = {S_0, ..., S_{m-1}}
 2  modified_gates = ∅
 3  current_frontier = {S_0, ..., S_{m-1}}
 4  new_frontier = ∅
 5  while (current_frontier ≠ ∅) {
 6    for all g_x ∈ neighbours(current_frontier)
 7      if (all inputs of g_x ∈ eliminated_gates) {
 8        add g_x to eliminated_gates
 9        remove g_x from modified_gates
10        add g_x to new_frontier
11      }
12      else
13        add g_x to modified_gates
14    current_frontier = new_frontier
15    new_frontier = ∅
16  }
17  generate reduced circuit C' as follows {
18    eliminate all the gates g_x ∈ eliminated_gates
19    modify all the gates g_y ∈ modified_gates
20  }
21  freezing signals {FS_0, ..., FS_{p-1}} are output signals of {g_0, ..., g_{p-1}} = modified_gates
22  {S_{e_0}, ..., S_{e_{m-1}}} for which no freezing signal exists are introduced in the extra scan chain ESC
23  return Reduced Circuit C'
           Freezing signals {FS_0, ..., FS_{p-1}}
</pre>
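The breadth first traversal of the *CIRCUIT-REDUCTION* algorithm (lines 1-16) can be sketched in Python as follows. This is an illustrative sketch only: the netlist format (a dictionary from gate name to input names), the function name `circuit_reduction` and the connectivity assumed for the circuit of Figure 3 are assumptions made for this example, and the generation of the reduced circuit itself (lines 17-23) is not covered.

```python
def circuit_reduction(fanin, scan_latches):
    """Classify gates as eliminated or modified (lines 1-16 of CIRCUIT-REDUCTION).

    fanin: dict mapping each gate name to the list of its input signal names
           (hypothetical netlist format).
    scan_latches: iterable of scan latch names.
    Returns (eliminated_gates, modified_gates) as sets.
    """
    # Fanout map, needed to find the neighbours of the current frontier.
    fanout = {}
    for gate, inputs in fanin.items():
        for signal in inputs:
            fanout.setdefault(signal, set()).add(gate)

    eliminated = set(scan_latches)        # line 1
    modified = set()                      # line 2
    current_frontier = set(scan_latches)  # line 3
    while current_frontier:               # line 5
        new_frontier = set()              # lines 4 and 15
        neighbours = set()
        for signal in current_frontier:
            neighbours |= fanout.get(signal, set())
        for gate in sorted(neighbours - eliminated):           # line 6
            if all(i in eliminated for i in fanin[gate]):      # line 7
                eliminated.add(gate)                           # line 8
                modified.discard(gate)                         # line 9
                new_frontier.add(gate)                         # line 10
            else:
                modified.add(gate)                             # line 13
        current_frontier = new_frontier                        # line 14
    return eliminated, modified

# Assumed connectivity for Figure 3: t0(S0, S1), t1(x0, x1), z0(t0, t1).
fanin = {"t0": ["S0", "S1"], "t1": ["x0", "x1"], "z0": ["t0", "t1"]}
eliminated, modified = circuit_reduction(fanin, ["S0", "S1"])
print(sorted(eliminated), sorted(modified))
```

Under the assumed connectivity, gate `t0` joins the eliminated set and gate `z0` ends up in the modified set, which matches the reduced circuit described in Example 3.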



It should be noted that some faults from **L** are redundant, which implies that no extra test vector can be computed to stop the propagation of the spurious transitions from the scan latches associated with the respective fault. However, these scan latches are treated as self-incompatible and handled by re-specifying the fault list as described in the last part (e) of the *MSC-PARTITIONING* algorithm of Figure 7.

- d. Given the extra test set together with the list of faults from **L** detected by each extra test vector  $EV_i$ , scan latch classification according to the definitions from section 3.2 is done as follows. If two faults  $FS_i$  sa-$nc_i$  and  $FS_j$  sa-$nc_j$  from **L** are incompatible (i.e. they are not detected by the same extra test vector) then each element of the two lists of scan latches associated with the two faults,  $\{S_{i_0}, \ldots, S_{i_{m-1}}\}$  and  $\{S_{j_0}, \ldots, S_{j_{q-1}}\}$  respectively, are incompatible (otherwise they are compatible). This leads to grouping all the scan latches associated with faults detectable by a *single* extra test vector into a *single* scan chain. However, this may lead to self-incompatible scan latches (Definition 4 of section 3.2) when different extra test vectors eliminate spurious transitions from the same scan latch. Consequently, while there are self-incompatible scan latches the *MSC-PARTITIONING* algorithm will iterate through parts (e), (c), (d) as explained next.
- e. In the case that there are self-incompatible scan latches after the generation of multiple scan chains, the problem needs to be addressed as briefly explained in Example 4 of section 3.2. The faults  $FS_i$  sa-$nc_i$  which have attached self-incompatible scan latches are removed from fault list **L** and new faults are specified on the lines in the fanout paths of  $FS_i$ . Thus, the respecified fault list **L** will be provided back to the ATPG process for computing extra test vectors (part (c)), which will be followed by new multiple scan chain generation based on fault compatibility (part (d)). This iterative process continues until there are no self-incompatible scan latches left.

The *MSC-PARTITIONING* algorithm of Figure 7 returns the scan chains of compatible scan latches, the extra scan chain ESC and the extra test set of extra test vectors used to define a new test application strategy, as explained in the following section.
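The grouping of part (d) can be illustrated with a simplified stand-in. Instead of running an ATPG tool, the sketch below assumes that each fault from **L** already comes with a test cube (a string over `0`, `1` and `X`) and greedily merges cubes that do not conflict. Greedy merging is an assumption of this example and not the procedure of this paper; its result depends on the fault order. The function names and the cubes, chosen so as to reproduce the two scan chains of Example 1, are hypothetical.

```python
def merge_cubes(a, b):
    """Return the merged cube if a and b agree on every specified bit, else None."""
    merged = []
    for bit_a, bit_b in zip(a, b):
        if bit_a == "X":
            merged.append(bit_b)
        elif bit_b == "X" or bit_a == bit_b:
            merged.append(bit_a)
        else:
            return None  # conflicting 0/1 assignment: the faults are incompatible
    return "".join(merged)

def partition(faults):
    """faults: list of (cube, scan_latches) pairs, one per fault in L.
    Returns a list of [extra_test_vector, set_of_scan_latches] groups."""
    chains = []
    for cube, latches in faults:
        for chain in chains:
            merged = merge_cubes(chain[0], cube)
            if merged is not None:
                chain[0] = merged
                chain[1] |= set(latches)
                break
        else:
            chains.append([cube, set(latches)])
    return chains

def self_incompatible(chains):
    """Scan latches that ended up in more than one chain (Definition 4)."""
    seen, clash = set(), set()
    for _, latches in chains:
        clash |= seen & latches
        seen |= latches
    return clash

# Hypothetical cubes over x0 x1 x2, one per fault, with attached scan latches.
faults = [("0XX", ["S0"]), ("X0X", ["S2"]), ("XX0", ["S3"]),
          ("1XX", ["S1"]), ("X1X", ["S4"]), ("XX1", ["S5"])]
chains = partition(faults)
print(chains)
print(self_incompatible(chains))
```

With these cubes the sketch yields two groups, with extra test vectors `000` and `111`, and no scan latch is reported as self-incompatible, so no iteration through part (e) would be needed.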

#### 4.2 New Test Application Strategy Using Multiple Scan Chains and Extra Test Vectors

Having partitioned the scan latches into multiple scan chains with an extra test vector for each scan chain (section 4.1), this section introduces a new test application strategy for power minimization during test application in full scan sequential circuits.


<pre>
ALGORITHM: MSC-TEST APPLICATION
INPUT:  Test Set S = {V_0, ..., V_{n-1}}, Circuit C
        Scan Chains {SC_0, ..., SC_{k-1}, ESC}
        Extra Test Set ES = {EV_0, ..., EV_{k-1}}
OUTPUT: Node transition count NTC

NTC ← 0
for every V_i from S with i = 0 ... n-1 {
  for every SC_j with j = 0 ... k-1 {
    apply EV_j to primary inputs
    compute NTC_{i,j} by simulating C when
      shifting in the present state part of test
      vector V_i through scan latches from SC_j
    NTC ← NTC + NTC_{i,j}
  }
  apply primary part of V_i to primary inputs
  compute NTC_{i,ESC} by simulating C when
    shifting in the present state part of test
    vector V_i through scan latches from ESC
  NTC ← NTC + NTC_{i,ESC}
  NTC ← NTC + NTC_{i,LOAD}
}
return NTC
</pre>

Figure 9: Proposed test application strategy using multiple scan chains and extra test vectors

The Multiple Scan Chain Test Application (*MSC-TEST APPLICATION*) algorithm computes the node transition count *NTC* (section 2) during the entire test application period for the given test set **S**, circuit **C**, multiple scan chains { $SC_0, ..., SC_{k-1}, ESC$ }, and extra test set **ES** = { $EV_0, ..., EV_{k-1}$ }. Figure 9 gives the pseudocode of the proposed *MSC-TEST APPLICATION* algorithm. The value of **NTC** is 0 at the beginning of the algorithm and it is gradually increased as the entire test set is traversed. The outer loop represents the traversal of all the test vectors  $V_i$ , with i = 0, ..., n - 1, from test set **S**. Shifting out test responses through all the scan chains is then considered in the inner loop. For each scan chain  $SC_j$ , circuit **C** is simulated by applying the extra test vector  $EV_j$  to primary inputs and  $NTC_{i,j}$  is added to the node transition count **NTC**.  $NTC_{i,j}$  stands for the node transition count while shifting in the present state part of test vector  $V_i$  through scan chain  $SC_j$  and applying extra test vector  $EV_j$  to the primary inputs. After shifting out the test responses through each scan chain  $SC_j$ , the primary input part of test vector  $V_i$  is applied to primary inputs and  $NTC_{i,ESC}$  is computed while shifting out the test response through the extra scan chain ESC. Finally the entire test vector  $V_i$  is applied to the circuit under test and  $NTC_{i,LOAD}$ , required to load the test response in the scan latches, is added to **NTC**. After the completion of the inner loop, the outer loop continues until the entire test set is examined. The algorithm returns the value of **NTC** over the entire test application period. It should be noted that the algorithms presented in this section are independent of test vector and scan latch order.
Unlike the algorithms from [12, 33–35] whose computational time is prohibitively large hindering the exploration for large sequential circuits, the proposed *MSC-PARTITIONING* and *MSC-TEST APPLICATION* algorithms have low computational time and can handle large circuits as shown in the following section.
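The control flow of the *MSC-TEST APPLICATION* algorithm can be sketched as below. The circuit simulator is not part of this sketch: `shift_ntc` and `load_ntc` are hypothetical callbacks standing in for the zero delay simulation of circuit **C**, and the toy stubs in the usage lines return arbitrary counts solely to show how the terms accumulate.

```python
def msc_test_application(test_set, scan_chains, esc, extra_vectors,
                         shift_ntc, load_ntc):
    """Accumulate the node transition count NTC over the whole test set.

    test_set:      list of (primary_part, present_state_part) pairs, one per V_i
    scan_chains:   list of scan chains SC_0 .. SC_{k-1}
    esc:           the extra scan chain ESC
    extra_vectors: list EV_0 .. EV_{k-1}, one per scan chain
    shift_ntc(primary_inputs, state_part, chain): hypothetical simulator hook
        returning the NTC while shifting through `chain`
    load_ntc(vector): hypothetical hook returning NTC_{i,LOAD}
    """
    ntc = 0
    for primary_part, state_part in test_set:
        for chain, extra_vector in zip(scan_chains, extra_vectors):
            # Primary inputs hold EV_j while SC_j is shifted.
            ntc += shift_ntc(extra_vector, state_part, chain)
        # Primary inputs hold the primary part of V_i while ESC is shifted.
        ntc += shift_ntc(primary_part, state_part, esc)
        # Applying the entire test vector loads the response into the latches.
        ntc += load_ntc((primary_part, state_part))
    return ntc

# Toy usage with stub callbacks (placeholders, not a real circuit simulation).
test_set = [("00", "101"), ("11", "010")]
scan_chains = [["S0", "S2"], ["S1"]]
esc = ["S3"]
extra_vectors = ["00", "11"]
total = msc_test_application(
    test_set, scan_chains, esc, extra_vectors,
    shift_ntc=lambda primary_inputs, state_part, chain: len(chain),
    load_ntc=lambda vector: 1,
)
print(total)  # 10 with the stub callbacks above
```

Passing the simulator in as callbacks keeps the sketch independent of any particular netlist format or delay model.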

## **5** Experimental Results

This section demonstrates through a set of benchmark examples that multiple scan chains combined with extra test vectors, as outlined in section 3, yield savings in power dissipation during test application. The algorithms described in section 4 have been implemented on a 500 MHz Pentium III PC with 128 MB RAM running Linux and using GNU CC version 2.91. The average value of node transition count (*NTC*) reported throughout this section is calculated using the formulas from section 2 under the assumption of the zero delay model. The use of the zero delay model is motivated by very rapid computation of *NTC* and by the observation that power dissipation under the zero delay model has a high correlation with power dissipation under the general delay model [41]. However, the proposed technique applies equally to other delay models such as the unit and variable delay models [42]. Furthermore, due to the elimination of spurious transitions (Definition 1), the propagation of hazards and glitches is eliminated, leading to even greater reductions in power dissipation in the case of the unit and variable delay models. Besides, the aim of this paper is not to give exact values of power dissipation during test application, but to define a new design for testability architecture and a new test application strategy for power minimization that applies equally to every delay model.

Table 1 shows the experimental results for all circuits from ISCAS89 benchmark set [46]

| circuit | ATALANTA [43] TV | ATALANTA trad. NTC | ATALANTA prop. NTC | ATOM [44] TV | ATOM trad. NTC | ATOM prop. NTC | MINTEST [45] TV | MINTEST trad. NTC | MINTEST prop. NTC | SC | ESC length | CPU time (s) |
|---------|------------------|--------------------|--------------------|--------------|----------------|----------------|-----------------|-------------------|-------------------|----|------------|--------------|
| s208    | 34            | 54.54    | 26.67   | 65        | 55.82    | 25.93   | 27           | 54.94    | 24.07   | 2  | 0      | 1    |
| s298    | 33            | 103.56   | 39.23   | 52        | 115.74   | 46.36   | 23           | 108.88   | 43.81   | 3  | 6      | 1    |
| s344    | 24            | 130.36   | 42.54   | 62        | 131.58   | 48.03   | 13           | 124.77   | 46.69   | 4  | 4      | 1    |
| s349    | 22            | 131.90   | 52.59   | 65        | 132.58   | 53.63   | 13           | 128.91   | 55.86   | 4  | 1      | 1    |
| s382    | 32            | 133.91   | 50.99   | 72        | 145.63   | 51.81   | 25           | 148.40   | 55.54   | 3  | 6      | 1    |
| s386    | 74            | 81.31    | 63.75   | 109       | 86.31    | 58.92   | 63           | 85.45    | 59.09   | 3  | 0      | 1    |
| s400    | 33            | 135.97   | 51.88   | 98        | 107.24   | 43.82   | 43           | 100.60   | 40.11   | 3  | 6      | 1    |
| s420    | 73            | 111.69   | 54.46   | 98        | 107.24   | 43.82   | 43           | 100.60   | 40.11   | 2  | 0      | 1    |
| s444    | 33            | 139.92   | 47.68   | 77        | 150.01   | 49.74   | 24           | 156.59   | 51.87   | 4  | 6      | 1    |
| s510    | 60            | 123.89   | 64.38   | 90        | 115.23   | 65.86   | 54           | 114.04   | 66.14   | 4  | 0      | 1    |
| s526    | 60            | 170.61   | 63.05   | 107       | 186.24   | 67.38   | 49           | 183.15   | 67.95   | 4  | 6      | 1    |
| s641    | 58            | 166.32   | 60.03   | 99        | 184.13   | 60.31   | 21           | 176.95   | 62.92   | 3  | 0      | 1    |
| s713    | 58            | 173.34   | 57.15   | 100       | 196.92   | 59.92   | 21           | 192.82   | 63.50   | 3  | 0      | 1    |
| s820    | 110           | 137.52   | 111.08  | 190       | 139.01   | 110.31  | 93           | 137.89   | 112.44  | 5  | 0      | 1    |
| s832    | 115           | 139.83   | 115.50  | 200       | 138.07   | 114.29  | 94           | 138.61   | 115.29  | 3  | 0      | 1    |
| s838    | 148           | 227.46   | 108.24  | 183       | 199.88   | 81.15   | 75           | 187.63   | 70.41   | 2  | 0      | 1    |
| s953    | 90            | 158.50   | 76.43   | 138       | 169.70   | 76.37   | 76           | 169.09   | 76.02   | 3  | 23     | 1    |
| s1196   | 140           | 101.31   | 68.12   | 227       | 105.37   | 68.37   | 113          | 105.47   | 68.83   | 4  | 2      | 1    |
| s1238   | 151           | 101.50   | 65.15   | 240       | 107.46   | 66.11   | 121          | 103.88   | 65.24   | 4  | 2      | 1    |
| s1423   | 70            | 453.58   | 137.63  | 135       | 509.96   | 150.41  | 20           | 507.21   | 150.82  | 5  | 3      | 2    |
| s1488   | 119           | 340.75   | 225.81  | 196       | 347.17   | 227.33  | 101          | 366.18   | 234.11  | 3  | 0      | 3    |
| s1494   | 125           | 329.98   | 266.05  | 191       | 353.12   | 237.43  | 100          | 371.16   | 235.81  | 4  | 0      | 3    |
| s5378   | 259           | 1772.07  | 527.87  | 358       | 1786.60  | 531.44  | 97           | 1809.42  | 537.51  | 5  | 33     | 49   |
| s9234   | 366           | 3160.16  | 760.35  | 660       | 3123.35  | 754.58  | 105          | 3045.64  | 751.09  | 6  | 20     | 201  |
| s13207  | 461           | 5949.81  | 2051.55 | 709       | 5972.92  | 2056.18 | 233          | 5977.48  | 2047.51 | 5  | 330    | 472  |
| s15850  | 436           | 5260.90  | 942.07  | 643       | 5487.29  | 952.92  | 94           | 5481.82  | 947.82  | 6  | 62     | 596  |
| s35932  | 65            | 11067.50 | 5440.19 | 129       | 13039.30 | 6291.21 | 12           | 10860.50 | 5374.45 | 2  | 0      | 1903 |
| s38417  | 904           | 15920.00 | 7159.88 | 1458      | 15849.20 | 7136.23 | 68           | 14199.90 | 6486.23 | 5  | 1079   | 8151 |
| s38584  | 658           | 12766.30 | 3914.41 | 989       | 12871.30 | 3912.55 | 110          | 12901.50 | 3896.92 | 7  | 7      | 3543 |

Table 1: Experimental results using multiple scan chains for power minimization.

using three different ATPG test tools [43–45]. The first and second columns give the circuit name and the number of test vectors (TV), respectively, generated using the ATALANTA test tool [43]. The third column shows the initial average value of NTC (trad. NTC), which is the total value of NTC using the traditional single scan chain design [36] divided by the total number of clock cycles over the entire test application period. Column 4 shows the final average value of NTC (prop. NTC) when using multiple scan chains and extra test vectors (the MSC-TESTAPPLICATION algorithm from section 4.2). The same experiment has been repeated for non-compact test sets generated by the ATOM test tool [44] (columns 5-7) and compact test sets generated by the MINTEST compaction tool [45] (columns 8-10). It should be noted that all three test sets [43–45] achieve 100% fault coverage. Columns 11 and 12 give the number of scan chains (SC) and the length of the extra scan chain (ESC), respectively, computed using the MSC-PARTITIONING algorithm outlined in section 4.1. The number of scan chains varies from 2, as in the case of *s208*, up to 7, as in the case of *s38584*. The small number of scan chains implies that both the area overhead required to control multiple scan chains and the test data overhead caused by extra test vectors are very low since they are proportional to the number

| circuit | power reduction (%), ATALANTA | power reduction (%), ATOM | power reduction (%), MINTEST | test data overhead (%), ATALANTA | test data overhead (%), ATOM | test data overhead (%), MINTEST | test area overhead (%) |
|---------|---------------|-------|---------|----------|--------------|---------|-------|
| s208    | 51.09         | 53.54 | 56.18   | 3.26     | 1.70         | 4.11    | 10.00 |
| s298    | 62.10         | 59.94 | 59.75   | 1.06     | 0.67         | 1.53    | 11.33 |
| s344    | 67.36         | 63.49 | 62.57   | 4.68     | 1.81         | 8.65    | 11.11 |
| s349    | 60.12         | 59.54 | 56.66   | 5.11     | 1.73         | 8.65    | 11.11 |
| s382    | 61.91         | 64.42 | 62.57   | 0.78     | 0.34         | 1.00    | 9.09  |
| s386    | 21.59         | 31.73 | 30.84   | 2.18     | 1.48         | 2.56    | 10.33 |
| s400    | 61.84         | 59.13 | 60.12   | 0.75     | 1.08         | 2.46    | 10.63 |
| s420    | 51.23         | 59.13 | 60.12   | 1.45     | 1.08         | 2.46    | 9.52  |
| s444    | 65.91         | 66.84 | 66.87   | 1.13     | 0.48         | 1.56    | 11.04 |
| s510    | 48.03         | 42.84 | 42.01   | 5.06     | 3.37         | 5.62    | 10.52 |
| s526    | 63.04         | 63.82 | 62.89   | 0.62     | 0.35         | 0.76    | 11.11 |
| s641    | 63.90         | 67.24 | 64.43   | 3.35     | 1.96         | 9.25    | 8.00  |
| s713    | 67.02         | 69.57 | 67.06   | 3.35     | 1.94         | 9.25    | 8.00  |
| s820    | 19.22         | 20.64 | 18.45   | 3.55     | 2.05         | 4.20    | 12.50 |
| s832    | 17.39         | 17.22 | 16.82   | 2.04     | 1.17         | 2.49    | 10.71 |
| s838    | 52.41         | 59.39 | 62.47   | 0.69     | 0.56         | 1.37    | 4.65  |
| s953    | 51.77         | 54.99 | 55.03   | 0.79     | 0.51         | 0.93    | 6.81  |
| s1196   | 32.75         | 35.11 | 34.74   | 0.93     | 0.57         | 1.16    | 6.81  |
| s1238   | 35.81         | 38.47 | 37.19   | 0.86     | 0.54         | 1.08    | 4.16  |
| s1423   | 69.65         | 70.50 | 70.26   | 1.06     | 0.55         | 3.73    | 3.79  |
| s1488   | 33.73         | 34.51 | 36.06   | 1.44     | 0.87         | 1.69    | 4.16  |
| s1494   | 19.37         | 32.76 | 36.46   | 1.82     | 1.19         | 2.28    | 6.12  |
| s5378   | 70.21         | 70.25 | 70.29   | 0.25     | 0.18         | 0.67    | 2.60  |
| s9234   | 75.93         | 75.84 | 75.33   | 0.19     | 0.11         | 0.69    | 2.06  |
| s13207  | 65.51         | 65.57 | 65.74   | 0.07     | 0.04         | 0.15    | 0.91  |
| s15850  | 82.09         | 82.63 | 82.70   | 0.14     | 0.09         | 0.67    | 0.92  |
| s35932  | 50.84         | 51.75 | 50.51   | 0.06     | 0.03         | 0.33    | 0.25  |
| s38417  | 55.02         | 54.97 | 54.32   | 0.01     | 0.01         | 0.09    | 0.38  |
| s38584  | 69.33         | 69.60 | 69.79   | 0.02     | 0.01         | 0.14    | 0.42  |

Table 2: Power reduction and overhead in test area and test data.

of scan chains. For most of the examples the size of the extra scan chain (ESC length) is nil or very low. However, there are two extreme cases, s13207 and s38417, where the number of independent scan latches (Definition 5 from section 3.1) is very high, leading to an increase in ESC length and hence an insignificant penalty in power reduction. It can be clearly seen that the proposed test application strategy (*MSC-TESTAPPLICATION* from section 4.2) yields a significantly smaller average value of *NTC* for all the benchmark circuits when compared to the initial value of *NTC* computed using the test application strategy from [36], which employs a single scan chain. Furthermore, the computational time is very low (< 1s) for small circuits. Moreover, for large circuits which are not handled by previous approaches [12, 34, 35], as in the case of s38584, it takes < 3600s to achieve a substantial reduction in the average value of *NTC*.

To give an indication of the reductions in power dissipation, Table 2 shows the percentage reduction in power dissipation (columns 1-3) and the percentage overhead in test data (columns 4-6) and test area (column 7). The power dissipation is considered directly proportional to the average value of *NTC*. The test area overhead represents the extra logic required to multiplex the scan output signal (Figure 1) and it is computed accurately by synthesizing and technology mapping the ISCAS89 circuits to AMS 0.35 micron technology [47]. The test data overhead represents the number of extra bits required for the extra test vectors (the number of scan chains multiplied by the number of primary inputs). Note that test area overhead decreases as the complexity of the circuit increases. This is because the extra area occupied by the scan control register and selection logic (Figure 1 from section 3.1) required to control multiple scan chains is very small when compared to the size of large sequential circuits. The power reduction varies from approximately 82%, as in the case of s15850, down to approximately 17%, as in the case of s832. It should be noted that the moderate power reduction in the case of s386, s510, s820, s832, s1488 and s1494 is due to the very small number of scan latches (5 to 6 scan latches only), which are difficult to partition into multiple scan chains. However, for modern complex digital circuits where the number of scan latches is significantly higher (thousands as in the case of s38584) the power reduction is up to 69% at the expense of insignificant (< 1%) test data and test area overhead. This clearly shows the advantage of the proposed technique for power minimization using multiple scan chains for large sequential circuits.
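The arithmetic behind Table 2 can be sketched in a few lines of Python. The snippet is only an illustration under the stated assumption that power dissipation is directly proportional to the average value of *NTC*. The function names `power_reduction_percent` and `extra_test_data_bits` are hypothetical and are not part of the tools used in the experiments. The NTC values are taken from the s15850 row of Table 1 (ATALANTA test set), while the primary input count passed to `extra_test_data_bits` is a made-up value.

```python
def power_reduction_percent(trad_ntc: float, prop_ntc: float) -> float:
    """Percentage reduction in average NTC, taken as the power reduction
    under the assumption that power is proportional to average NTC."""
    if trad_ntc <= 0:
        raise ValueError("traditional average NTC must be positive")
    return 100.0 * (trad_ntc - prop_ntc) / trad_ntc


def extra_test_data_bits(num_scan_chains: int, num_primary_inputs: int) -> int:
    """Extra test data: one extra primary input vector per scan chain."""
    return num_scan_chains * num_primary_inputs


# s15850, ATALANTA test set: trad. NTC and prop. NTC from Table 1.
reduction = power_reduction_percent(5260.90, 942.07)
print(f"s15850 power reduction: {reduction:.2f}%")

# Hypothetical circuit with 6 scan chains and 10 primary inputs.
print("extra bits:", extra_test_data_bits(6, 10))
```

For the s15850 row this reproduces the 82.09% entry reported in Table 2.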



Figure 10: Curve illustrating test set independent final value of NTC.

A further advantage of the proposed technique is that, due to test set independence, the final average value of *NTC* is predictable within a given range of values regardless of the test vectors applied to the circuit. This is justified by the fact that the proposed low area overhead multiple scan chain architecture introduced in section 3.1 is not overly sensitive to the values of test vectors, since only a single chain is active at a time and the spurious transitions within the combinational circuit are eliminated by the extra primary input vector *regardless* of the value loaded in the non-active scan chains. This is shown in Figure 10, where the graphs of the average value of *NTC* for the 7 largest ISCAS89 benchmark circuits under three test sets of different sizes are given. For all three test sets, MINTEST [45], ATALANTA [43] and ATOM [44], the average values of *NTC* are approximately equal. This implies that the proposed technique can further be applied to other DFT methodologies such as scan-based BIST [36] where, regardless of the value of the pseudorandom test set, the savings in power dissipation are guaranteed and the final values of *NTC* are predictable.
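The test set independence claim can be checked numerically against Table 1. The sketch below is an illustration and not part of the original experiments. It computes the relative spread of the final average *NTC* (prop. NTC) across the ATALANTA, ATOM and MINTEST test sets for four of the large circuits. The function name `relative_spread` and the choice of (max - min) / mean as the spread measure are assumptions made here for illustration only.

```python
from statistics import mean

# prop. NTC from Table 1 for the (ATALANTA, ATOM, MINTEST) test sets.
FINAL_NTC = {
    "s5378": (527.87, 531.44, 537.51),
    "s15850": (942.07, 952.92, 947.82),
    "s35932": (5440.19, 6291.21, 5374.45),
    "s38584": (3914.41, 3912.55, 3896.92),
}


def relative_spread(values):
    """(max - min) / mean: an assumed, simple measure of how much the
    final average NTC varies from one test set to another."""
    return (max(values) - min(values)) / mean(values)


SPREADS = {name: relative_spread(v) for name, v in FINAL_NTC.items()}

for name, spread in SPREADS.items():
    print(f"{name}: spread = {100 * spread:.1f}%")
```

For these rows the spread stays below 2% except for s35932, where the ATOM test set gives a noticeably higher final value than the other two.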

It should be noted that the experimental results reported in this section using the simplified power model from section 2 do not consider the power dissipated by the clock tree, which is typically one third of the total power dissipation [38]. However, the power dissipated by the clock tree can be substantially reduced using a low power buffered clock tree design [38] which successfully handles both the scan clock gating and the scan clock trees required by the proposed design for testability architecture using multiple scan chains, as shown in Theorem 1 of section 3.3.

## 6 Conclusions

This paper has presented a new technique for power minimization during test application in sequential circuits using multiple scan chains. The technique is based on a new design for test (DFT) architecture and a novel test application strategy which reduces spurious transitions (Definition 1 of section 3.2) in the circuit under test. When compared to the traditional approach which consists of a single scan chain [36], the proposed technique employs a novel DFT architecture based on multiple scan chains leading to a substantial reduction in power dissipation. The proposed technique, which is test set independent, overcomes the large test application time required to achieve significant power savings [29–32, 34] since substantial power reductions are achieved for both compact and non-compact test sets as shown in section 5. The newly introduced DFT architecture (Figure 1 from section 3.1) does not introduce any performance degradation when compared to previous approaches employing modified scan cell design [29]. Unlike previous approaches which do not consider [12, 28, 30–35] or reduce clock tree power dissipation only for nonessential test vectors [29], the proposed technique reduces clock tree power for all the test vectors of a very small test set where each test vector is essential, as described in section 3.3. While previous approaches [28, 29] required considerable test area overhead associated with detection logic, the proposed DFT architecture requires very low extra area to control multiple scan chains, which are successfully combined with extra test vectors in the newly introduced test application strategy in section 4.2. Since a high number of extra test vectors [34] emerges as a problem for testers, which need to change to support the large volume of test data [37], the proposed technique based on a small number of extra test vectors introduces very low overhead in test data as shown in section 5.
Moreover, due to the efficient algorithms described in section 4, the proposed technique is computationally inexpensive, unlike previous approaches [12, 33–35] whose computational time is prohibitively large, hindering the exploration of large sequential circuits. Finally, the synthesizable extra hardware required by the new DFT architecture introduced in section 3.1, the efficient algorithms given in section 4.1, and the novel test application strategy described in section 4.2 make the technique proposed in this paper easily embeddable in the existing VLSI design flow using state of the art third party electronic design automation tools.

#### Acknowledgement

The authors wish to acknowledge Dr. Dong S. Ha of Virginia Polytechnic Institute and State University for providing the ATALANTA ATPG tool. Also, the authors acknowledge the Center for Reliable and High-Performance Computing of the University of Illinois at Urbana-Champaign for providing test sets generated by the ATOM and MINTEST test tools.

## References

- [1] Y. Zorian, "A distributed BIST control scheme for complex VLSI devices," in *Proc. 11th IEEE VLSI Test Symposium*, pp. 4–9, 1993.
- [2] M. Pedram, "Power minimization in IC design: Principles and applications," *ACM Transactions on Design Automation of Electronic Systems*, vol. 1, pp. 3–56, Jan 1996.
- [3] Semiconductor Industry Association (SIA), *The International Technology Roadmap for Semiconductors (ITRS): 1999 Edition*. http://public.itrs.net/1999\_SIA\_Roadmap/Home.htm, 1999.
- [4] P. Girard, "Low power testing of VLSI circuits: Problems and solutions," in *First International Symposium on Quality of Electronic Design (ISQED)*, pp. 173–180, 2000.
- [5] S. Wang and S. Gupta, "ATPG for heat dissipation minimization during test application," *IEEE Transactions on Computers*, vol. 47, pp. 256–262, Feb 1998.
- [6] R. Chou, K. Saluja, and V. Agrawal, "Scheduling tests for VLSI systems under power constraints," *IEEE Transactions on VLSI*, vol. 5, pp. 175–184, Jun 1997.

- [7] V. Muresan, V. Muresan, X. Wang, and M. Vladutiu, "The left edge algorithm and the tree growing technique in block-test scheduling under power constraints," in *Proc. of the 18th IEEE VLSI Test Symposium*, 2000.
- [8] K. Chakrabarty, "Test scheduling for core-based systems," in *Proc. IEEE/ACM International Conference on Computer-Aided Design (ICCAD)*, pp. 391–394, 1999.
- [9] C. Ravikumar, G. Chandra, and A. Verma, "Simultaneous module selection and scheduling for power-constrained testing of core based systems," in *13th International Conference on VLSI Design*, pp. 462–467, 2000.
- [10] E. Larsson and Z. Peng, "Test infrastructure design and test scheduling optimization," in *IEEE European Test Workshop*, 2000.
- [11] N. Nicolici and B. Al-Hashimi, "Power conscious test synthesis and scheduling for BIST RTL data paths," in *Proc. IEEE International Test Conference (ITC 2000)*, (Atlantic City, New Jersey), October 2000.
- [12] V. Dabholkar, S. Chakravarty, I. Pomeranz, and S. Reddy, "Techniques for minimizing power dissipation in scan and combinational circuits during test application," *IEEE Transactions on CAD*, vol. 17, pp. 1325–1333, Dec 1998.
- [13] P. Girard, C. Landrault, S. Pravossoudovitch, and D. Severac, "Reduction of power consumption during test application by test vector ordering," *IEE Electronics Letters*, vol. 33, no. 21, pp. 1752–1754, 1997.
- [14] P. Flores, J. Costa, H. Neto, J. Monteiro, and J. Marques-Silva, "Assignment and reordering of incompletely specified pattern sequences targeting minimum power dissipation," in *12th International Conference on VLSI Design*, pp. 37–41, 1999.
- [15] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A test vector ordering technique for switching activity reduction during test operation," in *9th Great Lakes Symposium on VLSI (GLSVLSI99)*, pp. 24–27, 1999.
- [16] S. Wang and S. Gupta, "DS-LFSR: A new BIST TPG for low heat dissipation," in *Proc. IEEE International Test Conference*, pp. 848–857, 1997.
- [17] X. Zhang, K. Roy, and S. Bhawmik, "POWERTEST: A tool for energy conscious weighted random pattern testing," in *12th International Conference on VLSI Design*, pp. 416–422, 1999.

- [18] X. Zhang and K. Roy, "Design and synthesis of low power weighted random pattern generator considering peak power reduction," in *International Symposium on Defect and Fault Tolerance in VLSI Systems*, pp. 148–156, 1999.
- [19] M. Brazzarola and F. Fummi, "Power characterization of LFSRs," in *International Symposium on Defect and Fault Tolerance in VLSI Systems*, pp. 138–146, 1999.
- [20] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "A test vector inhibiting technique for low energy BIST design," in *Proc. 17th IEEE VLSI Test Symposium*, pp. 407–412, 1999.
- [21] S. Manich, A. Gabarro, M. Lopez, J. Figueras, P. Girard, L. Guiller, C. Landrault, S. Pravossoudovitch, P. Teixeira, and M. Santos, "Low power BIST by filtering nondetecting vectors," *Journal of Electronic Testing: Theory and Applications (JETTA)*, vol. 16, pp. 193–202, June 2000.
- [22] F. Corno, M. Rebaudengo, M. S. Reorda, and M. Violante, "A new BIST architecture for low power circuits," in *IEEE European Test Workshop (ETW99)*, pp. 160–164, 1999.
- [23] F. Corno, M. Rebaudengo, M. S. Reorda, G. Squillero, and M. Violante, "Low power BIST via hybrid cellular automata," in *18th IEEE VLSI Test Symposium*, 2000.
- [24] C. Ravikumar and N. Prasad, "Evaluating BIST architectures for low power," in *7th Asian Test Symposium*, pp. 430–434, 1998.
- [25] P. Girard, L. Guiller, C. Landrault, and S. Pravossoudovitch, "Circuit partitioning for low power BIST design with minimized peak power consumption," in *8th Asian Test Symposium (ATS99)*, pp. 89–94, 1999.
- [26] D. Gizopoulos, N. Kranitis, A. Paschalis, M. Psarakis, and Y. Zorian, "Effective low power BIST for datapaths," in *Proc. of the Design, Automation and Test in Europe Conference (DATE)*, p. 757, 2000.
- [27] F. Corno, P. Prinetto, M. Rebaudengo, and M. Sonza-Reorda, "A test pattern generation methodology for low power consumption," in *16th IEEE VLSI Test Symposium*, pp. 453–460, 1998.
- [28] F. Corno, M. Rebaudengo, M. S. Reorda, and M. Violante, "Optimal vector selection for low power BIST," in *IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems*, pp. 219–226, 1999.

- [29] S. Gerstendorfer and H. Wunderlich, "Minimized power consumption for scan-based BIST," *Journal of Electronic Testing: Theory and Applications (JETTA)*, vol. 16, pp. 203–212, June 2000.
- [30] K.-H. Tsai, S. Hellebrand, J. Rajski, and M. Marek-Sadowska, "STARBIST: Scan autocorrelated random pattern generation," in *34th IEEE/ACM Design Automation Conference*, pp. 472–477, 1997.
- [31] S. Wang and S. Gupta, "LT-RTPG: A new test-per-scan BIST TPG for low heat dissipation," in *Proc. IEEE International Test Conference*, pp. 85–94, 1999.
- [32] R. Sankaralingam, R. R. Oruganti, and N. A. Touba, "Static compaction techniques to control scan vector power dissipation," in *Proc. of the 18th IEEE VLSI Test Symposium*, 2000.
- [33] N. Nicolici and B. Al-Hashimi, "Minimisation of power dissipation during test application in full scan sequential circuits using primary input freezing," *IEE Proceedings - Computers and Digital Techniques*, vol. 147, no. to appear, 2000.
- [34] S. Wang and S. Gupta, "ATPG for heat dissipation minimization during scan testing," in *Proc. 34th Design Automation Conference*, pp. 614–619, 1997.
- [35] T.-C. Huang and K.-J. Lee, "An input control technique for power reduction in scan circuits during test application," in *Proc. 8th Asian Test Symposium*, pp. 315–320, 1999.
- [36] M. Abramovici, M. Breuer, and A. Friedman, *Digital Systems Testing and Testable Design*. IEEE Press, 1990.
- [37] R. Kapur and T. Williams, "Tough challenges as design and test go nanometer," *Computer*, vol. 32, pp. 42–45, November 1999.
- [38] A. Vittal and M. Marek-Sadowska, "Low-power buffered clock tree design," *IEEE Transactions on CAD*, vol. 16, pp. 965–975, Sep 1997.
- [39] A. Chandrakasan and R. Brodersen, *Low Power Digital CMOS Design*. Kluwer Academic Publishers, 1995.
- [40] L. Rade and B. Westergren, *Mathematics Handbook for Science and Engineering*. Springer-Verlag, 4th ed., 1999.

- [41] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer, "On average power dissipation and random pattern testability of CMOS combinational logic networks," in *Proc. IEEE/ACM International Conference on Computer Aided Design*, pp. 402–407, 1992.
- [42] M. Hsiao, E. Rudnick, and J. Patel, "Effects of delay models on peak power estimation of VLSI sequential circuits," in *Proc. International Conference on Computer Aided Design*, pp. 45–51, 1997.
- [43] H. K. Lee and D. S. Ha, "On the generation of test patterns for combinational circuits," Tech. Rep. No. 12-93, Department of Electrical Engineering, Virginia Polytechnic Institute and State University, 1991.
- [44] I. Hamzaoglu and J. Patel, "New techniques for deterministic test pattern generation," *Journal of Electronic Testing: Theory and Applications (JETTA)*, vol. 15, pp. 63–73, Aug 1999.
- [45] I. Hamzaoglu and J. Patel, "Test set compaction algorithms for combinational circuits," *IEEE Transactions on CAD*, vol. 19, Aug 2000.
- [46] F. Brglez, D. Bryan, and K. Kozminski, "Combinational profiles of sequential benchmark circuits," in *Proc. International Symposium on Circuits and Systems*, pp. 1929–1934, 1989.
- [47] AMS, *0.35 Micron CMOS Process Parameters*. Austria Mikro Systeme International AG, 1998.