Low power and Area Efficient System with Fast Error Correction using Pulsed Latch

Reshma K P¹, Lakshmi Raj V²
¹ Mtech Student, MET’S School of Engineering, Mala, Thrissur, Kerala
² Professor, MET’S School of Engineering, Mala, Thrissur, Kerala

Abstract: Aggressive reduction of timing margins, called timing speculation, is an effective way of reducing the supply voltage of a pipeline circuit and thereby its power consumption. However, the probability of timing errors increases with voltage scaling, and hence the errors must be corrected with a small cycle penalty. This paper introduces an improved Razor approach in which the flip-flop is replaced by a pulsed latch, which reduces area and power consumption. The proposed method is a low-power and area-efficient system with fast error correction at a small cycle penalty. It solves the timing problem between pulsed latches by using multiple non-overlapping delayed pulsed clock signals instead of the conventional single pulsed clock signal. A delayed pulse clock generator provides these short delayed pulse signals to the latches.

Keywords: Low power, Area efficient, Timing error, Flip-Flop, Timing Speculation

I. INTRODUCTION

The increased complexity of modern nanometer-technology integrated circuits demands the urgent development of solutions that achieve appropriate reliability levels and keep the cost of testing within acceptable bounds. Reliability is mostly affected by the reduced power supply, continuous transistor scaling, and increased operating frequencies. As a result, transient faults are generated more frequently, making it difficult to keep error rates within specifications. Timing errors have various causes, such as power supply disturbances, crosstalk and ground bounce phenomena, increased path delay deviations, and manufacturing defects. Moreover, even though complex testing procedures are followed, the massive increase in the number of paths in modern integrated circuits (ICs) makes timing verification harder, so the probability of timing failures is difficult to reduce.

In addition, the timing of modern systems is easily affected by their operation at multiple frequency and voltage levels, which also results in increased timing error rates. Transistor aging phenomena must also be considered, since they lead to the early occurrence of timing errors in a circuit's lifecycle. Given this situation, and aiming to achieve acceptable reliability levels in modern ICs, concurrent online testing techniques for timing error detection and correction are becoming obligatory. Furthermore, dynamic voltage scaling (DVS) techniques for low-power operation can tolerate timing errors more effectively by exploiting error detection and correction mechanisms. When a timing failure occurs in a combinational logic block, the outcome is a delayed response at its outputs. Thus, after the triggering edge of the clock signal, the memory elements at the outputs of this combinational block capture an erroneous value, and a timing error is generated. Numerous error detection techniques have been proposed in the open literature. These techniques can detect the delayed circuit response and provide timing error tolerance by using time redundancy approaches. In this work, we present a multiple timing error detection and correction scheme oriented to pulsed-latch-based designs. A new pulsed latch topology is proposed. Additionally, we introduce a pipeline architecture that exploits the new pulsed latch and provides timing error tolerance in a design.

Summary of Contributions: In this paper, we propose a new error correction method with a one-cycle penalty using a pulsed latch. Our main contributions are as follows:

- One-cycle error correction method.
- Reduced power consumption and area.

II. LITERATURE REVIEW

The work in [1] presents a power-saving system for fast correction of multiple errors in a pipeline circuit. It introduces an improved Razor flip-flop that makes more effective use of its shadow latch, so that a pipeline stage can correct an error while continuing to receive data. This avoids the need for repeated clock gating when timing errors happen simultaneously at different stages, or when an error persists. The work in [5] proposes a method for in situ error detection and correction for PVT (process, voltage, temperature) and SER (soft error rate) tolerance. Traditional adaptive methods that compensate for PVT variations need safety margins and cannot respond to rapid environmental changes. That paper presents a design (RazorII) that implements a flip-flop with in situ detection and architectural correction of variation-induced delay errors. Error detection is based on flagging spurious transitions in the state-holding latch node. In [7], pulse-triggered flip-flops, which are bistable elements in sequential logic circuits, are designed. First, the pulse generation control logic is removed from the critical path to facilitate a faster discharge operation. Low-power techniques are then applied, such as conditional capture, conditional precharge, conditional discharge, conditional data mapping, and clock gating. The work in [2] presents a low-power and area-efficient shift register using pulsed latches. The area and power consumption are reduced by replacing flip-flops with pulsed latches. This method solves the timing problem between pulsed latches through the use of multiple non-overlapping delayed pulsed clock signals instead of the conventional single pulsed clock signal. The shift register uses a small number of pulsed clock signals by grouping the latches into several sub-shift registers and using additional temporary storage latches. The work in [4] proposes and investigates an error-tolerant dynamic voltage scaling technology. The Razor flip-flop was introduced as a mechanism to double-sample pipeline stage values, once with an aggressive fast clock and again with a delayed clock that guarantees a reliable second sample. A metastability-tolerant error detection circuit was described that validates all values latched on the fast Razor clock.

The work in [3] presents a new timing error correction scheme that allows each pipeline stage to halt for one cycle only. The small timing penalty of the error correction operation in that scheme makes it possible to eliminate the extra timing guard band that was needed to accommodate timing uncertainty due to process variations. The scheme is also compared with error correction in counterflow pipelining, which has a 2k-cycle penalty, where k is the position of the stage in which the error occurred. In [6], a 65 nm resilient circuit test chip is implemented with timing-error detection and recovery circuits to eliminate the clock frequency guard band arising from dynamic supply voltage and temperature variations, and to exploit path-activation probabilities for maximizing throughput. The work in [8] proposes the Razor approach for low-power error detection and correction. Razor, a new approach to DVS, is based on dynamic detection and correction of speed path failures in digital designs. Its key idea is to tune the supply voltage by monitoring the error rate during operation. In [9], the authors propose a one-cycle error correction method. The Bubble Razor scheme made a breakthrough by introducing one-cycle error correction, but it can be applied to two-phase transparent latch designs only. The contribution of [9] is a new one-cycle error correction method that can be applied to more widely used clocking elements, such as flip-flops and pulsed latches.

III. PROPOSED ARCHITECTURE

In the proposed method, we introduce a low-power, area-efficient error correction mechanism with a one-cycle penalty using a pulsed latch.
The main idea of the existing method, shown in Fig. 1, is to modify the clock signal sent to the shadow latch so that the shadow latch opens after the main flip-flop has captured its input data. The shadow latch can then restore the previous, correct data to the main flip-flop to achieve error correction, while also capturing new input data in the same cycle. This avoids a data conflict. To do so, the pulse width is reduced and a new clock signal is generated by delaying the clock. The shadow latch is driven by this delayed clock, so that it starts capturing input data after sending its previous data to the main flip-flop during the restore cycle. Note that the window for timing speculation is the same as that of a standard Razor flip-flop (RFF), but the clock pulse is narrower. A final change is that input data is fed directly to the shadow latch, allowing it to capture that data even when the restore signal is high.

Figure 1: Razor flip-flop

Figure 2: Modified Razor flip-flop

The key idea of the proposed method is to replace the flip-flop with a pulsed latch. Fig. 2 shows the modified Razor approach for error detection and correction. A delayed pulse generator provides delayed pulses to the pulsed latch and the shadow latch, so the shadow latch opens after the pulsed latch. The shadow latch can then restore the previous, correct data to the pulsed latch to achieve error correction, while also capturing new input data in the same cycle. If there is no error, the output of the XOR gate is low. Otherwise the restore signal goes high, and the corrected data from the shadow latch is fed to the pulsed latch. The window for timing speculation is the same as that of the existing Razor approach, but the pulse signal is narrower.
The master-slave flip-flop using two latches in Fig. 3(a) can be replaced by a pulsed latch consisting of a latch and a pulsed clock signal, as in Fig. 3(b). The flip-flop in Fig. 1 is therefore replaced by a pulsed latch for better performance.
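The detect-and-restore behavior described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit of Fig. 2: the class name `RazorPulsedLatchStage`, its fields, and the idea of passing in the two sampled input values directly are all hypothetical simplifications, and the value seen at the delayed pulse is assumed to have settled within the speculation window.

```python
from dataclasses import dataclass


@dataclass
class RazorPulsedLatchStage:
    """Hypothetical behavioral model: one pulsed latch plus one shadow latch."""
    main: int = 0      # value held by the pulsed latch
    shadow: int = 0    # value held by the shadow latch
    restores: int = 0  # number of corrections performed so far

    def cycle(self, value_at_pulse: int, value_at_delayed_pulse: int) -> int:
        """Run one clock cycle and return the stage output after correction.

        value_at_pulse: input seen while the main pulse is high; it may be
            stale if the upstream logic was too slow.
        value_at_delayed_pulse: input seen while the delayed pulse is high;
            assumed to be settled (inside the speculation window).
        """
        self.main = value_at_pulse            # pulsed latch captures first
        self.shadow = value_at_delayed_pulse  # shadow latch opens afterwards
        error = self.main ^ self.shadow       # XOR comparator: nonzero on mismatch
        if error:                             # restore signal goes high
            self.main = self.shadow           # shadow value is fed back
            self.restores += 1
        return self.main


stage = RazorPulsedLatchStage()
print(stage.cycle(1, 1))  # data arrived in time, XOR output is low
print(stage.cycle(0, 1))  # data arrived late, value restored from the shadow latch
print(stage.restores)
```

The model deliberately ignores analog effects such as metastability and pulse width; it only shows the ordering of capture, comparison, and restore.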

![Figure 3: (a) Master-slave flip-flop (b) Pulsed latch](image)

All pulsed latches share the pulse generation circuit for the pulsed clock signal. As a result, the area and power consumption of the pulsed latch become almost half of those of the master-slave flip-flop, making the pulsed latch an attractive solution for small area and low power consumption. To avoid the timing problem between latches, a delayed pulse clock generator is used.

![Figure 4: Delayed pulse clock generator](image)

In conventional delayed pulsed clock circuits, the clock pulse width must be larger than the sum of the rise and fall times of all inverters in the delay circuits to preserve the shape of the pulsed clock. In the delayed pulsed clock generator in Fig. 4, however, the clock pulse width can be shorter than the sum of the rise and fall times because each sharp pulsed clock signal is generated from an AND gate and two delayed signals. Therefore, the delayed pulsed clock generator is suitable for short pulsed clock signals.
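The following discrete-time sketch illustrates how an AND gate and two delayed signals can yield short, non-overlapping pulses. It is an illustration under assumptions rather than the circuit of Fig. 4: the helper names `delayed` and `delayed_pulses` are hypothetical, waveforms are modeled as lists of 0/1 samples, and the later of the two delayed signals is assumed to be inverted before the AND gate.

```python
def delayed(signal, delay):
    """Shift a sampled 0/1 waveform right by `delay` samples, padding with 0."""
    return ([0] * delay + signal)[:len(signal)]


def delayed_pulses(clock, num_pulses, step):
    """Derive `num_pulses` pulsed clocks from successive taps of a delay chain.

    Pulse i is tap[i] AND (NOT tap[i + 1]), so its width equals one delay
    step regardless of how wide the original clock phase is (assumption:
    the second input of each AND gate is inverted).
    """
    taps = [delayed(clock, i * step) for i in range(num_pulses + 1)]
    return [
        [a & (1 - b) for a, b in zip(taps[i], taps[i + 1])]
        for i in range(num_pulses)
    ]


# One clock phase: low for 2 samples, high for 10, low for 4.
clock = [0] * 2 + [1] * 10 + [0] * 4
pulses = delayed_pulses(clock, num_pulses=3, step=2)
for i, pulse in enumerate(pulses):
    print(i, "".join(str(v) for v in pulse))
```

Because each pulse is bounded by two adjacent taps, consecutive pulses cannot overlap in this model, which is the property the latches rely on.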

If an error occurs at a stage, input data is directed to the shadow latch alone and is transmitted back to the main flip-flop in the next cycle; this period is known as error-free mode. No timing errors can occur in this mode. The error correction method needs a pulse gating control signal, called PG, to prevent incorrect data from propagating through the pipeline after a timing error. When an error occurs at a stage, the stage operates in error-free mode until the PG signal has propagated back to the same stage. The existing error correction method uses a clock gating control signal (CG) for the same purpose, that is, to prevent incorrect data propagation caused by a timing error in the pipeline circuit.
The five-stage pipeline circuit shown in Fig. 5(a) consists of stages S1, S2, S3, S4, and S5, connected in series, with a combinational circuit between each pair of adjacent stages. Fig. 5(b) shows an example of error correction. Suppose that an instruction fails at stage S3 in cycle 5. Then S3 operates in error-free mode from cycle 5, and a PG signal is passed on to the next stage in every cycle. In cycle 10, S3 receives a PG signal from stage S2 and leaves error-free mode. An advantage of error-free mode is that it allows several errors at the same stage to be corrected in one cycle.
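The cycle numbers in this example can be reproduced with a short calculation. The sketch below rests on two assumptions that the text implies but does not state: the last stage feeds back to the first, so that the PG signal can return to S3 through S2, and the PG signal advances by exactly one stage per cycle. The function name `pg_arrival_cycles` is hypothetical.

```python
def pg_arrival_cycles(num_stages, error_stage, error_cycle):
    """Return {stage: cycle in which that stage receives the PG signal}.

    Stages are numbered 1..num_stages and are assumed to form a ring,
    with the PG signal moving one stage downstream per cycle.
    """
    arrivals = {}
    for hop in range(1, num_stages + 1):
        stage = (error_stage - 1 + hop) % num_stages + 1
        arrivals[stage] = error_cycle + hop
    return arrivals


arrivals = pg_arrival_cycles(num_stages=5, error_stage=3, error_cycle=5)
for stage in (4, 5, 1, 2, 3):
    print(f"S{stage} receives PG in cycle {arrivals[stage]}")
```

Under these assumptions the signal needs five hops to return, so S3 leaves error-free mode in cycle 10, which agrees with the example in Fig. 5(b).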

IV. EXTENSION TO GENERAL PIPELINE ARCHITECTURES

The error correction method is applied to a pipeline circuit. In computing, a pipeline is a set of data processing elements connected in series. A pipeline may have multi-fan-in and multi-fan-out stages, or a loop. If it does, the method described so far fails when a multi-fan-in stage receives PG signals from only some of its input stages. For example, suppose that an error occurs in cycle 5 at stage S1 of the 5-stage pipeline shown in Fig. 6. Fig. 6(b) shows the data stored in each stage at each cycle. In cycle 6, stage S5 is stalled because one of its input stages, S1, detected an error in the previous cycle. However, another input stage, S4, still sends data to stage S5 in cycle 6. Therefore, from cycle 6 onward, stage S5 generates incorrect data.

To avoid this problem, we use the concept of a virtual error. We introduce a new control signal, VE, to realize this concept, and control signals are propagated as follows: a stage that receives a PG signal from any of its input stages sends VE signals back to all of its input stages, and also sends PG signals to all of its output stages in the next cycle, as before. A stage that has already generated an error in the current cycle is not affected by a VE signal. Otherwise, a stage that receives a VE signal enters error-free mode.
Fig. 6(c) shows the PG and VE signals. In cycle 5, stage S1 sends PG signals to stages S2 and S5; therefore stage S2 sends a VE signal to stage S1, and stage S5 sends VE signals to its input stages (S1 and S4), all in the same cycle. The VE signal does not affect stage S1, because S1 already experienced a timing error in the earlier part of cycle 5. Stage S4, however, enters error-free mode immediately. In cycle 6, a PG signal is sent from stage S2 to stage S3, and stage S3 sends a VE signal back to stage S2. Since S2 was gated in cycle 6, this VE signal is nullified. Stage S3 sends a PG signal to stage S4 in cycle 7. Therefore S4 leaves error-free mode, and it sends a VE signal back to S3.
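One step of this VE rule can be sketched as a set computation. The topology below (S1 feeding S2 and S5, then S2 to S3, S3 to S4, and S4 to S5) is an assumption read off the walkthrough, since Fig. 6 itself is not reproduced here, and the function name `virtual_error_stages` and its parameters are hypothetical.

```python
def virtual_error_stages(successors, pg_senders, real_errors, gated=frozenset()):
    """Return the stages that enter error-free mode due to VE in this cycle.

    successors: dict mapping each stage to the stages it feeds.
    pg_senders: stages that send PG signals in this cycle.
    real_errors: stages that already detected a real error this cycle.
    gated: stages already gated this cycle; a VE sent to them is nullified.
    """
    # Stages that receive a PG signal from any of their input stages.
    receivers = {dst for src in pg_senders for dst in successors.get(src, ())}
    # Each receiver sends VE back to all of its input stages.
    ve_targets = {
        src
        for src, dsts in successors.items()
        for dst in dsts
        if dst in receivers
    }
    # VE has no effect on stages with a real error or on gated stages.
    return ve_targets - set(real_errors) - set(gated)


# Assumed topology for the multi-fan-in example.
successors = {"S1": ["S2", "S5"], "S2": ["S3"], "S3": ["S4"], "S4": ["S5"]}

# Cycle 5: S1 has a real error and sends PG to S2 and S5.
print(virtual_error_stages(successors, {"S1"}, real_errors={"S1"}))
# Cycle 6: S2 sends PG to S3; the VE sent back to the gated S2 is nullified.
print(virtual_error_stages(successors, {"S2"}, real_errors=set(), gated={"S2"}))
```

In this model only S4 receives a virtual error in cycle 5, and no stage does in cycle 6, which is consistent with the walkthrough of Fig. 6(c).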

This protocol also works correctly for loops, where the main challenge is to prevent infinite looping of PG signals. Its behavior depends on whether an error occurs before, in, or after the loop. Consider a pipeline with a loop, and suppose that three separate errors occur at stages S1, S3, and S5, respectively, as shown in Fig. 7. When an error occurs before the loop (stage S1 in cycle 5), a PG signal is inserted into the loop and, as a consequence, a VE signal generates a virtual error at stage S4. This virtual error causes PG signals to be propagated from stage S4 to its output stages, S2 and S5, in cycle 5. The PG signal sent to stage S5 is propagated to the first stage, S1. As a result, stage S1 leaves error-free mode in cycle 7. The PG signal sent to stage S2 is propagated to stage S4. Thus stage S4 leaves error-free mode in cycle 8. The timing penalty for error correction is still just one cycle. Now consider what happens when an error occurs within the loop, at stage S3 in cycle 9. A PG signal is propagated from stage S3 to stage S4 in cycle 9. In cycle 10, stage S4 sends a PG signal to its output stages, S2 and S5. When stage S2 receives a PG signal, VE signals are sent to stages S1 and S4. Since stage S4 has already been gated in cycle 10, the VE signal sent to stage S4 is nullified. Stage S1 is not gated, nor does it detect an error in cycle 10.

Thus a virtual error is generated at stage S1, which causes a PG signal to be propagated to stage S3, and stage S3 leaves error-free mode in cycle 12. Stage S1 leaves error-free mode when it receives the PG signal that was propagated from stage S3. In the third case, an error occurs after the loop, at stage S5 in cycle 13, and a PG signal is propagated to stage S1 in cycle 13. When stage S2 receives the PG signal, it sends VE signals to stages S1 and S4, causing a virtual error to be generated at stage S4 in cycle 14. The PG signal is then propagated to S5, which leaves error-free mode in cycle 15. Stage S4 leaves error-free mode in cycle 17, when it receives the PG signal issued by stage S5.

V. RESULTS AND DISCUSSION

Voltage scaling based on timing speculation has become an effective way of reducing power consumption in the nanometer regime. To increase the effectiveness of this form of voltage scaling, the overall timing penalty must be reduced. This requires the fastest possible mechanism for error correction; one-cycle error correction is achieved with the Razor flip-flop. For better performance, the flip-flop in the Razor flip-flop is replaced by a pulsed latch. A master-slave flip-flop uses two latches, whereas a pulsed latch uses a single latch driven by a pulsed clock signal. All pulsed latches share the pulse generation circuit for the pulsed clock signal. As a result, the area and power consumption of the pulsed latch become almost half of those of the master-slave flip-flop. The pulsed latch is an attractive solution for small area and low power consumption.
The proposed method is applied to a pipeline architecture. Fig. 8 shows an error occurring in a pipeline stage; error-free mode is entered depending on the stage at which the error occurs. Fig. 9 shows error correction in a multiple fan-in and fan-out structure using the virtual error concept. Fig. 10 shows the loop case, in which three types of error are introduced: before the loop, inside the loop, and after the loop. All are corrected with a small cycle penalty. Table I shows the area, power, and delay of the existing method, and Table II shows the same quantities for the proposed method. In all three cases (gating, loop, and virtual error), the proposed method has lower area, power, and delay than the existing method.

**Figure 8:** Pipeline circuit with error correction (pulse gating)

**Figure 9:** Virtual error case

**Figure 10:** Loop case

Table I: Area, Power and Delay of existing method

<table>
<thead>
<tr>
<th>Existing method</th>
<th>Area</th>
<th>Power</th>
<th>Delay</th>
</tr>
</thead>
<tbody>
<tr>
<td>CG</td>
<td>722</td>
<td>192</td>
<td>10.083</td>
</tr>
<tr>
<td>Loop</td>
<td>2150</td>
<td>379</td>
<td>11.937</td>
</tr>
<tr>
<td>Virtual error</td>
<td>686</td>
<td>185</td>
<td>10.083</td>
</tr>
</tbody>
</table>

Table II: Area, Power and Delay of proposed method

<table>
<thead>
<tr>
<th>Proposed method</th>
<th>Area</th>
<th>Power</th>
<th>Delay</th>
</tr>
</thead>
<tbody>
<tr>
<td>PG</td>
<td>420</td>
<td>125</td>
<td>6.033</td>
</tr>
<tr>
<td>Loop</td>
<td>1446</td>
<td>316</td>
<td>11.757</td>
</tr>
<tr>
<td>Virtual error</td>
<td>408</td>
<td>127</td>
<td>5.790</td>
</tr>
</tbody>
</table>
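The relative improvement implied by Tables I and II can be computed directly from the reported values. The snippet below is a small arithmetic sketch; the dictionary layout and the helper `reduction_percent` are illustrative choices, the CG row of Table I is paired with the PG row of Table II, and no units are assumed because the tables do not state them.

```python
# (area, power, delay) as reported in Table I (existing) and Table II (proposed).
existing = {
    "Gating (CG vs. PG)": (722, 192, 10.083),
    "Loop": (2150, 379, 11.937),
    "Virtual error": (686, 185, 10.083),
}
proposed = {
    "Gating (CG vs. PG)": (420, 125, 6.033),
    "Loop": (1446, 316, 11.757),
    "Virtual error": (408, 127, 5.790),
}


def reduction_percent(old, new):
    """Relative reduction from `old` to `new`, in percent."""
    return 100.0 * (old - new) / old


reductions = {
    case: tuple(
        round(reduction_percent(old, new), 2)
        for old, new in zip(existing[case], proposed[case])
    )
    for case in existing
}

for case, (area, power, delay) in reductions.items():
    print(f"{case}: area -{area}%, power -{power}%, delay -{delay}%")
```

For the gating row this works out to roughly 42% less area, 35% less power, and 40% less delay. The loop case shows a much smaller delay reduction, about 1.5%, even though its area still drops by about a third.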

VI. CONCLUSION

Aggressive voltage scaling based on timing speculation has become an effective way of reducing power consumption. To increase the effectiveness of this form of voltage scaling, the overall timing penalty must be reduced, which in turn requires the fastest possible mechanism for error correction. This paper therefore proposes a low-power, area-efficient system with fast error correction, based on a new Razor approach in which the flip-flop in the Razor flip-flop is replaced by a pulsed latch. The contribution of the new method is one-cycle error correction, which reduces the overall timing penalty when a large number of errors must be corrected.

REFERENCES