Arjun Chowdhury, Vandana Sapra, Ankit Khandelwal, Deepak Mahajan (Freescale Semiconductor India Pvt. Ltd.)
Combining basic building blocks in different ways to build complex circuits is common practice in digital logic design. The complexity of these building blocks varies from simple structures such as synchronizers, multiplexers, adders, FIFOs and glitch-free multiplexers to complex circuits such as custom CDC modules, encoders and decoders. Countless designs exist for these circuits, each tailored to a specific requirement; an implementation that works in one scenario may fail or impose limitations in another. This paper discusses some commonly used circuits that are faulty in certain scenarios, along with remedies that make them more robust and therefore more widely applicable.
Faults in Circuits and their remedy
1. Glitch-Prone Glitch-Free Mux
The circuit shown in Figure 1 is a commonly used glitch-free mux. Depending on the frequency ratio and the skew between the two clocks, a glitch can appear on the output of the mux while switching from the slow clock (CLK0) to the fast clock (CLK1).
Figure 1: Faulty Circuit
Because of the large frequency ratio between CLK0 and CLK1, there can be a window in which both CG cells are enabled and actively driving the final “OR” gate, which produces a glitch on the output clock of the mux. The waveform in Figure 2 illustrates how the glitch is generated.
Figure 2: Waveform showing the induced glitch
A few small modifications to the above circuit increase its robustness and hence its acceptability.
Delaying the clock enable signals (FF2.Q and FF4.Q) on their way to the other domain ensures that both CG cells are never active at the same time and that there is adequate delay between disabling one CG cell and enabling the other. This avoids any glitch on the output clock.
Figure 3: Solution 1 circuit
Figure 4: Glitch-free switching with Solution 1
Because of the delay elements X and Y, the clock gating cell CG2 is enabled well after CG1 is disabled, so no glitch occurs, as seen in Figure 4. The only limitation is the longer no-clock interval during clock switching. However, this is not a critical parameter for a glitch-free mux in most common applications.
If simple NOR gates are used instead of the CG cells (CG1 and CG2) to gate and un-gate the clocks, the problem disappears. The circuit and timing diagram below illustrate the behavior.
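The effect of the enable overlap, and of the dead time introduced by the delay elements, can be illustrated with a small behavioral model. The Python sketch below is not derived from the schematics in Figures 1 and 3; it only assumes that each CG cell passes its clock while its enable is active and that the two gated clocks are ORed together. The clock periods (20 and 6 ticks), the enable switching times and all function names are illustrative assumptions.

```python
from itertools import groupby

def clock(t, period, high):
    """Square wave that is high for the first `high` ticks of each period."""
    return (t % period) < high

def mux_output(t_end, en0_off, en1_on):
    """OR of two gated clocks (assumed model, not the paper's schematic).

    CLK0 (slow, period 20) is enabled for t < en0_off.
    CLK1 (fast, period 6) is enabled for t >= en1_on.
    Both switching times are chosen while the respective clock is low,
    as a clock gating cell would guarantee.
    """
    out = []
    for t in range(t_end):
        gated0 = clock(t, 20, 10) and t < en0_off
        gated1 = clock(t, 6, 3) and t >= en1_on
        out.append(int(gated0 or gated1))
    return out

def shortest_pulse(wave):
    """Length of the shortest high or low run, ignoring the first and last run."""
    runs = [len(list(group)) for _, group in groupby(wave)]
    return min(runs[1:-1])

# Overlap: CLK1 is enabled at t=27 although CLK0 stays enabled until t=50.
overlap = shortest_pulse(mux_output(90, en0_off=50, en1_on=27))

# Dead time: CLK1 is enabled at t=57, only after CLK0 was disabled at t=50.
dead_time = shortest_pulse(mux_output(90, en0_off=50, en1_on=57))

print(overlap, dead_time)
```

With these numbers the overlapping case contains a 1-tick runt pulse, while the case with dead time has no pulse shorter than the 3-tick half period of the fast clock. The price is the longer interval without any clock around the switch.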
Figure 5: Solution 2 circuit
Figure 6: Glitch-free switching with Solution 2
In the above circuit, O1 is always disabled long before O2 starts driving the clock. The only difficulty with this circuit is meeting the clock gating timing check on O1 and O2, which is a half-cycle check with respect to the clock of the respective NOR gate.
2. Capture synchronizer not able to capture pulse
A capture synchronizer is a well-known basic circuit, used when a pulse from a faster clock domain has to be captured in a much slower clock domain. One commonly used capture synchronizer circuit is shown below.
Figure 7: Faulty Capture Synchronizer
The input pulse from the faster domain is connected to the input trigger of the above circuit. If the input trigger is not sparse enough and occurs at a rate higher than the destination clock frequency, it is possible that no pulse is captured in the destination clock domain.
Figure 8: Missing pulses
As the above timing diagram shows, if the trigger pulse is frequent and occurs an even number of times within one period of the destination clock, no pulse is captured in the slower destination clock domain.
The circuit below ensures that if one or more input triggers occur within two cycles of the destination clock, the output is guaranteed to trigger.
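The even-count failure can be reproduced with a minimal behavioral sketch. It assumes that the circuit of Figure 7 is toggle-based: each trigger toggles a flop in the source domain, and the destination domain samples that level on every clock edge and emits a pulse when the sample differs from the previous one. This structure is an assumption inferred from the even-count behavior described above, not taken from the figure; synchronizer latency and metastability are ignored, and the function name is hypothetical.

```python
def captured_pulses(trigger_times, dest_period, n_cycles):
    """Return one 0/1 entry per destination clock edge (assumed toggle scheme).

    trigger_times: times at which the fast-domain trigger fires.
    A destination edge occurs at every multiple of dest_period.
    """
    pulses = []
    previous_sample = 0
    for k in range(1, n_cycles + 1):
        edge = k * dest_period
        # The toggle flop's level is the parity of all triggers seen so far.
        level = sum(1 for t in trigger_times if t < edge) % 2
        # Edge detector: a pulse is produced only if the sampled level changed.
        pulses.append(level ^ previous_sample)
        previous_sample = level
    return pulses

one_trigger = captured_pulses([12], dest_period=10, n_cycles=4)
two_triggers = captured_pulses([12, 17], dest_period=10, n_cycles=4)
print(one_trigger, two_triggers)
```

In this model a single trigger between two destination edges is captured, whereas two triggers inside the same destination period return the toggle flop to its old level before it is sampled, so the destination never sees a change.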
Figure 9: Updated circuit Solution
Figure 10: Waveforms with the updated circuit
It is clear from the above timing diagram that, in case of frequent input triggers, the output always triggers, with a maximum frequency of half the destination clock frequency. This circuit does not guarantee an output trigger for each and every input trigger, but it does improve on the original circuit in the case described above.
3. Fault in Multi Cycle Path (MCP) circuit
In complex, high-speed designs, meeting at-speed timing is itself a big challenge. Hence it is crucial to identify logic where at-speed throughput is not required. Designers implement dedicated circuits to make such paths multi-cycle with respect to the faster clock. The circuit shown in Figure 11 illustrates this approach.
Figure 11: Faulty Multi Cycle path circuit
In the above example, the data bus travels from the faster domain to the slower domain. Although the capturing flops in the slower domain run at the same frequency as the faster clock, the throughput of the data transfer from the faster domain to the slower domain is not critical; hence a multi-cycle structure is implemented to ease the timing from the faster domain to the slower domain.
Figure 12: Waveform showing the correct behavior in RTL simulations
FF1 holds updated data after the fast domain has processed some information. However, the data at FF1 remains valid for only one cycle of the fast clock and is overwritten with the next processed data after that cycle. The slower domain needs the data to be extended by one more clock cycle so that it is stable for two cycles of the faster clock. To request this, the slower domain provides a “wait” indicator to the faster domain. Depending on the “wait” indicator, flop FF2, which has already latched the previous data, keeps the wdata[31:0] bus stable for one more cycle. The implementation is intended to ease the timing of the slower domain by making wdata[31:0] multi-cycle (in this case, valid for two cycles).
Although everything looks correct logically, the physical behavior of this circuit defeats the intention of making it an MCP.
Figure 13: Failure produced in GLS simulations
When the select line of the mux switches from one input to the other, the net delays and cell delays of the combinational logic cause the mux output, i.e. WDATA, to carry unstable data (glitches) in the middle of the two-cycle stable period. This limitation effectively forces the timing to be met within a single cycle of the faster clock.
The remedy is to register the final data in the faster domain before sending it to the slower domain, which removes the glitch.
Figure 14: Updated circuit
Figure 15: Correct behavior in GLS simulations
However, registering the data adds latency to the path, which has in any case been identified as non-critical. Also, the timing from FF1→FF5 and FF2→FF5 needs to be met at the maximum frequency. As these flops are located in the same domain, this is easier than meeting single-cycle timing across domains in the original circuit.
From the above examples, it is clear that even the most commonly used circuits may fail in some corner scenarios. The proposed silicon-proven circuits provide a much more robust solution and require minimal changes to the existing circuits. Since these are basic building blocks of any SoC, a design issue in any of these circuits can cause failures at multiple points in the system. Thus, it is important to ensure that such blocks are fault-free and work seamlessly under all conditions.
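The mid-window glitch and the effect of the extra register can be illustrated with a small behavioral model. The sketch below assumes a 2:1 mux that selects FF1 in the first cycle and FF2 in the second, with each signal reaching the mux a few ticks after the clock edge. The delay values, the data labels and the function names are assumptions made for illustration and are not taken from Figures 11 to 15.

```python
PERIOD = 10  # ticks per fast-clock cycle (illustrative)

# Per-cycle values at the flop outputs (assumed): FF2 is FF1 delayed by a cycle.
FF1 = ["A", "X", "B", "Y"]
FF2 = ["?", "A", "X", "B"]
SEL = [0, 1, 0, 1]  # 0 selects FF1, 1 selects FF2 (driven by "wait")

def delayed(values, delay, t):
    """Value seen at the mux at tick t if the signal arrives `delay` ticks late."""
    cycle = (t - delay) // PERIOD
    return values[max(cycle, 0)]

def mux_wave(d_ff1, d_ff2, d_sel, n_cycles=4):
    """Tick-by-tick mux output with different arrival delays per input."""
    wave = []
    for t in range(n_cycles * PERIOD):
        sel = delayed(SEL, d_sel, t)
        source, delay = (FF2, d_ff2) if sel else (FF1, d_ff1)
        wave.append(delayed(source, delay, t))
    return wave

def registered(wave):
    """FF5 model: sample the mux just before each edge and hold for one cycle."""
    n_cycles = len(wave) // PERIOD
    return [None] + [wave[k * PERIOD - 1] for k in range(1, n_cycles)]

# The select line arrives 3 ticks after the edge, the data after only 1 tick.
wave = mux_wave(d_ff1=1, d_ff2=1, d_sel=3)
print(sorted(set(wave[0:2 * PERIOD])))  # values seen in the two-cycle window
print(registered(wave))                 # per-cycle values after the extra flop
```

In this model the raw mux output briefly shows the next FF1 value inside the two-cycle window because the select line switches later than the data. The registered copy changes only at clock edges and holds the same value for two full cycles, provided the mux settles within one fast-clock cycle.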