( ESNUG 281 Item 6 ) ------------------------------------------------ [2/19/98]
Subject: (ESNUG 278 #9) Synch Probs W/ 2 Clock Domains W/ Close Freqs
> In one of our designs we are synchronizing a fifo across 2 clock domains.
> The write to the fifo is through one clock & read is through another clock.
> We use gray coding and 3 stages of flip-flop to synchronise the write
> and read pointers to the fifo. The problem we see in the lab is when
> the 2 oscillators have the same frequency some data is getting lost. When
> the 2 clocks are different frequencies we don't see any problem.
>
> We are suspecting that it is because of meta-stability caused by using 2
> clocks of similar frequencies. The problem cannot be reproduced in
> simulations. The synchronizing circuit looks something like this.
>
>          _______              ______              _______
> in1 --->|D     Q|----------->|D    Q|----------->|D     Q|---> out1
>         |       |            |      |            |       |  (synchronized)
> wr_clk -|>______|   rd_clk --|>_____|   rd_clk --|>______|
>
>
> We have similar synchronizing circuits in our earlier ASICs. Any ideas
> as what exactly is happening?
>
> - Shashi Aluru
> FORE Systems, Inc
From: Jim Dahlberg <jad42958@fulmar.cdev.com>
John,
In the resync circuit posted to ESNUG, if the wr_clk period is shorter than
the rd_clk period, AND the output of the F/F being clocked by wr_clk is only
one wr_clk long, then it is possible for the rd_clk circuit to miss the
wr_clk signal. In other words, in order to resync a signal, the sample rate
must be faster than the signal you are trying to resync.
- Jim Dahlberg
General Dynamics Information Systems
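To illustrate the point above, here is a minimal Python sketch (not part of
the original post). It assumes an idealized rd_clk flop with zero setup/hold
time and no metastability, and integer time units. The function names
sampled_high and missed_phases are made up for this example. It lists the
rd_clk phases at which a pulse exactly one wr_clk period wide is never sampled.

```python
def sampled_high(pulse_start, pulse_width, rd_period, rd_phase):
    """Return True if some rd_clk rising edge lands inside the pulse.

    rd_clk is modeled as free-running with edges at rd_phase + k*rd_period.
    Idealized: zero setup/hold time, no metastability (an assumption).
    """
    # Index of the first edge at or after pulse_start (ceiling division).
    k = -((rd_phase - pulse_start) // rd_period)
    first_edge = rd_phase + k * rd_period
    return first_edge < pulse_start + pulse_width


def missed_phases(wr_period, rd_period, pulse_start=0):
    """List the rd_clk phases at which a one-wr_clk-wide pulse is missed."""
    return [phase for phase in range(rd_period)
            if not sampled_high(pulse_start, wr_period, rd_period, phase)]
```

With an 8-unit wr_clk and a 10-unit rd_clk, two of the ten possible phases
miss the pulse. With equal periods this ideal model never misses, but it has
zero margin, so any real setup/hold requirement or jitter can tip it over.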
---- ---- ---- ---- ---- ---- ----
From: "Ross Swanson" <swan000@erols.com>
Let me and hundreds of others say, "add delays between the D flip-flops to
prevent a Q-to-D race with the clocks", because of the skew induced between
(or within) the wr_clk and rd_clk domains.
- Ross Swanson
---- ---- ---- ---- ---- ---- ----
From: sgolson@trilobyte.com (Steve Golson)
John,
Just because this designer has two flops running on the destination clock
(rd_clk in his diagram) doesn't make metastability problems go away. The
first flop does all the work of synchronization. The second flop just
ensures that you wait "long enough" for the first flop to stabilize before
you look at its output. One clock period may not be "long enough".
Have you asked your ASIC/library vendor about the metastability
characteristics of their flops? What frequency are you running at?
What is the normal CLK->Q delay of your flops? When you run at a lower
frequency, does the problem go away?
An alternative to gray coding is to send a single bit "flag" indicating
that the input bus has changed. Then only this "flag" signal needs to
be synchronized.
The safest way is to have bi-directional synchronization. Imagine that
signal out1 gets sent back across to the wr_clk domain (with proper
synchronization, of course). Then only allow in1 to change when you
know that the previous change has been received at the other end.
- Steve Golson
Trilobyte Systems
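The bi-directional scheme described above can be sketched as a small Python
behavioral model (not part of the original post). It uses a toggle-style
request/acknowledge pair, which is one common way to build such a handshake
and not necessarily what the author had in mind. The Sync2 class,
run_handshake function, and the clock phases are assumptions for illustration,
and metastability itself is not modeled.

```python
class Sync2:
    """Two-flop synchronizer modeled as a 2-stage shift register."""

    def __init__(self):
        self.ff = [0, 0]

    def clock(self, d):
        # Stage 1 takes the input, stage 2 takes the old stage 1 value.
        self.ff = [d, self.ff[0]]

    @property
    def q(self):
        return self.ff[1]


def run_handshake(words, wr_period, rd_period, t_end):
    """Send words from the wr_clk domain to the rd_clk domain."""
    req = ack = 0              # toggle-style request / acknowledge levels
    bus = None                 # data bus, held stable while a transfer is pending
    req_sync, ack_sync = Sync2(), Sync2()
    pending = list(words)
    received = []
    # Clock edges in time order; rd_clk is offset by one unit (arbitrary).
    events = sorted([(t, "wr") for t in range(0, t_end, wr_period)] +
                    [(t, "rd") for t in range(1, t_end, rd_period)])
    for _, domain in events:
        if domain == "wr":
            seen_ack = ack_sync.q      # value visible before this edge
            ack_sync.clock(ack)
            # Only change the bus once the previous change was acknowledged.
            if pending and seen_ack == req:
                bus = pending.pop(0)
                req ^= 1
        else:
            seen_req = req_sync.q
            req_sync.clock(req)
            if seen_req != ack:        # a new request has arrived
                received.append(bus)
                ack = seen_req
    return received
```

Because the sender waits for the acknowledge before touching the bus again,
the transfer does not depend on the ratio of the two clock periods. The cost
is throughput: each word takes several cycles of both clocks.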
---- ---- ---- ---- ---- ---- ----
From: Kelly Fromm <kellyf@packetengines.com>
John,
A lot of us have been bitten by similar problems at one time or another.
As your oscillators approach the same frequency, period variation and duty
cycle jitter begin to affect your synchronizer. Two independent
oscillators will vary in frequency and clock pulse width within their
spec'd tolerance, and this is enough to make your synchronizer not work
with real logic. Simulation will not find it unless you model your
clocks for it (+/- 0.5ns variation of both clock periods will generally
find this). The double flops on rd_clk in your example provide the
required metastability coverage, but as the two clock frequencies
approach each other your real logic begins to violate the Nyquist
sampling rule. (Remember that one?) I break synchronizers into three
separate areas: (1) Sampling criteria (2) metastability and (3) Other
Circuit considerations. If you don't handle the first, the other two
don't matter. In your case, I am assuming the gray code provides a
single pulse 1/2 the frequency of wr_clk. Sampling requires that rd_clk
maintain greater than 2X the data frequency to reliably capture the
input. Exact relationships require that you understand the wr_clk FF's
clk-Q timing and the rd_clk FF's hold time requirements, but in general,
if you can't ensure that your rd_clk period is 2-3 ns less than your
wr_clk period, you need to pulse stretch in the wr_clk domain. I believe
this is what is missing in your circuit. The two rd_clk ff's you have
cover the second part, metastability. I think of these as the sample
and capture pair. Sample is unreliable, but Capture is valid. Once the
sampling and metastability are covered, you need to consider the effects
of the wr_clk domain's pulse stretching logic when rd_clk runs at higher
frequencies. Perhaps you need to add a pulse reducer to the other side
of the capture FF. Keep the three areas independent, and you seldom
run into problems. As to your earlier ASICs, I would suspect that for
some reason the sampling rule wasn't violated. We need a bit of luck
once in a while too...
- Kelly Fromm
Packet Engines
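As a rough illustration of the pulse-stretch rule of thumb above, here is a
minimal Python sketch (not part of the original post). It assumes the
stretched pulse must be at least one rd_clk period plus a fixed margin wide.
The 3 ns default comes from the "2-3 ns" figure in the text, and the function
name stretch_cycles is made up for this example. A real design would derive
the margin from the actual clk-Q, hold time, and jitter numbers.

```python
import math


def stretch_cycles(wr_period_ns, rd_period_ns, margin_ns=3.0):
    """Smallest number of wr_clk cycles to hold the pulse so that
    pulse width >= rd_period + margin (rule-of-thumb model)."""
    needed_width = rd_period_ns + margin_ns
    return max(1, math.ceil(needed_width / wr_period_ns))
```

For two 20 ns clocks this gives 2 cycles: a one-cycle pulse has no margin
over the rd_clk period, so it needs stretching. A 15 ns rd_clk sampling a
20 ns wr_clk pulse needs no stretch under the same assumption.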
---- ---- ---- ---- ---- ---- ----
From: "Steven Murphy" <steven.a.murphy@lmco.com>
John,
It really depends on what the rest of the circuit is expecting. In the
presence of meta-stability or clock jitter, even though the input write
counter is a gray code, the output won't be a gray code. However, this is
also true when the clock frequencies are different, but there may be a
hole in your pipeline logic when the clocks are similar.
I suggest doing a simulation where the clock frequencies and phases are
the same and there is a large amount of jitter on the read (or write)
clock. You will have to write a clock process that injects jitter. This
will simulate a meta-stable condition where the output can jump around.
You may want to have the clock frequencies be just slightly different so
that the phases will slip by each other and then the jitter will
sometimes cause a jump and sometimes not cause a jump in the output. You
might also try random jitter. Run the simulation a long time to make
sure all possible data relationships are hit by the random jitter.
A few things to remember: As I said above, don't expect the output to be
a continuous gray code. Be sure to delay the read pointer 2 stages
before comparing it to the synchronized write pointer so that you are
comparing the read and write pointer at the same time instance. If you
don't match the delays in the read and write pointer then you would be
comparing an old write pointer with a new read pointer. When I say 2
stages I am assuming that the register shown above that is clocked by
wr_clk is the register that is actually producing the write pointer. If
it is an additional delay on the write pointer then the read pointer
would have to be delayed an additional clock before the comparison is
made. Whether these delays are necessary really depends on what you are
trying to do in the end.
Have you done the meta-stability calculations for the ASIC process you
are currently using? Have you allowed enough walk-out time?
- Steven Murphy
Lockheed-Martin
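The jitter experiment suggested above can be sketched in Python (not part of
the original post). This is a behavioral model only: it injects uniform
random jitter on the read clock and samples an ideal gray-coded write
pointer, so it shows sampling slip, not true metastability. All function
names, the 4-bit pointer width, and the jitter distribution are assumptions.

```python
import random


def to_gray(n):
    """Binary-reflected gray code of n."""
    return n ^ (n >> 1)


def jittered_edges(period, n_edges, jitter, rng, phase=0.0):
    """Nominal edges at phase + k*period, each moved by uniform +/- jitter."""
    return [phase + k * period + rng.uniform(-jitter, jitter)
            for k in range(n_edges)]


def sample_pointer(wr_period, rd_edges, bits=4):
    """Sample a gray-coded pointer that increments on every ideal wr_clk edge."""
    mask = (1 << bits) - 1
    samples = []
    for t in rd_edges:
        count = max(0, int(t // wr_period))   # wr_clk edges seen so far
        samples.append(to_gray(count & mask))
    return samples


def bit_change_histogram(samples):
    """Count how many bits differ between consecutive samples."""
    hist = {}
    for a, b in zip(samples, samples[1:]):
        changed = bin(a ^ b).count("1")
        hist[changed] = hist.get(changed, 0) + 1
    return hist
```

With equal periods, rd_clk edges placed mid-cycle, and no jitter, every
sample differs from the previous one in exactly one bit. With the rd_clk
edges aligned to the wr_clk edges and a little jitter, some samples repeat
(0 bits change) and some skip a count (2 bits change), so the sampled
sequence is no longer a continuous gray code.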
---- ---- ---- ---- ---- ---- ----
From: "Bruce Nepple" <brucen@imagenation.com>
John,
The circuit drawn above is certainly susceptible to errors caused by
meta-stable operation of the middle flipflop (call it D2, Q2). One cause of
meta-stable operation is violation of the setup or hold specification at D2.
A small meta-stable window exists within the setup and hold window that will
cause the output of the flipflop to be unstable. It could oscillate, or
change states twice, or just take a long time to reach a stable value.
There is no doubt that you will get metastable operation with asynchronous
clocks. The question is how often.
The solution to metastable operation is to wait a while and re-synchronize with
the same clock (as you are doing). Theoretically the oscillation can take
infinite time to stabilize, but the probability of that is infinitely small.
The longer you wait, the safer you are. If you wait long enough that the
probability of failing is less than the probability of your part failing,
then that's probably long enough. There are several good references and
past ESNUG articles that discuss this process. My favorite reference (most
practical) is:
Chaney, Thomas J. "A Guide to More Reliable Synchronizer Designs,"
Technical Memorandum No. 207, January 1974, Computer Systems
Laboratory, Washington University, St. Louis, Mo. 63110
A more rigorous treatment (not so practical, for me anyway) is:
Marino, Leonard R. "General Theory of Metastable Operation,"
IEEE Transactions on Computers, Vol. C-30, No. 2, Feb. 1981
(good list of references though)
So, where this takes me is: if you have metastability-induced errors, the
time allowed to resample (the rd_clk period) is critical (the probability of
failure falls off exponentially with that time).
If you slow down your read clock (and your write clock with it) and
the problem goes away, then it's probably due to metastability. If
you speed up both clocks, it should get much worse.
Settling time, ground bounce, crosstalk, etc. can all cause what
appear to be setup and hold violations or clock violations which
cause metastable operation (clock glitches, for example).
- Bruce Nepple
Imagenation Corp. Portland, Oregon
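The exponential dependence mentioned above is usually written as
MTBF = exp(t_resolve / tau) / (T0 * f_clk * f_data). Here is a minimal Python
sketch of that textbook model (not part of the original post). The values of
TAU and T0 and the 2 ns overhead are hypothetical, chosen only for
illustration. Real values must come from the ASIC/library vendor.

```python
import math


def synchronizer_mtbf(t_resolve, tau, t0, f_clk, f_data):
    """Mean time between synchronizer failures, in seconds.

    t_resolve : time allowed for the first flop to settle (s)
    tau       : resolution time constant of the flop (s)
    t0        : metastability window parameter of the flop (s)
    f_clk     : sampling (rd_clk) frequency (Hz)
    f_data    : rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t0 * f_clk * f_data)


# Hypothetical flop parameters, for illustration only.
TAU = 0.5e-9
T0 = 0.1e-9


def mtbf_for_clock(f_clk, f_data, t_overhead=2e-9):
    """MTBF when the settling time is one rd_clk period minus a fixed
    overhead (clk-to-Q plus setup of the second flop, an assumed value)."""
    t_resolve = 1.0 / f_clk - t_overhead
    return synchronizer_mtbf(t_resolve, TAU, T0, f_clk, f_data)
```

Under these assumed numbers, going from a 50 MHz to a 100 MHz rd_clk cuts
the MTBF by many orders of magnitude, which matches the advice above:
slowing the clocks should make a metastability problem fade, and speeding
them up should make it much worse.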