

( ESNUG 411 Item 1 ) -------------------------------------------- [04/23/03]

Subject: ( ESNUG 410 #2 ) Magma BlastFusion, Useful Skew, Scan, and OCV

> Using the tool in Magma, it gives excellent results, yet, from discussions
> with other Magma users, I know that some users disable the feature!  I
> believe they disable it because they really don't understand the concept;
> or they have learned how to minimize clock skew and they don't want that
> hard-earned skill to be redundant.
>
>     - Simon Matthews
>       Paxonet                                    Fremont, CA


From: Howard Landman <howard=user domain=riverrock.org>

Hi, John,

You can count me as a recent convert to useful skew.  When I first 
encountered the practice, it gave me the shivers because it added a 
whole set of unwelcome complications to the timing analysis, and the 
engineers doing it were (I thought at the time) treating this a little 
bit cavalierly.

                A                      B                      C
             -------                -------                -------
          ---|D   Q|-- slow logic --|D   Q|-- fast logic --|D   Q|---
             |     |                |     |                |     |
             |     |                |     |                |     |
   clock ----|>    |   --|>o-|>o----|>    |            ----|>    |
           | -------   |            -------           |    -------
           |           |                              |
           --------------------------------------------


However the EDA support for useful skew has gotten much better in recent 
years, and I now believe that it can and should be used to improve 
timing.  I've even seen cases where Magma BlastFusion has successfully 
laid out and met timing on netlists with "impossible" timing constraints,
such as an input delay longer than the clock cycle time.  (Hey, no problem,
just skew the receiving FF a little - as long as you don't create a hold
time problem!)  This rather startled me the first time I noticed it, but
I'm getting used to it now.  Probably in a couple more years I won't be
able to tolerate a tool that CAN'T do it.
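
A minimal sketch of the arithmetic behind the diagram above, assuming the
textbook single-cycle setup and hold checks.  The function names and all of
the numbers are illustrative assumptions, not output from BlastFusion or any
other tool.

```python
# Useful-skew sketch for the A -> B -> C pipeline in the diagram above.
# All values are in nanoseconds and are made up for illustration.

def setup_slack(period, max_delay, setup, launch_skew=0.0, capture_skew=0.0):
    """Setup slack: positive means the path meets timing."""
    return (period + capture_skew) - (launch_skew + max_delay + setup)

def hold_slack(min_delay, hold, launch_skew=0.0, capture_skew=0.0):
    """Hold slack: positive means no hold violation."""
    return (launch_skew + min_delay) - (capture_skew + hold)

PERIOD, SETUP, HOLD = 10.0, 0.2, 0.1
SLOW_MAX, SLOW_MIN = 10.5, 2.0   # A -> B "slow logic" (assumed delays)
FAST_MAX, FAST_MIN = 4.0, 1.0    # B -> C "fast logic" (assumed delays)

for skew_b in (0.0, 1.0):        # extra clock delay inserted at flop B
    print(f"skew at B = {skew_b} ns")
    print("  A->B setup:", round(setup_slack(PERIOD, SLOW_MAX, SETUP, 0.0, skew_b), 3))
    print("  B->C setup:", round(setup_slack(PERIOD, FAST_MAX, SETUP, skew_b, 0.0), 3))
    print("  A->B hold: ", round(hold_slack(SLOW_MIN, HOLD, 0.0, skew_b), 3))
    print("  B->C hold: ", round(hold_slack(FAST_MIN, HOLD, skew_b, 0.0), 3))
```

With these assumed numbers the A->B path fails setup with zero skew and
passes once B's clock is delayed by 1 ns, while the A->B hold margin shrinks
by the same 1 ns.  That shrinking hold margin is the check to watch.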

I could still see disabling the feature if I was using a clock distribution
strategy, like a grid, which was inherently impossible to tweak that way.
But if I'm letting Magma do the clocks, there's no obvious reason to cripple
the BlastFusion tool.

Simon also wondered:

  "One person though did express the idea that useful skew could make
   on-chip timing variations worse.  Do you have any thoughts on this?"

to which Jack Fishburn replied:

  "There's no reason that a useful-skew configuration will have more, or
   less, timing variation than a zero-skew configuration."

I have to disagree slightly with Jack here.  If the useful skew is
created by adding more delay, such that the insertion delay of the clock
is increased, then one should expect that the OCV on that path will also
increase (possibly linearly with the delay, or possibly that divided by 
the square root of the number of stages, depending on the exact nature 
of the OCV source - see my earlier ESNUG 409 #1 post about this).
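
The two scaling laws mentioned above can be compared with a minimal sketch.
The 10% derate, the delays, and the stage counts are assumed values chosen
for illustration, and the function names are hypothetical.

```python
import math

def ocv_linear(delay_ns, derate):
    """Variation that grows linearly with insertion delay."""
    return delay_ns * derate

def ocv_sqrt_stages(delay_ns, derate, stages):
    """The linear figure divided by the square root of the stage count."""
    return delay_ns * derate / math.sqrt(stages)

# Assumed example: skewing a clock branch lengthens it from 2 ns (4 stages)
# to 4 ns (16 stages), with a 10% derate.
for delay_ns, stages in ((2.0, 4), (4.0, 16)):
    print(delay_ns, "ns,", stages, "stages:",
          "linear =", round(ocv_linear(delay_ns, 0.10), 3),
          "sqrt-scaled =", round(ocv_sqrt_stages(delay_ns, 0.10, stages), 3))
```

Under the linear model the variation doubles along with the delay.  Under
the square-root model, with these particular numbers, it does not grow at
all, so which model applies matters a good deal.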

    - Howard A. Landman
      Riverrock Consulting                       Fort Collins, CO

         ----    ----    ----    ----    ----    ----   ----

From: Thomas Moehring <thomas.moehring=person company=infineon jot yon>

Hello John, 

My company is using Magma's BlastFusion tool which supports useful skew.
It can add significantly to the timing optimization capabilities, e.g. to
automatically create early or late clocks for I/O registers, or, as in
Simon's example, steal some time from the fast path to relax the required
time of the slow path.

On the other hand, with full scan path design as we normally do, useful
skew can become very painful.  Depending on the architecture of the scan
flipflops, any local clock skew between two adjacent flipflops in the
scan chain can cause, or increase, a hold violation, which will require
delay buffers for fixing.  Applying useful skew extensively can result in
tons of delay buffers.  In that case you may decide to switch off useful
skew, or at least restrict it to some 50 or 100 psec.
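
A minimal sketch of why the buffer count climbs with skew, assuming a direct
Q-to-scan-in connection between adjacent scan flops.  The path delay, hold
time, and per-buffer delay are made-up values in picoseconds, and the
function names are hypothetical.

```python
def scan_hold_slack_ps(scan_path_min_ps, hold_ps, skew_ps):
    """Hold slack on a scan hop; skew_ps is capture clock minus launch clock."""
    return scan_path_min_ps - hold_ps - skew_ps

def delay_buffers_needed(slack_ps, buffer_delay_ps):
    """Number of delay buffers needed to bring a negative hold slack to zero."""
    if slack_ps >= 0:
        return 0
    return -(slack_ps // buffer_delay_ps)   # integer ceiling of -slack / delay

SCAN_PATH_MIN_PS = 150   # assumed clock-to-Q plus wire delay
HOLD_PS = 50             # assumed hold requirement of the scan input
BUFFER_DELAY_PS = 100    # assumed delay of one hold-fix buffer

for skew_ps in (50, 100, 300, 500):
    slack = scan_hold_slack_ps(SCAN_PATH_MIN_PS, HOLD_PS, skew_ps)
    print(skew_ps, "ps skew ->", slack, "ps hold slack,",
          delay_buffers_needed(slack, BUFFER_DELAY_PS), "buffers per hop")
```

With these assumed numbers, 50 or 100 psec of skew costs nothing, while a
few hundred psec costs several buffers on every affected scan hop.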

My approach using Magma BlastFusion is: 

  - insert the initial clock buffer tree ("run route clock ...")
  - analyze timing violations, both setup and hold
  - per clock, decide on the best strategy for clock tree optimization
    ("run gate clock ..."), minimize skew, or meet target latency, or
    allow for useful skew, or none

Yes, sometimes no optimization is the best.  Happy balancing!

    - Thomas Moehring
      Infineon                                   Munich, Germany

         ----    ----    ----    ----    ----    ----   ----

From: Jon Stahl <jstahl=user company=avici aught prawn>

Hi, John,

Magma's use of useful skew does have the potential to make on-chip variation
worse, but mostly in comparison to a balanced clock tree (BCT) structure.

As an example, I taped out a chip in 2000 in 0.25 um applying "useful" skew
(not with Magma, but with the former Ultima ClockWise tool).  I estimated
that this approach saved us up to ~-1nsec max slack and thousands of nsecs
of TNS, and certainly months off of our timing closure schedule.

At that time, zero-skew clock trees were typically built using a "balanced
clock" tree approach.  This was popular because of its simplicity, ease of
implementation, and the fact that most former EDA tools were incapable of
the timing accuracy required to construct a zero-skew pure buffer tree.

In the BCT paradigm, the number of buffers from the root of the tree to any
leaf is constant. Further, BCTs were nominally built with a huge buffer at
the tree root driving large wire(s), perhaps big buffers at the first level,
and then smaller buffers at subsequent levels.  Thus the depth of the tree
was shallow, and the clock path divergence small -- not only in number of
different levels, but in the number of cut points at which any two flops
were on different branches.
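
To make "clock path divergence" concrete, here is a minimal sketch that
finds where the clock paths to two flops split and how much delay lies past
that cut point on each side.  The buffer names and per-stage delays (in
picoseconds) are hypothetical, and representing a clock path as a list of
(buffer, delay) pairs is an assumption made for illustration.

```python
def divergent_delays(path_a, path_b):
    """Return the delay on each path after the last shared buffer.

    Each path is a root-to-leaf list of (buffer_name, delay_ps) pairs.
    """
    common = 0
    for (name_a, _), (name_b, _) in zip(path_a, path_b):
        if name_a != name_b:
            break
        common += 1
    rest_a = sum(delay for _, delay in path_a[common:])
    rest_b = sum(delay for _, delay in path_b[common:])
    return rest_a, rest_b

# Shallow tree: big root buffer, then two small levels (assumed delays).
shallow_a = [("root", 400), ("l1_a", 300), ("l2_a", 200)]
shallow_b = [("root", 400), ("l1_b", 300), ("l2_b", 200)]

# Deep pure buffer tree that splits right after a small root stage.
deep_a = [("root", 100)] + [("a%d" % i, 100) for i in range(8)]
deep_b = [("root", 100)] + [("b%d" % i, 100) for i in range(8)]

print("shallow:", divergent_delays(shallow_a, shallow_b))
print("deep:   ", divergent_delays(deep_a, deep_b))
```

Both assumed trees have the same 900 ps insertion delay, but the deep tree
puts more of it past the cut point, and only that unshared portion can vary
between the two flops.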

Downsides included adding balance loads, pre-planning the clock wires and
perhaps root buffer (if an IO slot driver), power consumption, and EM issues
due to high current drive cells.

In 2000, the majority of ASIC vendors did not perform OCV analysis, both
because the processes didn't require it, and the commercial EDA tools
didn't support it.

These days the modern processes necessitate OCV constraints on the order
of 10-12%, and most tools support the analysis.  Further, the tools are
now capable of building zero-skew pure buffer trees.  And with bigger
designs with many more clock leaves, the depth of the tree and magnitude
of divergence are very large.

On a recent 0.18 um design with ~60K flops and a worst case 5 nsec insertion
delay, taking 10% OCV into account we had the potential for up to 0.5 nsec
of additional constraint.

Would implementing useful skew have made this worse?

Perhaps, but if we assume 1 nsec of extra skew and complete divergence,
we only lose back 100 picosec (1 nsec * 10%).  So it's a net win, unless
the variation happens to hit you on a cut point for an orthogonal tight
path where intentional skew can't provide additional margin.  A very low
probability scenario I would think.
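
The arithmetic above can be laid out as a minimal sketch.  It uses the
figures from this post (5 nsec insertion delay, 1 nsec of intentional skew,
10% OCV); the function name is hypothetical, and treating the 1 nsec of
skew as 1 nsec of recovered slack is an assumption made for illustration.

```python
def ocv_penalty_ns(divergent_delay_ns, ocv_fraction):
    """OCV uncertainty on a fully divergent stretch of clock path."""
    return divergent_delay_ns * ocv_fraction

OCV = 0.10

baseline_penalty = ocv_penalty_ns(5.0, OCV)   # worst-case insertion delay
skew_penalty = ocv_penalty_ns(1.0, OCV)       # extra OCV from the added skew
net_gain = 1.0 - skew_penalty                 # assumes 1 ns of slack recovered

print("baseline OCV penalty:", round(baseline_penalty, 3), "ns")
print("extra penalty from 1 ns of skew:", round(skew_penalty, 3), "ns")
print("net gain from useful skew:", round(net_gain, 3), "ns")
```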

    - Jon Stahl
      Avici Systems                              N. Billerica, MA

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley. All Rights Reserved.


   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)