( ESNUG 447 Item 9 ) -------------------------------------------- [09/26/05]
Subject: ( ESNUG 446 #2 ) Magma Responds to the User Critique of its CTS
> If the Magma clock generator tool could actually do what it claimed,
> it would be very useful to a backend designer trying to turn a bad
> design into something useful. Unfortunately the basic design of its
> clock trees is wrong. I have personally designed several clock trees
> so I have thought about the issues quite a bit. The basic flaw in the
> Magma scheme is the concept that you can analyze any geometry and get
> accurate timing numbers.
>
> That was never a good idea which is why people used H trees and the
> problem is getting much worse with current geometries.
>
> - [Uncle Fester]
From: Paul Gittinger <paulg=user domain=magma-da spot calm>
Hi, John,
I am a Director of Product Engineering at Magma, responsible for many
things, among them Clock Tree Synthesis.
I would not doubt that [Uncle Fester] at one time made similar remarks
about the quality of RTL synthesis when compared to hand drawn gates. I
am not ashamed to admit that I was on that bandwagon myself in the
mid to late '90s, when I was also a full custom logic designer. The truth
is that this is beside the point. The value in automated tools such as
Magma's is that they enable you to do a larger design than you could do
by hand, in a shorter amount of time. Yes, you give up some optimality
in an academic sense, but the vast majority of designs done today do not
require the level of engineering that [Uncle Fester] is talking about.
Magma has actually spent a lot of time analyzing H-tree implementations.
What we have found is that the types of designs people are taping out
with our software do not lend themselves to an H-tree implementation at
all. With the level of clock gating going on in most of today's ASICs,
in some cases multiple stages of clock gating, or muxing, etc., it is
nearly impossible to get any kind of an H-tree solution to actually
work. Moreover, blockages from RAM macros and other types of
obstruction destroy any opportunity to fit in an H-tree-like
regular structure.
While we agree that in an optimal solution space an H-tree would be the
most regular structure possible, the designs we see on a daily basis
simply do not support their use. In our internal studies of such
designs, our current solution beat an H-tree solution handily in almost
every situation. There will undoubtedly be exceptions, and perhaps these
are the types of designs [Uncle Fester] has the most experience with.
> Magma has already noticed problems in that the new release lets you
> specify several process corners instead of just one. Unless you know
> which process points, and worse, what OCV is going to be in the worst
> places for your clock tree, you can't plug them in.
Magma has pioneered, and will continue to pioneer, new and more useful
ways of dealing with the multiple process corner problem, OCV, etc. Even when
you have the ability to measure multiple process corners truly
simultaneously, the only method to really fix the skew problem in a
clock tree is to employ some sort of balancing across all levels and
branches of the clock tree so that each level and branch will react
similarly when shifted from one corner to another. I think this is
something [Uncle Fester] would generally agree with, but the difference
is that we maintain you do not "need" an H-tree to achieve this type of
balance; it is possible with other structures as well.
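
As a rough illustration of the balancing argument above, here is a minimal
Python sketch. Every number and name in it is a hypothetical assumption
(the corner scale factors, the gate/wire split, the function names); it is
not Magma data or a Magma API. It shows two branches that match at the
typical corner: when they are built from the same mix of gate and wire
delay, skew stays at zero in every corner, and when they are not, skew
appears as soon as the corner shifts.

```python
# Hypothetical corner scale factors: (gate delay scale, wire delay scale).
# These values are invented for illustration only.
CORNERS = {
    "typical": (1.0, 1.0),
    "slow": (1.4, 1.1),
    "fast": (0.7, 0.9),
}


def branch_delay(gate_ps, wire_ps, corner):
    """Delay of one clock branch, scaling gate and wire parts separately."""
    gate_scale, wire_scale = CORNERS[corner]
    return gate_ps * gate_scale + wire_ps * wire_scale


def skew_by_corner(branch_a, branch_b):
    """Skew (branch_a minus branch_b, in ps) at each corner."""
    return {
        corner: round(
            branch_delay(*branch_a, corner) - branch_delay(*branch_b, corner), 3
        )
        for corner in CORNERS
    }


# Same gate/wire composition on both branches: tracks across corners.
balanced = skew_by_corner((300.0, 200.0), (300.0, 200.0))

# Same 500 ps total at typical, but a different gate/wire mix per branch.
unbalanced = skew_by_corner((400.0, 100.0), (200.0, 300.0))

print("balanced:  ", balanced)
print("unbalanced:", unbalanced)
```

Nothing in the sketch requires an H-tree; it only requires that each
branch react the same way to a corner shift, which is the point being
argued.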
> Magma was talking about ways that you can tune clocks and insertion
> delays relative to each other. I wouldn't let automatic tools loose
> on this kind of a problem. I have done full custom designs where I
> purposely delayed a clock, but those were under very controlled
> situations, like after a multiplier.
This is really an indictment of clock tree synthesis in general. The
idea of "useful skew" is not unique to Magma, and competing tools
support it as well. Of course many designs either don't need this, or
designers do not want to employ it as a technique to close timing. In
this case, executing "fix clock -weight skew" will prevent any useful
skew from occurring. In the case where the designer may decide that an
additional adjustment of about 1 buffer delay is appropriate to help
meet timing, he can specify useful skew to be used up to that limit
only, using "force plan clock -max_useful_skew <value>". In other cases,
due to internal clock latencies of a macro, there is literally no
way to close timing other than to skew a clock endpoint significantly
early. We support multiple uses, either letting the tool figure this out
with "run timing adjust skew" or letting the user specify targeted skew
with "force timing latency -type skew" on the desired pins. There is
actually a lot of flexibility here.
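
For readers less familiar with useful skew, the arithmetic behind it can
be sketched in a few lines of Python. This is only an illustration under
assumed numbers: the function names, the three-register pipeline, and the
clamp are hypothetical and do not represent how any Magma command is
implemented. Delaying the capture clock of a failing path borrows slack
from the path that follows it, and a cap keeps the adjustment within a
user-chosen limit.

```python
def setup_slack(period, launch_latency, capture_latency, data_delay, setup_time):
    """Setup slack in ps: positive means the path meets timing."""
    return period + capture_latency - launch_latency - data_delay - setup_time


def clamp_skew(requested, max_useful_skew):
    """Limit an intentional clock offset to +/- max_useful_skew (hypothetical cap)."""
    return max(-max_useful_skew, min(requested, max_useful_skew))


# Hypothetical pipeline A -> B -> C, all values in picoseconds.
PERIOD, SETUP, LATENCY = 1000, 50, 500
DELAY_A_TO_B, DELAY_B_TO_C = 1050, 700

# Balanced clock: every register sees the same latency.
before_ab = setup_slack(PERIOD, LATENCY, LATENCY, DELAY_A_TO_B, SETUP)
before_bc = setup_slack(PERIOD, LATENCY, LATENCY, DELAY_B_TO_C, SETUP)

# Ask to delay register B's clock by 120 ps, but cap it at 100 ps.
offset = clamp_skew(120, 100)
latency_b = LATENCY + offset

after_ab = setup_slack(PERIOD, LATENCY, latency_b, DELAY_A_TO_B, SETUP)
after_bc = setup_slack(PERIOD, latency_b, LATENCY, DELAY_B_TO_C, SETUP)

print("before:", before_ab, before_bc)
print("after: ", after_ab, after_bc)
```

With these numbers the A-to-B path goes from failing to just meeting
timing, at the cost of slack on the B-to-C path, which is why a designer
may want to bound the adjustment or disable it altogether.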
> When I tried to ask the Magma people about how they derived their
> delay numbers I was given the "let me put you in touch with the
> salesman" run around. Their demo showed single numbers for delays so
> unless you guess the right process point and switching conditions to
> input into their analysis, it can't possibly generate the worst
> numbers.
As for how we derive our delay numbers, it's done the same way that
every other piece of our product does it... using our integrated STA and
extraction. So, CTS uses the same STA and extraction as the entire
implementation flow from RTL to GDSII, which is one of the chief value
propositions of our tool.
As for the level of detail shown in the demo, please remember it was a
demo. A one to two hour session is meant to be an overview, not a
detailed training class. There is no way to show everything in this
timeframe.
> Of course there is the other problem that the best way to analyze this
> is some kind of SPICE -- not any sort of RC.
Of course the best (if by best, you mean most accurate) way to analyze a
circuit is to SPICE it. We currently support methods of extracting all
or parts of clock networks into SPICE netlists to support additional
analysis such as this using "export spice netlist" or "export spice
path", but it is impractical to assume that this level of analysis can
be done in our, or anyone else's, automated CTS solution during the
actual construction. These are verification signoff steps which are much
more runtime intensive, and on many larger chips perhaps prohibitively so.
Thanks for allowing me to express Magma's viewpoint on these issues raised
by [Uncle Fester]. I hope it has been useful for everyone.
- Paul Gittinger
Magma Austin, TX
---- ---- ---- ---- ---- ---- ----
From: Jeff Echtenkamp <echtenka=user domain=broadcom spot calm>
Hi, John,
I'm curious to hear the background of the author of this post. I'm
guessing they are coming from some sort of custom background, not an
ASIC background. If they argue "I wouldn't let automatic tools loose on
this kind of a problem (clock balancing)", it seems to imply that maybe
their beefs aren't with Magma, but with P&R in general.
Could the author please let us know his current methodology, so we can
see if he is comparing it to SOCE, IC Compiler, Astro, PC, some internal
tool, or hand designing? It would be helpful to see what other people
are doing versus Magma's implementation.
- Jeff Echtenkamp
Broadcom Irvine, CA