> Here's how the business cards will be counted:
>
>    Each unique Synopsys R&D, CAE, FAE card:   2.0 points
>    Synopsys Hotline, or Instructors:          1.0 points
>    Synopsys Sales or Marketing:               0.2 points
>    VPs, CEOs, COOs, General Managers:         0.5 points
"John, I was pleased that Marketing cards still warrant positive points
in your contest! We haven't met yet, and I want to fix that. I'm at
HDL-con this week and will try to find you."
- Steve Brown, Synopsys Marketing
"John, Given that I've got FAE's, R&D people, CAE's, Consultants, and
of course Customers routinely taking classes I teach, I think that
you're undervaluing both instructors and Synopsys Customer Education
in your point scheme."
- Kristen McNall, Synopsys Instructor (Contract)
"Half a point for any Synopsys manager is half a point too much as far
as I'm concerned. Anon Please."
- an anon Synopsys employee
"Here's four of my business cards, John. There. Now I'm 2 points like
my R&D engineers."
- Aart de Geus, CEO of Synopsys
"What, no points for cards from members of the Legal department? We'll
be at SNUG to keep an eye on you."
- Steve Shevick, a lawyer at Synopsys who's now the CFO
"Is this contest open to departed SNPS employees? My Rolodex bulges
with SNPS business cards. :-)"
- Ken Rousseau of Virage Logic
"Regarding Cash for your cards: is this like Pokemon? Are first edition
(old Synopsys Logo) cards more valuable? What about a Raul Camposano
card? (R&D + 2, CTO + .5, VP + .5, GM + .5 = a total of 3.5 points?)
Will you take photocopies or do they have to be originals?"
- Dennis Kelly of Sun Microsystems
[ Editor's Note: The point system I worked out isn't additive, Dennis,
it was designed to be multiplicative. An R&D guy is worth 2 points.
An R&D manager is worth 2 x 0.5 = 1 point. To me, this best measures
that individual's real impact on me as a Synopsys customer. This makes
Raul worth 2 x 0.5 x 0.5 x 0.5 = 0.25 points; barely better than a front
line Synopsys salesman or marketdroid. Sorry, Raul... (And I won't
say what 4 Aart de Geus cards mathematically works out to.) - John ]
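The multiplicative scoring described in the note above can be sketched in a
few lines of Python. This is only an illustration: the function name and
role labels are made up here, and treating every management title (manager,
CTO, VP, GM) as a 0.5 multiplier is an assumption read off the Raul example.

```python
# Base points per card type, taken from the contest rules quoted above.
BASE_POINTS = {
    "R&D": 2.0, "CAE": 2.0, "FAE": 2.0,
    "Hotline": 1.0, "Instructor": 1.0,
    "Sales": 0.2, "Marketing": 0.2,
}

# Assumption: each management title held multiplies the score by 0.5.
MANAGEMENT_FACTOR = 0.5


def card_points(base_role, management_titles=()):
    """Return base points scaled by 0.5 once per management title."""
    return BASE_POINTS[base_role] * MANAGEMENT_FACTOR ** len(management_titles)


if __name__ == "__main__":
    print(card_points("R&D"))                       # plain R&D engineer
    print(card_points("R&D", ["manager"]))          # R&D manager
    print(card_points("R&D", ["CTO", "VP", "GM"]))  # the Raul example
```

With these assumptions the three calls reproduce the 2, 1, and 0.25 point
figures from the note.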
( ESNUG 349 Subjects ) ------------------------------------------- [4/18/00]
Item 1 : WARNING: Using transform_csa w/ DW Multipliers Creates Bad Logic!
Item 2 : Older, Cheaper VCS Versions Benchmarked *Faster* Than New VCSi !!
Item 3 : ( ESNUG 348 #6 ) Hey, Avanti Mars XTalk Is Crap On Cross-Cap, Too
Item 4 : ( ESNUG 348 #1 ) Two DC Bugs That Cause Bad Logic To Be Created
Item 5 : Lattice, AMD/VANTIS, Cadence Buying OrCad, ViewLogic/Summit Merger
Item 6 : Testing The Waters; RTL Signoff Still Needs Gate Level Simulation
Item 7 : ( ESNUG 348 #13 ) ... And ModelSim Can't Read The Tcl It Writes!
Item 8 : Customer Pissed That Synopsys Recommended Custom Wire Load Models
Item 9 : ( ESNUG 343 #10 ) Fun Stuff We Found Analyzing Our .lib Files
Item 10 : Methodology Details On The EmpowerTel MIPS PKS/Ambit RTL Tape-out
Item 11 : ( ESNUG 348 #6 ) Cadence's Silicon Ensemble & Cross-Cap At 0.18um
Item 12 : ( ESNUG 348 #11 ) report_power Memory Leak A Power Compiler Issue
Item 13 : ( ESNUG 348 #12 ) Why DC & PrimeTime Can't Read Their Own SDF
The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com
( ESNUG 349 Item 1 ) --------------------------------------------- [4/18/00]
From: [ The Great Gatsby ]
Subject: WARNING: Using transform_csa w/ DW Multipliers Creates Bad Logic!
Hi John,
Here's another Bad Logic bug we found in Design Compiler with the Synopsys
workaround included below:
PROBLEM:
When the transform_csa command is used on designs containing multipliers,
incorrect logic might be produced in the output netlist. You will
experience this problem with DesignWare Foundation only in version 98.08
EST or later.
SOLUTION:
In the 98.08 EST release and beyond, DesignWare Foundation has two
architectures for DW02_multp (partial product multiplier): wall and nbw.
The existence of more than one architecture causes a problem in the tool.
The workaround is to leave only one of these architectures (wall or
nbw) available before using transform_csa.
For example:
set_dont_use dw02.sldb/DW02_multp/wall
transform_csa
Because neither DW02_multp architecture is at fault by itself, you can keep
either nbw or wall, depending on your design needs.
This, too, bit us while taping out our 1.3 million gate design. Hopefully,
by bringing this up in ESNUG, other users won't have to discover this bug
the hard way (like we did.) Please keep me anon.
- [ The Great Gatsby ]
( ESNUG 349 Item 2 ) --------------------------------------------- [4/18/00]
From: Erik Jessen <ejessen@vixel.com>
Subject: Older, Cheaper VCS Versions Benchmarked *Faster* Than New VCSi !!
Hi, John,
We use Windows NT with VCSi version 4.1.1. I did some benchmarking with
VCS and VCSi 5.1 (which is supposed to be a LOT faster than 4.1.1). My
results on NT were:
- VCSi 5.1 (with radiant-2) was 3X *SLOWER* than VCSi 4.1.1
- VCS 5.1 (with radiant-2) was 30% faster than VCSi 4.1.1
(but costs 2x VCSi).
I spent several weeks running tests in coordination with Synopsys.
Finally, the VCS FAE found out what happened. Back when Viewlogic released
VCSi 4.1.1, they forgot to turn off all the optimization algorithms, so VCSi
4.1.1 was really like VCS 4.X. All subsequent releases of VCSi have had
those optimizations turned off, which is why VCSi 5.1, even with max
optimization turned on, is so much slower. :)
Another useful tidbit: We normally use SignalScan to dump out a waveform
database, and then view it. I tested out VCS' new one (that doesn't use
PLI); I saw a 7X speedup over using SignalScan/PLI.
That's pretty significant!
I've also been toying with the idea of writing to you about how good the
VCS support has been for us. They've made handling these problems a
breeze for us and I'd like to publicly acknowledge that. I'm at a small
startup, so it's not like we're a big account that has to be taken care of,
yet we're treated like kings by these guys! Great support!
- Erik Jessen
Vixel Irvine, CA
( ESNUG 349 Item 3 ) --------------------------------------------- [4/18/00]
Subject: ( ESNUG 348 #6 ) Hey, Avanti Mars XTalk Is Crap On Cross-Cap, Too
> So now we know we've got a cross-capacitance problem, and we know we can't
> just blindly jam buffers in to fix it, so what do we do? We go check out
> the signal integrity option on Cadence Silicon Ensemble.
> ...
> 2. It's flagging over 1000 noise violations in a few hundred K gates
> of logic. Over 1K noise violations in < 9 mm^2 of randomly
> routed die? Gimme a break! Simulation showed that SE was
> over-estimating noise by > 100%. It was using a slew rate
> less than half of the actual slew rate. It's hard to say how many
> real noise violations are in there, but my simulations are saying
> my usual layout topologies ought to give me < 10 noise violations
> per 100K gates *usually*. Not 1000!
> ...
>
> All in all, I feel a lot of bad silicon coming on...
>
> - [ Born To Run ]
From: [ Born To Lose ]
John, please keep me anon.
So [ Born To Run ] thinks Cadence's Silicon Ensemble is clueless on
cross-cap? Just tell him to be glad he's not trying to run Avanti's Mars
XTalk Router. Mars XTalk also has a rinky-dink noise model that comes up
with over a thousand violations for a few hundred K gates. But then it
rips them up and tries to re-route them. Oh yeah, that's gonna close.
Not!
Between what I've seen of Avanti on cross-cap and what I read in his letter
about Cadence, it really makes me wonder how good these guys are at deep
sub-micron design. I mean, if I can run a noise simulation and find out
they're over-reporting noise, why can't they figure that out? Did they not
run the simulations, or did they set them up wrong? Both options are a
little scary.
Anybody out there seen anything working on cross-cap? I hear Frequency's
got some Copernicus thing that's supposed to handle cross cap. Does it
work? Does it take forever? Any stories anyone can share on ESNUG?
- [ Born To Lose ]
( ESNUG 349 Item 4 ) --------------------------------------------- [4/18/00]
Subject: ( ESNUG 348 #1 ) Two DC Bugs That Cause Bad Logic To Be Created
> We are right in the middle of taping out a 1.3 million gate design and
> we discovered that incremental compiles in DC 99.10 are synthesizing bad
> functionality in our design! Yes, I am saying that incremental compiles
> in DC 99.10 are creating broken netlists. The problem involves something
> messy with boundary optimization.
>
> - [ The Great Gatsby ]
From: Paul Gerlach <paulge@mdhost.cse.tek.com>
Hi, John,
Has anyone had problems or received notice from Synopsys that:
simplify_constants -boundary_optimization
is similarly flawed? We use this command to remove many dangling gates that
compile seems to leave around.
- Paul Gerlach
Tektronix Beaverton, OR
---- ---- ---- ---- ---- ---- ----
From: Longyin Wei <lwei@sdd.hp.com>
Hi John,
Does he mean DC 99.10 or DC 99.10-4 here? I tried DC 99.10 once, and the
problem I had was that it hangs while building some blocks with map_effort
high. So I had to back down to DC 99.05-2 instead. I haven't tried
DC 99.10-4 yet, but I plan to in the next few weeks. If he meant
DC 99.10-4, though, I'll wait. Which is it?
- Longyin Wei
Hewlett Packard
---- ---- ---- ---- ---- ---- ----
From: [ A Synopsys DC Technical Marketeer ]
Hi, John,
What [ The Great Gatsby ] has reported is what we at Synopsys call a Class 4
bug. These are bugs that we stop everything for because they involve DC
synthesizing bad logic. I work in Technical Marketing for DC and since
ESNUG 348 went out, I've received a lot of calls on this issue. Enclosed is
the write-up of the research I've done on these bugs. Ten Synopsys people
(six of whom are in R&D) have been involved in making this write-up as
detailed and accurate as possible. I apologize for the inconvenience for
Synopsys customers on this issue.
There are actually 2 related STARs which have similar symptoms and effects.
STAR 100465 was filed by [ The Great Gatsby ] but STAR 99006 is related to
this matter as well. Bad logic can be created when boundary optimizations
are performed on DW parts AND the implementation of the DW part changes as
part of optimization performed during successive compiles AND some other
complex conditions are present.
You cannot generalize the issue by stating that it is boundary optimization
on an incremental compile. This would imply a command such as:
compile -incr -boundary_optimization
or
set_boundary_optimization true
causes this problem. NOT TRUE!!!!
Some boundary optimizations are performed on all DesignWare parts REGARDLESS
of the setting of the switch. Boundaries around inferred DW parts are
"soft". It is not incremental compiles that cause problems, but the
interaction between ANY boundary optimization and the optimization which
changes DW implementations. This can happen on ANY compile, incremental or
not, in which implementations actually get changed. Pay careful attention
to the conditions which _may_ cause bad logic.
Symptoms
--------
The symptoms may seem to apply to many designers but the actual probability
of bad logic is very low. We've seen bad logic only in very, very few real
life cases. There is also a solution that avoids the problem completely.
STAR 100465 symptoms:
- The design MUST contain DW parts at some point during synthesis.
- The DW parts must have equal or opposite inputs. Example:
          Equal Inputs                   Opposite Inputs
                   --------                          --------
        -----------|  DW  |          ---------------|  DW  |
        |          | part |          |              | part |
        -----------|      |          ---|>o---------|      |
                   --------                          --------
- The implementation is changed during optimization.
STAR 99006 symptoms:
- The design MUST contain DW parts at some point during synthesis.
- boundary optimization must be on via "compile -boundary_optimization"
OR via "set_boundary_optimization true" on the design.
- The DW implementation must change on a secondary or later compile.
- You're using Design Compiler version 99.05 or earlier.
The Cures
---------
STAR 100465 cure:
BEFORE your FIRST compile set
compile_implementation_selection = "false"
This flag should also be set to "false" before any successive compiles.
STAR 100465 will be fixed in 2000.05
STAR 99006 cure(s):
You have a choice of 3 cures for STAR 99006:
On any secondary compile set
compile_implementation_selection = "false"
OR
perform the secondary compile without invoking
boundary_optimization
OR
upgrade to Design Compiler 99.10 or later.
STAR 99006 has been fixed in 99.10
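The two symptom checklists above can be turned into a quick screening
helper. The Python below is a rough sketch, not a Synopsys tool: the
function and argument names are invented for illustration, "tied inputs"
stands for the equal-or-opposite-inputs condition, and using "earlier than
2000.05" as the exposure window for STAR 100465 is an assumption based only
on the stated fix release. A hit means the listed conditions are met; since
the write-up says other complex conditions are also needed, it cannot
confirm that bad logic will actually be produced.

```python
def parse_dc_version(text):
    """Turn '99.05', '99.10-4' or '2000.05' into a sortable tuple."""
    base, _, patch = text.partition("-")
    year_text, month_text = base.split(".")
    year = int(year_text)
    if year < 100:  # two-digit release years in this thread are 19xx
        year += 1900
    return (year, int(month_text), int(patch) if patch else 0)


def stars_to_check(version, has_dw_parts, tied_inputs, boundary_opt_on,
                   impl_changes, impl_changes_on_later_compile):
    """List the STARs whose published symptoms match this run."""
    release = parse_dc_version(version)
    hits = []
    # STAR 100465: DW parts with equal/opposite inputs, implementation changes.
    # Assumption: every release before the 2000.05 fix is exposed.
    if has_dw_parts and tied_inputs and impl_changes and release < (2000, 5, 0):
        hits.append("STAR 100465")
    # STAR 99006: boundary optimization on, implementation changes on a
    # secondary or later compile, release older than the 99.10 fix.
    if (has_dw_parts and boundary_opt_on and impl_changes_on_later_compile
            and release < (1999, 10, 0)):
        hits.append("STAR 99006")
    return hits


if __name__ == "__main__":
    print(stars_to_check("99.10-4", True, True, True, True, True))
```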
Side-Effects of the Cure(s)
---------------------------
The side-effect of not using the boundary optimization technique is lowered
quality of synthesis results. Your design won't have broken logic, but it
also won't be the most optimal logic you could have created. In the case
where
compile_implementation_selection = "false"
DC will not perform a specific optimization, Incremental Implementation
Selection (IIS).
Although IIS can be a very powerful optimization for a few users, in most
cases it should not have a significant impact on design performance. Some
power users use this flag to improve their run-time at the expense of
circuit performance. Again the performance degradation varies by design
and should not even be noticeable for most customers.
If your circuit shows significant performance degradation after switching
off IIS optimization then you may want to try the antidote.
Antidote to the Side-Effects
----------------------------
Manually set the implementation of the DW part using "set_implementation".
Once you have identified the DW parts whose implementations need to be
different, use the following steps to report the existing implementations
and then change them during a subsequent compile.
/* Identify the DW parts & implementations */
report_resources
/* force DW instance U1 to "carry-look-ahead" */
set_implementation cla U1
/* compile design to force implementations */
compile [-incr]
/* verify that selected implementation was used */
report_resources
If you are not sure which implementations are available for a particular
DesignWare part, then run the following command on your DesignWare library.
report_synlib
Hopefully this will help the ESNUG readers avoid these two Class 4 bugs in
Design Compiler.
- [ A Synopsys DC Technical Marketeer ]
( ESNUG 349 Item 5 ) --------------------------------------------- [4/18/00]
From: Roy Kelsey <aroy@ix.netcom.com>
Subject: Lattice, AMD/VANTIS, Cadence Buying OrCad, ViewLogic/Summit Merger
Hi John,
I am doing some fairly high speed CPLD design (~50-100 MHz) and went looking
for some tools. The company has historically used AMD MACH devices in most
of their designs. Lattice has acquired the AMD/VANTIS product line. The
tools they used around here haven't been updated for a number of years. I
have been doing my initial design with Lattice tools but don't find them
especially well integrated.
The schematic capture package used has been Orcad. Orcad used to have
something called Design Express, but now that they have been acquired by
Cadence it is no longer offered. I haven't heard anyone singing praises about
Cadence's support for small to mid-sized companies, and indeed they haven't
been going out of their way to make me happy!
I am currently evaluating the ViewLogic software but am on a short string
because they want to make an end-of-quarter sale and are offering a nice
discount. The stuff seems nice enough though they sure cut it into a bunch
of little pieces! Several of the engineers here want to go for it -- they
have some experience with ViewLogic. Should we?
Before I jump off the deep end and endorse spending a significant portion of
the engineering budget, I'd like to get an opinion from you as to what you
think of the recently announced merger between ViewLogic & Summit.
- Roy Kelsey
( ESNUG 349 Item 6 ) --------------------------------------------- [4/18/00]
Subject: Testing The Waters; RTL Signoff Still Needs Gate Level Simulation
> "People are missing the point here. We need levels of abstraction to
> be removed -- not added! In the not too distant future we'll need
> RTL-level sign-off and we'll also need good RTL-level power, area, and
> timing estimation tools."
>
> - Steve Golson of Trilobyte Design
>
> "We already do RTL sign-off today. It doesn't make sense that every
> designer knows synthesis details. We just have one engineer in our
> group that does that and the rest of us write the Verilog RTL."
>
> - Paul Zimmer of Cisco Systems
From: Frank Emnett <frank@aiec.com>
John,
In talking with other designers at the SNUG conference and elsewhere, it
seems like many folks run what is almost an RTL signoff flow in house, but
they all seem to run some level of gate-level simulation for a sanity check.
Something I haven't really seen discussed (would make a great SNUG paper
for someone with this knowledge) is why do we still need to run gate-level
simulations? What problems do they uncover?
Personally, I've caught the situation recently discussed in ESNUG where
uninitialized registers driving if-then-else statements simulate one way
in RTL simulation, as if the registers were actually initialized to some
good value, and in gate-level unknowns propagate all over the place. I've
also seen situations where assumptions were made regarding scan insertion
and ATPG that aren't necessarily true (set_scan_transparent,
set_test_assume), so some set of ATPG vectors need to be run at the gate
level. Equivalence checking won't help with these issues. Are there others?
I think that the issue is not so much the time needed for gate level
simulation, since usually a very limited subset of the full regression suite
is run at gate level, but rather the late point in the design cycle at
which gate level simulation uncovers problems, causing some sort of
scramble to fix them at the last minute.
Do you folks try to run gate-level sims on some non-optimized quickie
synthesized netlist earlier in the flow? Or is there any way to completely
eliminate the need for these gate level sims, through adherence to certain
design practices? Are there any RTL analyzers that can help with this?
- Frank Emnett
Automotive IEC Phoenix, AZ
( ESNUG 349 Item 7 ) --------------------------------------------- [4/18/00]
Subject: ( ESNUG 348 #13 ) ... And ModelSim Can't Read The Tcl It Writes!
> Anonymous, please. I've been using ModelSim (PE, V5.3a), and discovered
> one interesting "feature". If you have a waveform display up on the
> screen, and want to save the format, it writes a Tcl script for you to
> execute later. So far so good.
>
> However, if your display includes large virtual signals (eg busses that
> you have built on the display by combining individual signals), the Tcl
> script is not executable when reloaded. It writes the bus description
> all on one line, with no regard to the buffer length (I've seen lines
> almost 5000 characters long)! These, naturally, break the ModelSim Tcl
> interpreter when read back in.
>
> I do feel it's rather poor for a tool to write a script that it itself
> cannot read.
>
> - [ The Cat In The Hat ]
Dear John,
I've noticed the same problem w/ ModelSim EE V5.3c. It's really a shame!
In my particular version of ModelSim, the only real problem with the
generated Tcl script is a few misplaced spaces. By removing the excess
spaces, I can get ModelSim to read the script back in again.
$ perl -pi -e 's/ , /,/g;s/ }}/}}/' tcl_script
This little trick has saved me many, many hours of hair pulling!
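For anyone without perl handy, the same two substitutions can be sketched in
Python. This is an assumed equivalent of the one-liner above, not something
tested against ModelSim; the function names are made up and the sample line
in the usage section is invented, not real ModelSim output.

```python
def fix_wave_script_line(line):
    """Mirror the perl one-liner on a single line of the saved script."""
    # s/ , /,/g : remove the spaces around every comma separator
    line = line.replace(" , ", ",")
    # s/ }}/}}/ : remove the space before the first closing "}}" only
    return line.replace(" }}", "}}", 1)


def fix_wave_script(text):
    """Apply the fix to every line, preserving the original line endings."""
    return "".join(fix_wave_script_line(line)
                   for line in text.splitlines(keepends=True))


if __name__ == "__main__":
    sample = "{sig_a , sig_b , sig_c }}\n"  # hypothetical script fragment
    print(fix_wave_script(sample), end="")
```

Unlike "perl -pi", this sketch does not edit the file in place; read the
script into a string, pass it through fix_wave_script, and write it back.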
- David Sawey
Fujitsu Richardson, TX
( ESNUG 349 Item 8 ) --------------------------------------------- [4/18/00]
Subject: Customer Pissed That Synopsys Recommended Custom Wire Load Models
> THE BIRDS & THE BEES: Currently, to manage the funky effects of hanging
> out around 0.25 um, chip designers use Design Compiler in conjunction with
> Wire Load Models. They're also using a class of tools which guesstimate
> physical effects (within 15 percent or so) that are best thought of as
> "planners". Their user stats:
>
> Avanti Planet-RTL ######## 17%
> Synopsys Chip Architect # 3%
> Synopsys FlexRoute 1%
>
> Once you're past the physical effects planning stage, there's another
> class of tools that manages physical effects via post-synthesis
> optimizations. Their SNUG'00 tool survey stats:
>
> Synopsys Floorplan Manager ########## 21%
> Avanti Saturn ###### 12%
> Cadence Phys Design Planner ## 5%
>
> Cadence PBopt should also be included here for completeness even though it
> wasn't in the survey. Anyway, the big problem is these Wire Load Model
> tools/approaches only work so-so, but they're as good as it gets for now.
From: [ BeetleJuice ]
John,
I'm pissed, but I must have no name here.
I have been consistently flabbergasted at Synopsys's party line recommending
the use of Custom Wire Load Models. I see it all over their official DC
documentation and the Synopsys guys were talking about it at SNUG'00. The
idea is intellectually appealing, but the fact is they don't work and make
things worse. Considering that the wire load construction challenge is to
approximate a Poisson distribution with a single number, no matter what
number is chosen, it is going to be wrong.
The premise of Custom Wire Load models is that by mathematically removing
the highest and lowest values of the distribution (-trim) and drawing an
arbitrary line at a pre-determined percentage (-percentile), smoothing the
resulting table of numbers (-smooth), the resulting table will have far
greater predictive ability than the obviously conservative, sandbagged,
overly pessimistic tables supplied by those lowliest of leeches: the
semiconductor vendors.
As was pointed out by an astute user at the PhysOpt tutorial at SNUG 2000
in San Jose, the values of the -trim and -percentile have more effect on
the value of the generated tables than the actual data set.
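To see why the knobs can matter more than the data, here is a small Python
sketch. It is not the Synopsys algorithm: the function name, the choice to
drop the trim percentage from each tail, and the sample values are all
assumptions made up for illustration. It only shows how one skewed set of
samples collapses to very different single numbers as the two settings move.

```python
def trimmed_percentile(values, trim_pct, percentile):
    """Pick one representative number from a distribution.

    Assumption: trim_pct percent of the samples are dropped from EACH tail,
    then the value at the given percentile of the remainder is returned.
    """
    data = sorted(values)
    drop = int(len(data) * trim_pct / 100)
    kept = data[drop:len(data) - drop]
    index = min(len(kept) - 1, int(len(kept) * percentile / 100))
    return kept[index]


# Hypothetical, skewed wire-cap samples for nets of one fanout (made-up units).
SAMPLES = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 10, 14, 20, 30, 50]

if __name__ == "__main__":
    for trim, pct in [(10, 50), (10, 90), (0, 90)]:
        print(trim, pct, trimmed_percentile(SAMPLES, trim, pct))
```

On this made-up data the same samples yield 4, 14, or 30 depending only on
the trim and percentile settings.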
The default values of 10% trim and 50% percentile feed the delusion that
Vendor WLMs are overly conservative, because these unachievable numbers
make the design look much faster, and cannot be realized. A small detail,
usually discovered by the design team, in the 11th hour, under tremendous
time to market pressures.
Furthermore, as also pointed out at the PhysOpt tutorial, the tables for
custom WLMs are not populated above fanouts=19. Gee, what do you think
happens on the clock, reset, scan, busses and any other nets that
might have fanouts greater than that?
To anyone contemplating the use of Custom WLMs, spend a bit of time
following the technique described by Steve Golson in his excellent paper at
last year's San Jose SNUG (I can't recall the title, but I do recall
it winning the best paper award.)
1.) Using Design Compiler, for a given block, generate timing endpoint
data with:
report_timing -path end -nosplit > filename.rpt
2.) Plot that curve with slack ratio on the X axis and aggregated path
on the Y axis (Sort the slack ratios to get a smooth curve).
3.) After doing this from the initial synthesis, do it again with the
post-placement parasitics, and overlay the curves.
4.) After generating the custom WLMs, retime the design with those WLMs.
Recall that the purpose of custom WLMs is to improve the predictive
quality of the synthesis results.
Look at these three plotted curves, and make sure that the Custom WLMs are
SIGNIFICANTLY better. I have done this and they were actually WORSE than
my vendor supplied models!
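The curve from step 2 can be built with a few lines of Python once the
endpoint numbers have been pulled out of the report. This sketch makes
several assumptions: it does not parse the report file, it takes
(required time, slack) pairs as already extracted, and it defines slack
ratio as slack divided by required time, which may not match the definition
used in the paper. The function names are invented.

```python
def slack_curve(endpoints):
    """Return (slack_ratio, cumulative_path_count) points, sorted by ratio.

    endpoints: iterable of (required_time, slack) pairs, one per endpoint.
    Assumption: slack ratio = slack / required_time.
    """
    ratios = sorted(slack / required for required, slack in endpoints)
    return [(ratio, count) for count, ratio in enumerate(ratios, start=1)]


def failing_paths(curve):
    """Count endpoints whose slack ratio is negative."""
    return sum(1 for ratio, _ in curve if ratio < 0)


if __name__ == "__main__":
    # Hypothetical endpoint data: (required time, slack) in ns.
    vendor_wlm = [(8.0, 2.0), (8.0, -1.0), (4.0, 1.0)]
    for point in slack_curve(vendor_wlm):
        print(point)
```

Build one curve each for the vendor WLM run, the post-placement run, and the
custom WLM run, then overlay them; the run whose curve sits closest to the
post-placement one is the better predictor.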
I cynically believe that the awful results that can be had using custom
WLMs are a Synopsys marketing ploy. "Gee, Custom WLMs did not solve your
problem, then you must need PhysOpt!".
I just love it!
- [ BeetleJuice ]
( ESNUG 349 Item 9 ) --------------------------------------------- [4/18/00]
Subject: ( ESNUG 343 #10 ) Fun Stuff We Found Analyzing Our .lib Files
> We always had the problems mentioned in this issue (unpredictable
> synthesis results per a library change). This is the first time we got
> some kind of a guide from the Israel Synopsys support team. It's a 10
> page article in SolvNet titled "DC Library Ultra Guidelines" (October
> 1999) and it has the detailed description of the new DC timing model, how
> to analyze a .lib, plus basic library developer guidelines for their
> new timing model. It's Synthesis-625.html on SolvNet.
>
> A good, meaty guideline of library developing is sure missing.
>
> - Doron Nisenbaum
> Chip Express (Israel) Haifa, Israel
From: Dale Walter <dalew@actel.com>
Hi John,
We performed a study at Actel last summer to try to determine what cells
should be in an optimal synthesis library. Being an FPGA manufacturer,
Actel's libraries do not have as many cells as a typical ASIC library,
since we do not have multiple cells with different drive strengths. But
I believe the results of our study are analogous to the ASIC world.
In our study we chose to focus on the synergy between the cells that
exist in a library rather than their timing properties. Our libraries
make use of the simple linear timing model, which seems to yield fairly
accurate timing estimations for Actel architectures. The impetus for
this study was to determine which TMR (triple module redundancy)
flip-flops to develop for our Hi Rel customers. Rather than simply
making TMR equivalents of all existing regular flip-flops, we wanted to
determine the minimum set that would accomplish mapping yet still yield
the best QOR (quality of results).
As Doron Nisenbaum observed, Synopsys does not give much guidance in the
theory of library development. In previous libraries we simply threw
everything into the library that we had. We wondered if this was a good
idea, however, and wanted to find a way to empirically determine whether
a library was optimum.
We hired a college student for the summer to do all the grunt work. We
began with a simple library using the minimum set of cells for a CMOS
technology library, as defined in the Synopsys Library Compiler User
Guide. This consisted of a 2-input AND gate, a 2-input OR gate, a
2-input NOR gate, an inverter, a D flip-flop with preset and clear, and
a D latch with preset and clear. The only thing left out was the
internal three-state buffer because none exist in any Actel antifuse
architecture. We used Synopsys' 99.05 release both for compiling the
libraries and synthesis.
Next we performed synthesis using the simple library on a suite of 23
designs. We tried to pick a diverse cross-section of large and
realistic designs, many of which were donated by customers. We measured
run time, area, and maximum delay in order to determine QOR.
After the first set of synthesis runs was completed, we began adding
cells to the simple library. We added just a few cells at a time and
then re-ran synthesis and took measurements. First we added some AND-OR
gates, then some NOR gates, AOI gates, XOR gates, OR-AND gates, etc.
Then we began adding enable flip-flops, MUX flip-flops, enable latches,
etc. In total we created 23 separate libraries. When all the tests
were completed, we plotted all the data on several graphs and analyzed
the results. We also took a detailed look at some of the designs to see
what DC was doing when we found blips on the graphs.
The results were quite interesting and in some cases surprising. With
the simple library, as might be expected, area was highest. Run times
and delays were also quite high. Area fell off quite steeply at first
as more cells were added, and then leveled off for the remainder of the
libraries. Adding enable flip-flops yielded the lowest area. Area
began to increase slightly as more cells were subsequently added.
Delays actually increased with the first four libraries, then steeply
fell with the next three, then leveled off asymptotically, gradually
decreasing in a step-wise fashion for the remainder of the libraries. A
few spikes were observed along the way which caused us to take a deeper
look at the synthesis results. To our surprise, we found that the
addition of certain cells caused chains of inverters to be added to some
paths, thus slowing them down. It should be noted that all cells in the
Actel MUX based antifuse architectures are implemented in silicon with
either a C-module (combinatorial), an S-module (sequential), or a
combination of the two. Thus all sequential cells have roughly the same
timing characteristics, as do all combinatorial cells. So we could not
find a good reason for DC adding the inverter chains.
Run times fell off steeply at first and then rose again, leveled off for
a short while, then slowly fell to their lowest level and stayed there
for a while, then began to slowly rise again. The low level corresponded
to the addition of the more complex flip-flops and latches.
As a result of our study, we chose for an optimum library the one where
the area, delays, and run times all were at their lowest levels, making
sure to avoid any spikes. Although our method was very time-consuming,
it did give us a lot of confidence in our choice of cells, and has
subsequently been field proven. It has also opened our eyes to some
very scary gotchas which we may never otherwise have noticed. What we
are dealing with here is an extremely complicated beast. Although it
sounds like a good idea, I sincerely doubt that it is possible to
publish a meaty guideline for library development that could possibly
take into account all of the variables and the synergy between cells,
and I would be very skeptical of any that claimed to do so.
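The selection step can be sketched in Python. Everything here is
hypothetical: the library names and numbers are invented, and scoring each
library by how far its area, delay, and run time sit above the best value
seen is just one plausible way to pick the point where all three are low. It
is not Actel's procedure, and it does not check for the spikes mentioned
above, which still need a manual look at the synthesis results.

```python
def pick_library(results):
    """Return the library whose metrics are jointly closest to the best seen.

    results: {library_name: (area, delay, run_time)}, lower is better.
    Each metric is divided by the smallest value of that metric across all
    libraries, and the three ratios are summed into a single score.
    """
    best = [min(metrics[i] for metrics in results.values()) for i in range(3)]

    def score(metrics):
        return sum(value / floor for value, floor in zip(metrics, best))

    return min(results, key=lambda name: score(results[name]))


if __name__ == "__main__":
    # Hypothetical averages over a design suite: (area, delay, run time).
    measured = {
        "lib01": (1000, 50, 120),
        "lib07": (700, 40, 60),
        "lib12": (720, 36, 50),
        "lib20": (760, 35, 70),
    }
    print(pick_library(measured))
```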
- Dale Walter
Actel Corp. Sunnyvale, CA
( ESNUG 349 Item 10 ) -------------------------------------------- [4/18/00]
Subject: Methodology Details On The EmpowerTel MIPS PKS/Ambit RTL Tape-out
> For a detailed example, look at the recent hunk of FUD where Cadence is
> repackaging PKS and trying to say it's "new" and "improved" with their
> SP&R press announcement from last week. In it you have Jayan Ramankutty,
> the VP of engineering at EmpowerTel saying:
>
> "With PKS, we eliminated design iterations and got higher-performance
> results in a much shorter time. Adding routing to PKS and PKS to
> Silicon Ensemble will allow us to achieve correlation across tools
> and continue meeting our aggressive performance goals."
>
> Do some digging and you'll find Jayan Ramankutty was the same VP at Lara
> Technology in the Ambit press announcement a year earlier. Do some more
> digging and you find that EmpowerTEL and Lara Tech are the same company
> (they're just about to split) and they have 12 Cadence Verilog licenses,
> 5 Ambit licenses, 1 PKS license, and 1 complete Cadence Silicon Ensemble
> suite -- roughly about $1 million worth of software. Ramankutty's quote
> vaguely implies they used PKS in a big tape-out. (I'm sure that's what
> the Cadence sales force is saying to the customers and Wall Street.) It
> turns out that, yes, this is the first known PKS tape-out I've found,
> but PKS was only used on 50 kgates of MIPS core buried inside a 130 MHz,
> 2.5 million gate design that had a lot of RAM. The role of PKS in this
> design at EmpowerTEL was trivial -- yet the Cadence press release was
> purposely written to *imply* much more. Classic FUD.
From: Anand Dharmaraj <ad@empowertel.com>
Hi John,
My name is Anand Dharmaraj. I am a design engineer at EmpowerTel Networks
and we have been using Ambit RTL & PKS tools for synthesis. My job was
synthesizing two versions of an embedded MIPS processor for usage in our
SOC. At this point we have taped out our chip and are waiting for it to come
back from the fab.
There are a few things that I found that have helped us tapeout without
too much of a slip in schedules. One of them is the usage of the PKS
tool from Cadence.
Initially I was using Ambit's RTL synthesis. We switched to PKS as
soon as it became available and started using both tools in parallel on the
MIPS processor. The results were so excellent that we dropped the Ambit
RTL synthesis runs in a matter of 1 week. PKS was extremely simple to set
up; all we needed were the source files, the scripts and constraints, and
floorplan DEF files (no wire load models needed). PKS synthesis time, from
scratch to getting a database that met frequency goals, was about 16-20 hrs.
On the other hand, just as a comparison, the Ambit RTL synthesis runs almost
always never completed, and the timing was off by 23 percent. We reached a
point of going with the timing that came out of the Ambit RTL synthesis run
because we were reaching a point of diminshing returns (area was increasing
and timing wasn't getting any better). That was when we started the PKS
runs. We did a lot of correlation work with respect to timing between the
Ambit RTL tools and Cadence Pearl, using wireload models, SDF, etc. The
numbers that came out of the PKS runs compared almost on the dot with the
Cadence Pearl timing numbers. The PKS runs produced almost no violations
(i.e. slew time limits, long interconnect delays, large gate delays, etc.).
We used a scan-based architecture for our complete
design, requiring two rounds of synthesis, and we found that the design was
altered negatively after the separate runs.
I think PKS has been an extremely valuable addition to our design process.
- Anand Dharmaraj
EmpowerTel Networks San Jose, CA
( ESNUG 349 Item 11 ) -------------------------------------------- [4/18/00]
Subject: ( ESNUG 348 #6 ) Cadence's Silicon Ensemble & Cross-Cap At 0.18um
> Here's the deal: I'm working on a 0.18um ASIC with a Japanese foundry.
> Good guys. They're working their butts off and closing timing without
> a problem. Not a word that we need to worry about cross-capacitance.
> Then I get to talking to some of my friends working on processors, and
> I start to get a bad feeling. For one thing, cross-capacitance isn't
> proportional to clock speed, it's proportional to metal pitch. You can
> be running 83 MHz in 0.18um and still have cross-capacitance bite you
> in the butt. Sure, maybe you can leave some margin on the table to
> cover additional delay due to cross-capacitance, but you can't margin
> noise. Get a glitch far enough down your logic cone, clock it into a
> flop, and suddenly you get to debug a frequency band where the part
> fails. Lovely.
>
> So when I press my foundry, I get this cheesy answer that they're
> going to insert buffers every couple of millimeters to prevent
> cross-talk. Sounds good, right?
From: Lou Scheffer <lou@cadence.com>
Hi, John,
Here's my take on this. (I'm from Cadence, by the way.)
This solution doesn't work well for several reasons. It's hard to apply on
nets with multiple drivers (if your methodology allows that). Blindly
inserting buffers will screw up clock trees and other carefully designed
nets. It's hard to do on busses and still keep interbit skew constraints.
> Not once you run the simulations. For most nets, a 1-2 mm buffering
> distance would be fine, but if you have a very high drive cell
> aggressing on a very low drive cell, it can increase your delay 50% at
> less than 1/2 mm of adjacency, even on wide-pitched metal. 50%.
> That's no small potatoes. I mean, how much margin do you have?
This problem shows up in another way that affects performance even more
seriously. If you have no way to estimate crosstalk-induced delay, then
you need to over-estimate coupling C to account for the Miller effect.
Suppose you decide to overestimate cross coupling C by 60% (a fairly
typical number). Then, assuming half the capacitance is coupling, the
total C on ALL nets is overestimated by 30%. This is why chips still work
despite the problem above, which has been fairly serious since 0.25 micron
technology.
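The 30% figure follows from simple proportions. Here is a minimal Python
sketch of that arithmetic, assuming only the coupling part of each net's C
is padded and the grounded part is left alone. The function name is made up
for illustration.

```python
def total_c_overestimate(coupling_fraction, coupling_pad):
    """Fractional overestimate of total C when only coupling C is padded.

    coupling_fraction: share of total C that is coupling (0.5 = half).
    coupling_pad: fractional padding applied to coupling C (0.60 = 60%).
    """
    return coupling_fraction * coupling_pad


if __name__ == "__main__":
    # Half the capacitance is coupling, and that half is padded by 60%.
    print(f"total C overestimated by {total_c_overestimate(0.5, 0.60):.0%}")
```

A net with a smaller coupling share is padded proportionally less, so the
30% applies only where coupling really is about half of total C.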
> The best thing I can say for SE is that it runs without crashing.
>
> Here are the problems we've found:
>
> 1. The parasitics coming from HyperExtract are up to 200% off
> compared to 3D field solution. A chimpanzee throwing darts
> at a diagram of parasitics could do better.
200% on which size parasitics? On very small ones, maybe. The models (at
least in the past) have been generated so they get total C as close
as possible (generally within a few percent). Since coupling C is about
1/2 of total C, if a big coupling C was off by >100%, one or the other
of the grounded or coupling C would become negative! So we can be sure
that big coupling Cs are not off by this far.
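The negative-capacitance argument can be checked with a few lines of
Python. This is only a sketch: it assumes total C is extracted exactly and
that the coupling error is an overestimate, and the 100 fF / 50 fF values
are hypothetical numbers picked to match the "coupling is about half of
total" rule of thumb.

```python
def implied_grounded_c(total_c, true_coupling_c, coupling_error):
    """Grounded C left over once the mis-extracted coupling C is subtracted.

    coupling_error is fractional: 1.0 means coupling C is 100% too high.
    Assumes total C itself is extracted exactly (an idealization).
    """
    extracted_coupling_c = true_coupling_c * (1.0 + coupling_error)
    return total_c - extracted_coupling_c


if __name__ == "__main__":
    # Hypothetical net: 100 fF total, of which 50 fF is coupling.
    for err in (0.5, 1.0, 2.0):
        grounded = implied_grounded_c(100.0, 50.0, err)
        print(f"coupling error {err:.0%}: implied grounded C = {grounded} fF")
```

Past a 100% overestimate the implied grounded C drops below zero, which is
why an error that large cannot occur on a big coupling C while total C is
held to within a few percent.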
A more relevant metric is the effect of the error on delay and crosstalk
computation. This is roughly given by:
     (% error in coupling C) x (value of coupling C)
     -----------------------------------------------
                   (value of total C)
This should show (from our experience) that although HyperExtract is
certainly not perfect, the errors are manageable.
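As a rough illustration of the metric above, here is a small Python
sketch. The function name and the capacitance values are hypothetical;
they are not HyperExtract numbers.

```python
def effective_error_pct(coupling_err_pct, coupling_c, total_c):
    """(% error in coupling C) x (coupling C) / (total C), in percent."""
    return coupling_err_pct * coupling_c / total_c


if __name__ == "__main__":
    # A 200% error on a tiny 1 fF coupling C inside a 100 fF net.
    print(effective_error_pct(200.0, 1.0, 100.0))
    # A 20% error on a 50 fF coupling C inside the same 100 fF net.
    print(effective_error_pct(20.0, 50.0, 100.0))
```

With these made-up values the large percentage error on the small
capacitor matters less than the modest error on the big one, which is the
point of asking "200% on which size parasitics?".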
That being said, for any of the 2-D, 2.5-D, etc. extractors, the coupling
capacitances are not as accurate as the total capacitances. That's
because these tools are calibrated against 3-D field solvers, and
the users typically adjust the coefficients to get the total C as close
as possible. Historically they have not worried about whether this C
was coupling or grounded. When all the C was grounded for the purposes
of delay calculation this was OK. Now that there are tools that
can use these coupling Cs, the CAD folks who write these extractors are
trying to get the individual components to better accuracy.
Finally, make sure that you are using an extract parameter set that
has not been pre-compensated for Miller effect. Many times the parameters
are set up to deliberately overestimate coupling, typically by 50-60%.
This is done to include the effect of crosstalk on delay. Even if
you are willing to accept this solution for delay, though, it causes
serious overestimation of crosstalk.
> 2. It's flagging over 1000 noise violations in a few hundred K gates
> of logic. Over 1K noise violations in < 9 mm^2 of randomly
> routed die? Gimme a break! Simulation showed that SE was
> over-estimating noise by > 100%. It was using a slew rate
> less than half of the actual slew rate. It's hard to say how many
> real noise violations are in there, but my simulations are saying
> my usual layout topologies ought to give me < 10 noise violations
> per 100K gates *usually*. Not 1000!
It's certainly not surprising that using a slew rate twice that of the
real one would result in 1000 violations. The next question is why
were the slew rates bad?
The slew rates are derived straight from the library data. Starting
from the inputs and flip-flops, using the load C (and the input slew),
the output slew is looked up in a table. This slew is degraded in the
interconnect, and then forms the slew at the input to the next gate,
and so on.
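The lookup-and-propagate flow described above can be sketched in Python.
This is not how Silicon Ensemble is implemented. The table layout, the
bilinear interpolation, and the simple additive wire degradation are all
assumptions made for illustration, as are the function names.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return (i, t): x sits a fraction t between axis[i] and axis[i+1].

    Values outside the table are clamped to its edges.
    """
    i = min(max(bisect_right(axis, x) - 1, 0), len(axis) - 2)
    t = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, min(max(t, 0.0), 1.0)


def lookup_output_slew(table, input_slew, load_c):
    """Bilinear interpolation of output slew from (input slew, load C)."""
    i, t = _bracket(table["slews"], input_slew)
    j, u = _bracket(table["loads"], load_c)
    v = table["values"]  # v[slew_index][load_index]
    low = v[i][j] * (1 - u) + v[i][j + 1] * u
    high = v[i + 1][j] * (1 - u) + v[i + 1][j + 1] * u
    return low * (1 - t) + high * t


def propagate_slew(stages, start_slew):
    """Walk a path of (table, load_c, wire_degradation) stages.

    wire_degradation is simply added to the gate's output slew; real
    tools use a more careful interconnect model (assumption).
    """
    slew = start_slew
    for table, load_c, wire_degradation in stages:
        slew = lookup_output_slew(table, slew, load_c) + wire_degradation
    return slew
```

Because each stage's output feeds the next stage's input, a wrong slew
table entry early in a path carries forward into every later lookup.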
So if the slews don't match what you see in SPICE, the most likely
reason is the library data is wrong, or that the operating conditions
differ (For example, the fastest possible slew rate is usually obtained
with a fast process, low temperature and high voltage. If you
compare this to a SPICE run at nominal process, voltage, and
temperature, the SPICE may well be slower by a factor of 2).
Note that input slews have a small effect on delays, compared to output
loading. Therefore it's quite common that the table of output slews
(as a function of load C and input slew) is very wrong. This does
not affect delay calculation much (which is why the error is allowed to
persist) but it affects crosstalk a lot.
Historically, this problem is particularly bad with synthesis and
simulation libraries, since these treat slew poorly if at all. For
example, Synopsys for many years did no slew degradation in interconnect
(has this been fixed?) and SDF provides no way to back annotate slew.
So people developing these libraries have no incentive to get the slews
right. This works OK (or at least no worse than usual) until you get
to crosstalk or some other analysis that depends on realistic slew values.
Next, you might want to look very carefully at the thresholds you are
using for crosstalk analysis. Usually, in a modern CMOS process, there
are scrillions of nets with noise about 20% of supply, lots with 30%
noise, many fewer with >40% noise, and so on. Since the number of
nets reported is an extremely strong function of the threshold, a little
work here characterizing your cells can pay big dividends.
Finally, you might want to double check your intuition. Assuming you
fix the slew problem, and set the thresholds correctly, it would certainly
not surprise me in a design this size to have 100 nets that could be bad,
given that all the neighbors that can switch at the same time did so, and
in the same direction. Will this ever happen in operation? If the
neighbors are busses it is certainly possible. Even if they are random
logic, a modern chip can easily do 10^17 cycles (300 MHz x 10 years),
so some very unlikely combinations can happen.
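The 10^17 figure is easy to check. A quick Python sketch, assuming the chip
is clocked continuously for the whole ten years (the function name is
illustrative):

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def lifetime_cycles(clock_hz, years):
    """Clock cycles seen by a part that runs non-stop for `years`."""
    return clock_hz * years * SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{lifetime_cycles(300e6, 10):.2e} cycles")
```

The result lands a little under 10^17, so the round number above is a fair
order-of-magnitude estimate.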
Of course the application has a strong impact on whether you care about
these unlikely occurrences. In a video processor, one bad pixel every
few seconds would never be noticed. In a PC, one error per year would be
totally swamped by the software error rate. If you are building a
pacemaker, though, you might want to fix every single possible crosstalk
problem, no matter how unlikely.
> Part of the trouble with all of this is the incredible lack of data in
> the industry on what to expect in real live designs. Virtually all the
> data I've come across is from contrived layout topologies on test chips
> or from microprocessors with manual or heavily programmed routing.
> That just isn't an option in ASIC design. I haven't found anybody who
> can knowledgeably tell me what kind of cross-capacitance issues to expect
> in automatically routed logic. It gets down to religion. Some people say
> that because this type of routing results in large numbers of extremely
> small aggressors, the cross-capacitance can be neglected (the aggressors
> will never all aggress at the same time). Sounds good. But I sit here
> and look at the routing fanning in and out of some of my memories and
> MUXes, and there are long stretches of massive adjacency. I think the
> large-#-of-small-aggressors argument just doesn't hold up across the
> die. Even in this random routing, there are instances of great
> regularity.
Silicon Ensemble sums all the aggressors, no matter how small, for exactly
this reason. For example, you might have a signal that runs across a 256
bit bus. Though each capacitor is very small, in this case they all might
change in the same direction at the same time, and the small Cs cannot be
neglected.
Then, of course, users complain that the crosstalk analysis is reporting
too many potential errors.
> The second argument I've come across is that odds are against all these
> lines having the appropriate phase relationship to clobber each other.
> But I can't guarantee that for all signals entering and leaving memories
> and muxes. What do I look like, someone who wants to run vectors for the
> rest of my freakin' career?
>
> All in all, I feel a lot of bad silicon coming on...
Even without regularity, if you have 10^6 nets with 60 neighbors each, and
run it through 10^17 cycles, and the data is random, you can expect that on
some cycle at least one of the nets will have ALL of its neighbors change
in the same direction! Busses and regularity make the situation worse,
though by how much is very unclear.
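A back-of-the-envelope version of this estimate in Python. It assumes the
neighbors are independent and that every neighbor switches on every cycle,
picking its direction at random (p_dir = 0.5). That switching model is an
assumption added here, not something stated above, and the function name
is made up.

```python
def expected_all_aligned_events(nets, neighbors, cycles, p_dir=0.5):
    """Expected number of (net, cycle) pairs where ALL neighbors of a net
    switch in the same direction.

    p_dir: probability that one neighbor switches in one given direction
    on one given cycle (assumed independent across neighbors and cycles).
    """
    p_all_same = 2 * p_dir ** neighbors  # all rise, or all fall
    return nets * cycles * p_all_same


if __name__ == "__main__":
    print(f"{expected_all_aligned_events(10**6, 60, 10**17):.2e}")
```

Under that assumption the expected count is far above one. The estimate is
extremely sensitive to p_dir, though: drop it to 0.25 and the count falls
far below one, so the switching activity you assume matters at least as
much as the net and cycle counts.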
These problems are not mysterious, nor are they unavoidable. Microprocessor
designers and advanced ASIC users have been dealing with these issues for
years. The issues can be addressed by methodology, better tools, or more
analysis and awareness by the users. If you don't address it at all,
or address it superficially, it's sure to bite you. I suspect a lot
of bad silicon will be built before this lesson is fully appreciated.
The good news (if any) is that these problems, and others such as IR drop,
electromigration, wire self-heat, and hot electron effects, are well
understood. The next generation of tools (from Cadence, at least) has been
designed from the ground up with these effects in mind. So fairly soon we
should be able to get back to the desired situation where the user types in
RTL and gets back legal layouts, where legal now means DRC correct + all
DSM problems fixed.
- Lou Scheffer
Cadence San Jose, CA
( ESNUG 349 Item 12 ) -------------------------------------------- [4/18/00]
Subject: ( ESNUG 348 #11 ) report_power Memory Leak A Power Compiler Issue
> We are using DC 99.10-04 in the Tcl mode and the report_power command for
> power characterization of DW-components for a LSI library. The power
> characterization requires several calls of the report_power command. We
> have found that report_power does not de-allocate memory which leads to a
> program abortion for larger characterization runs.
>
> I have attached a simple DC test script to demonstrate the problem.
>
> # program to demonstrate memory leak produced by report_power
>
> # create module from designware-library
>
> elaborate DW02_mult -arch csa -lib DW02 -update \
> -param "A_width = 4, B_width = 4 "
> current_design
> compile -map_effort medium
> ungroup -all -flatten
> change_names -rules vhdl -hier
>
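> # calculate power repeatedly (the loop is endless on purpose, so the
> # growing memory footprint can be watched from outside)
> while { 1 == 1 } {
>     set_switching_activity -period 10 -toggle_rate 0.7 [all_nets]
>     report_power
> }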
>
> Start the script and watch its memory usage. Even for this small
> component, each report_power call takes several more Mbytes of
> additional memory.
>
> - Gerd Jochens
> OFFIS Research Institute Oldenburg, Germany
From: [ A Synopsys Power Compiler CAE ]
Hi John,
Gerd pointed out a problem with memory allocation when he repeatedly used
the report_power command in Design Compiler version 1999.10-4. The
report_power command is actually a Power Compiler command.
The problem is that Power Compiler allocates memory each time the
report_power command is used and never releases it.
This improper memory handling is somehow attributable to Gerd's
synthetic_library variable setting. It happens when the synthetic_library
variable is set to ALL of the DesignWare Foundation (DWF) libraries.
We, the Power Compiler Team at Synopsys, are investigating the problem
(reported as STAR-101169) and will fix it as soon as possible. Our current
advice is to limit the synthetic_library setting to only those DesignWare
Foundation libraries that are needed for that design (as Gerd suggests).
- [ A Synopsys Power Compiler CAE ]
---- ---- ---- ---- ---- ---- ----
From: "Joerg Landmann" <lama@de.xionics.com>
Hi, John,
Another option would be to include the "dw_foundation.sldb" which contains
all the DesignWare foundation libraries. We haven't found a problem with
this approach.
- Joerg Landmann
Xionics GmbH Germany
( ESNUG 349 Item 13 ) -------------------------------------------- [4/18/00]
Subject: ( ESNUG 348 #12 ) Why DC & PrimeTime Can't Read Their Own SDF
> I've run into a similar problem using the Synopsys SDF writers & readers
> out of DC and PrimeTime. I've basically given up using the SDF writers
> in DC/PrimeTime and mostly use Ultima/MDC. They seem to generate the
> most reasonable SDF that back annotate both to the .lib file and to my
> Verilog/VHDL models. The timing checks sections for these standard cell
> models (for both VITAL and Verilog) are generated directly from the .lib
> via a Perl script that I wrote.
>
> - Tom David
> Silogix
From: [ A Synopsys Library Compiler CAE ]
Hi John,
Synopsys first introduced a modeling solution for non-buffered outputs
back in 1988, using output-to-output timing arcs. Later, when SDF came
about, it didn't support output-to-output timing arcs. This led to lots
of solutions within tools for translating timing information between SDF
and .lib/Liberty for output pins that depend on the loading of
another output. Last year the 3-D delay table model was introduced to fix
these problems via library modeling.
One of the advantages of using a 3-D model, apart from getting a more
accurate delay model, is that it will result in the correct SDF. The SDF
written out by DC and PrimeTime (version 2.1) doesn't support timing arcs
from output to output, so 3-D delay modeling is the way to get correct
SDF. It was introduced in the 99.10 release of Library Compiler.
For cells where you have non-buffered outputs (e.g. Q and Qbar in a flop)
you will use 3-D modeling. With dual unbuffered outputs on a cell, the
delay of a timing arc depends not only on its own output capacitance load
but also on the load of the other output. In order to use the 3-D delay
table, one of the variables variable_1, variable_2 or variable_3 must be
the variable for related output pin loading. The legal combinations for
the values of the remaining two variables are exactly what they would be if
those variables were to form a 2-D delay table. The same variables used to
specify related output pin capacitance for constraint tables are used for
3-D delay tables as well.
- [ A Synopsys Library Compiler CAE ]
============================================================================
Trying to figure out a Synopsys bug? Want to hear how 11,086 other users
dealt with it? Then join the E-Mail Synopsys Users Group (ESNUG)!
!!! "It's not a BUG, jcooley@world.std.com
/o o\ / it's a FEATURE!" (508) 429-4357
( > )
\ - / - John Cooley, EDA & ASIC Design Consultant in Synopsys,
_] [_ Verilog, VHDL and numerous Design Methodologies.
Holliston Poor Farm, P.O. Box 6222, Holliston, MA 01746-6222
Legal Disclaimer: "As always, anything said here is only opinion."