( SNUG 03 Item 17 ) ---------------------------------------------- [05/14/03]
Subject: PhysOpt, DC, PhysOpt-MPC, Magma, Monterey
NOT SWAPPING, BUT ADDING ON INSTEAD: One of the weird things I've found in
this survey is that practically all of the PhysOpt users are still also DC
users. That is, users are not swapping out DC licenses for PhysOpt licenses
as Synopsys marketing said they would; instead it's quite common to use both
DC *and* PhysOpt differently on different blocks in the same chip. This is
because a PhysOpt RTL-to-placed-gates run chows *serious* runtime (as in 5
day runs sometimes) while a DC-to-gates-then-to-placed-gates-in-PhysOpt is
many times the quick & dirty technique that gets the easier blocks out the
door in hours instead. To put demographics on this, 55% of PhysOpt users
do a DC-to-gates-to-placed-gates, 23% do pure RTL-to-placed-gates type of
PhysOpt runs, and the remaining 22% do a mix of both styles. In terms of
licenses, this means a minimum of 77% of PhysOpt users are also required to
be DC users, too! I don't care what Synopsys marketing may tell you about
physical synthesis; it looks like DC ain't going to be going away any time
soon even if the entire EDA buying public goes 100% PhysOpt.
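The 77% floor is just the sum of the two groups whose flow starts from a DC netlist. A minimal Python sketch of that arithmetic, assuming the three flow styles are mutually exclusive and that only the DC-first and mixed groups need a DC license (the dictionary keys are illustrative names, not survey terms):

```python
# Survey split of PhysOpt users by flow style (percentages from the text above).
flow_share = {
    "dc_gates_then_physopt": 55,      # DC-to-gates-to-placed-gates
    "pure_rtl_to_placed_gates": 23,   # PhysOpt straight from RTL
    "mix_of_both": 22,
}

# Assumption: any flow that starts from a DC netlist needs a DC license.
needs_dc = ("dc_gates_then_physopt", "mix_of_both")

min_dc_share = sum(flow_share[k] for k in needs_dc)
total = sum(flow_share.values())

print(f"survey total: {total}%")
print(f"minimum share of PhysOpt users who also need DC: {min_dc_share}%")
```

It is a floor rather than an exact figure because some of the pure RTL-to-placed-gates users may hold DC licenses too.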
Dataquest FY 2001 ASIC Physical Synthesis Market (in $ Millions)

  Synopsys/Avanti  #####################  $41.2 (41%)
  Magma            ################       $31.1 (31%)
  Cadence          ############           $24.1 (24%)
  Monterey         ##                     $4.0  (4%)
"We use the PhysOpt gates-to-placed-gates options. The biggest reason
is speed. With wireloads and DC, we get a good beginning and we have
no problems with licenses. We have lots of DC licenses so I can run
jobs in parallel. Unfortunately, since PhysOpt is expensive, there
aren't as many licenses and this would require me to run jobs serially.
During the design phase there are just too many things to do to run
things in series."
- Mark Weaver of Texas Instruments
"The best
- I could verify that despite everybody's claims to close timing with
PhysOpt, nobody has ever closed timing 100% with PhysOpt.
- The Synopsys presenter for CTS in PhysOpt-Expert. Finally a
knowledgeable person who did answer all the tough questions.
The worst
- Uncertainty continues on what's going to happen to Synopsys'
backend flow. Not clear they even know themselves.
- A Synopsys consultant creating 22 placement regions in a mostly
square floorplan to guide PhysOpt out of congestion. Wow.
- The PhysOpt tutorials had poor experiments in them. They reminded
me of the famous "cockroach experiments": A scientist placed a
bunch of cockroaches on the table. Hit the table with his flat
hand, and observed all cockroaches running away. He removed all
the legs of every cockroach, placed them again on the table and hit
the table with his hand. No cockroach moved. Conclusion: When
cockroaches are deprived of their legs they become "deaf"!
- New PhysOpt 2003.03 doesn't have scan-chain un-stitch / re-stitch
commands. It doesn't seem to have improved in memory management
either.
We use gates-to-placed-gates. Even for small chips, PhysOpt is slow
doing RTL-to-placed-gates. We can synthesize a small chip in about 1.5
hours with 2 DC licenses. It takes much longer if we feed the design
to PhysOpt. We don't even consider feeding a million-gates design to
PhysOpt. I tried once to do a top-level optimization, but I decided to
abort it after 2 hours in the 1st phase.
Memory usage seems to be a bit of a problem with PhysOpt. Trying to do
many things usually results in very slow performance, or
"out-of-memory" problems.
Methodology is important. A good front-end synthesis methodology
(with DC), renders excellent results in gates-to-placed-gates. Once
designers understand the specific requirements of Physical Design,
they can generate netlists of excellent quality, that run very fast
through PhysOpt meeting post-placement timing. "Never trust a tool
with a man's job". This is especially important for meeting schedule.
If you trust PhysOpt to meet timing in RTL-to-placed-gates you may
find, very late in the design flow, that PhysOpt will not be able to
fix those architectural problems that nobody thought about because
they were expecting PhysOpt to do the job..."
- Santiago Fernandez-Gomez of Pixim, Inc.
"We run PhysOpt only gates to placed gates."
- [ An Anon Engineer ]
"We have been a PhysOpt user for about 3 years with several tape-outs
(0.35um, 0.25um & 0.18um). We've always used a gates-to-placed-gates
flow due to a combination of poor performance in PhysOpt when trying
RTL-to-placed-gates and Synopsys' inept DFT solutions. Recently we
have been focused on moving to Astro from SE as our mainstream router
so we haven't tried an RTL-to-placed-gates flow for the last 8 months
(it is something we will look at over the next 6 months), however, I
am truly skeptical about the quality of the results."
- [ An Anon Engineer ]
"I'm in the buffet camp, rather than the a la carte one. At Corrent,
we found that on some designs a zero-wire-load DC synthesis followed by
a gates-to-placed-gates (g2pg) PhysOpt run produced better results.
However, we found a few cases (notably the largest blocks) where
PhysOpt's RTL-to-placed-gates (r2pg) produced a faster turnaround time.
Also, contrary to Synopsys' claims, I've come across a couple of
cases where (for 400 K+ instance designs) DC was not able to run to
completion. There definitely seems to be some capacity issue. For
those of our designs that have macros, we use an r2pg run to arrive at
macro placement suggestions. Subsequently we clean up the macro
placements by hand and restart an r2pg run. Except, of course, if you
have a *large* number of macros. Then, you run the risk of the PhysOpt
macro placer timing out! I may have a paper on this at the Boston SNUG
this year. For physical feasibility (size/utilization/congestion), we
found the r2pg flow to work better. For timing feasibility, we found
nothing really beats a g2pg flow!
Bottom line, the best method is the old, tried-and-tested method of
experimenting until something works!
Surely you were not expecting a push-button answer. :)"
- Neel Das of Corrent Corp.
"A year ago we taped out a 25 M transistor design (0.25 um) with some
15 blocks at top level, all but one went through PhysOpt. About half
were synthesized with an RTL-to-placed-gates script and the other
half with a gates-to-placed-gates script.
My original plan was to run all blocks through RTL-to-placed-gates
but eventually it wasn't always the best choice.
One block was too large to be synthesized in RTL-to-placed-gates mode
in a reasonable amount of time (it took 5 days on a SunBlade 1000).
Note that the PhysOpt block size issue had more to do with the number
of designs in the block hierarchy once uniquified than with the raw
cell count (about 210K instances; a 160K instances block went through
RTL-to-placed-gates in 36 hours).
Other blocks that didn't make it successfully with the RTL-to-pg script
had high utilization (and hence high congestion) issues. The RTL-to-pg
placed netlist was just unroutable and we had to iterate over
RTL-to-pg (using custom WLM extracted from the PhysOpt run, so that
DC would have "better" estimates and could do a better job of area
recovery -- so as to lower the utilization ratio in the PhysOpt step)
and gates-to-placed-gates in high-congestion mode, which eventually
yielded a routable design.
It is worth noting though, that if the RTL-to-placed-gates flow had an
issue with congested designs, it always yielded better timing results
(not by a large margin, mind you: but it was systematically better),
which is why the blocks that went through RTL-to-placed-gates without
hitch eventually did make it to the tapeout."
- [ An Anon Engineer ]
"Session 3 - User presentations
13:30 - 14:15 - Fixing Hold violations in DSM Era
Describing several experiments from 0.18 um down to 90 nm on hold-time
fixing, and hold-time analysis. Interesting graphs. He leaves many
issues unexplained. It seems that most of their flow is script-based
(as opposed to tool-based).
Q. 11 iterations to close hold. What is an iteration?
A. Manual iterations (ECO)
Q. Did they use RC corners?
A. Yes
Q. Have you checked area vs using a 150 ps margin for hold-time?
A. No answer.
Q. Clock skew number: min or max corner? Clock skew for modules with
different number of flops?
Q. OCV. Breaks setup (multicycles now...).
A. It's not clear what they did here. It seems to me that they just
tweak everything manually
Q. What do you use to fix hold-time? Not PhysOpt...
A. Manually
14:45 - 15:00 - RTL to Layout Implementation of OMAP Platform
Full Physical Design methodology for an embedded core in a big ASIC.
All steps described. Basically DC, PhysOpt, Astro, then backannotation
to PhysOpt. Very conservative synthesis, in two passes. 1st pass
synthesis. 2nd pass flatten netlist, tighter constraints.
Q. PhysOpt didn't close timing. How did you close timing? Did you
ever close timing with PhysOpt in any chip?
A. Manual Timing ECOs. They claim it's not manual because they have
scripts (ha!) The presenter has taped-out 9 large chips as a
Synopsys consultant using PhysOpt. He never fully closed timing
with PhysOpt. Timing ECOs were always required.
Q. They had to create 22 regions to guide placement. That looks like
a lot of work. How many iterations until they got a clean
placement? How many days?
A. They claim this is part of the "pipe cleaning" part of the flow.
So it takes time, but they don't count it as real time.
Q. IO Buffer manually? What's the tool for then?
A. They claim this is not manual, but done by DC.
Q. Develop methodology for clock tree? The tool doesn't work?
A. No answer.
15:00 - 15:45 - Logic and Physical Synthesis Methodology for a
VLIW/SIMD DSP Core
History of the development of their DSP core, from specs to physical
design scripts. They provide scripts to customers. The spec DSP
processor has to be able to be implemented in silicon, so they had to
involve the PD group from the beginning, to make sure that anything
the compiler spits out is going to be feasible in silicon.
Q. 2-3 days turnaround time. PhysOpt 12-16 hours. How big is the
design?
A. 200 K instances (definitely they have some problem with their
scripts... our chip is 250K and turnaround time is 16 hours)
Q. Did you run routing? Any feedback from silicon?
A. Yes they did. And CTS too.
Q. How do you solve formal verification between spec and RTL?
A. No solution."
- Santiago Fernandez-Gomez of Pixim, Inc.
"PhysOpt is good, but Magma tools are definitely catching up quickly.
They provide a lot of capabilities that are very desirable and PhysOpt
doesn't have them yet. Being able to work with one tool and one
Volcano database to resolve everything is extremely desirable to have."
- [ An Anon Engineer ]
"I've used RTL-to-placed-gates, but for no particular reason. I'm quite
new to the physical synthesis world, since I've only recently moved
back on to design work after spending a few years in chip-level TA.
I've only been using PhysOpt (mainly in MPC mode) to provide more
accurate timing results, since the real layout of my block is being
done by another team in Astro.
I've found PhysOpt to be a bit unstable (lots of the infamous Synopsys
fatal errors), and I gave up trying to work out the cause of a problem
which stopped the initial placement from working properly; I just went
back to good old DC."
- David Smith of STMicroelectronics
"We do PhysOpt in gates-to-placed-gates mode only. Our Synopsys AE
told us not to attempt RTL-to-placed-gates. Too slow. He hasn't
lied to us so far."
- [ An Anon Engineer ]
"We use PhysOpt RTL-to-placed gates because of an analysis I did a few
years ago on some of our blocks. It showed that the PhysOpt runtimes
were worse than going through DC then PhysOpt gate-to-gate, but I could
get better results in some cases. I have not gone back and redone the
experiments recently.
From what I remember the results were very RLM dependent so others may
see different results. I do remember that results at the time were at
least as good on all the RLMs."
- Chris Gorzek of Cray
"I've done extensive studies into this. In almost ALL cases going
RTL (in DC) -> gates -> placed-gates (PhysOpt)
gives better results. Far better in many cases. This is counter-
intuitive. Synopsys originally told me to use the RTL -> placed gates
flow for PhysOpt. Soon after, as I kept getting poor results, they
changed their recommendations. They have observed from MANY customers
that RTL (DC) -> gates -> placed gates (PhysOpt) almost always gives the
best results. I've heard this from many Synopsys FAEs.
I have found some cases where the latest 2003.03 with the Presto reader
outperforms the RTL (DC) -> gates -> placed-gates (PhysOpt) flow. When
this works, it does just a percent or two better than the
RTL (DC) -> gates -> placed gates (PhysOpt) flow."
- [ An Anon Engineer ]
"My last chip we used gates-to-placed-gates and it worked relatively
painlessly. Much easier timing closure compared to just DC days.
However, we didn't stress the density at all. The next release coming
up we are going to try RTL-to-placed-gates. This makes sense because
we are going to pack more gates in the exact same bounding box. I use
a team in India to do most of the backend. I think the reason we
haven't used RTL-to-placed-gates before is because they seem to be very
conservative on adoption of new tools. If it closes timing I guess I
don't care. In a couple months I can tell you how the story ends for
this release. However, I assume eventually we will be going PhysOpt
from RTL-to-placed-gates always."
- Layne Lisenbee of Texas Instruments
"I've had really good success w/ SoC Encounter. Synopsys tried PhysOpt
on our exact same design, and the results took way longer, and were
worse. We subsequently taped out 3 more designs with SoC Encounter."
- [ An Anon Engineer ]
"I have been using PhysOpt for about two years now. It seems to do a
very good job for everything I have thrown at it. I have always
preferred the RTL-to-placed-gates. My reason is that I believe if you
give the synthesis a solution, gate level structure, it will use that
and solve the timing but it will not drastically change the structure.
If it starts from RTL, I believe it has the opportunity to come up with
a better solution. I see no reason to do a dc_shell compile and then
move to PhysOpt, it is a waste of time. I do like the MPC, but it
still has some problems when it comes to insert_dft, if new ports
are created.
The one thing I have always found strange with PhysOpt is that the
check_legality command does not view unplaced gates as illegally
placed. For some lame reason they think that because the cell is
not placed that it can't be illegally placed. I can almost see their
reason but I still believe that it should report the cells that are
not placed so that you could issue the commands to fix the problem
without having to write Tcl scripts to solve what they already know.
I believe that the problem is Synopsys has too many software people and
not enough designers to know what is important and what is not for
automation of a design flow."
- Paul Fletcher of Motorola
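The workaround described above (scripting your own list of unplaced cells) boils down to a simple filter. A minimal Python sketch, assuming the placement has already been exported to a mapping of cell name to location, with None for cells that have no location; that export and its format are assumptions for illustration, not a PhysOpt feature:

```python
def unplaced_cells(placement):
    """Return the sorted names of cells that have no location."""
    return sorted(name for name, loc in placement.items() if loc is None)


# Hypothetical placement dump: cell name -> (x, y) in microns, or None.
placement = {
    "u_core/reg_12": (104.0, 88.5),
    "u_core/buf_3": None,
    "u_io/and_7": (12.0, 40.0),
    "u_io/inv_1": None,
}

print(unplaced_cells(placement))
```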
"We use both. The Synopsys AE's often tell us that RTL2PG gives better
results than PhysOpt because it starts with the RTL, and that the
resource allocation step is able to use better estimates of wire delays
this way. We have observed some exceptions where PhysOpt gives better
results. Although we don't know the cause, our hunch is that the wire
load models were pessimistic for the netlist that we ran PhysOpt on.
For blocks with large memories, the wireloads tend to be pessimistic
(due to the area of the memory blocks) when DC-only synthesis is run.
This probably makes DC choose a faster implementation to meet timing
and when the same block is run through PhysOpt, we get better results
than RTL2PG.
PhysOpt RTL2PG is really limited in capacity, and we can't run it at
the full-chip level. PhysOpt, on the other hand, is able to handle
full chip placement and is the backbone of our company's backend flow.
One more thing, we have almost stopped using DC-only for synthesis.
Most of our blocks use compile_physical -mpc (RTL2PG). We have found
that this method gives us better estimates for timing when a netlist is
taken into the backend flow. It is good to see many timing violations
upfront after the synthesis process rather than having the backend
engineers discover these much later in the flow. We also use some
limited floorplanning commands available in PhysOpt to set the macro
orientations, locations, etc. before running MPC."
- [ An Anon Engineer ]
"We use gates-to-placed-gates most of the time. It gives consistently
good results in general. RTL2Gates shows good results on some designs
but not all the time. Another factor is that our project team is
structured so that the front end does synthesis and hands the netlist
to the backend for placement, so not much effort goes into the
RTL2gates front."
- [ An Anon Engineer ]
"I've released 5 designs with PhysOpt, all gates-to-placed-gates. The
first 4 were because the customer provided the gate-level netlist.
The last one, though we were given the RTL, was gates-to-placed-gates
because we needed to utilize ACS for schedule and performance reasons.
If we did a traditional top-down synthesis it would have taken 24-36
hours (depending on the Linux platform). ACS reduced the job down to
8 hours on a dual-CPU machine, and achieved timing closure on a block
that had issues when compiled with a traditional top-down synthesis
methodology.
I'm anxious to try out RTL-to-placed-gates when ACS works with PhysOpt."
- Mark Johnson of Agilent Technologies
"Our use of PhysOpt is mostly centered on the gates-to-placed gates."
- [ An Anon Engineer ]
"PhysOpt is clearly a gates-to-gates engine. PhysOpt provides its best
results with a gate netlist from DC or DC-Ultra. This can be confirmed
with a quick run on any DSP module (ARM, MIPS, ZSP)."
- Benjamin Mbouombouo of LSI Logic
"I use RTL-to-placed-gates PhysOpt whenever I can. I have used
gates-to-placed-gates when absolutely necessary. But, the results
I have seen between the two give better congestion, and area with
the RTL-to-placed-gates method. The drawback I have seen is that
RTL-to-placed-gates has a definite capacity limit, and a minimally
worse timing result.
I had a design of 90 K instances that would not meet timing or
synthesize in a "reasonable" amount of time with RTL-to-placed-gates
in 2001.08-SP2. I did have other designs up to 75 K instances that
did just fine with the RTL-to-placed-gates. When I did use the
gates-to-placed-gates, Synopsys did not appear to remove all the
unnecessary logic it created when synthesized using DC with
the wireload models and over-constrained. I ended up having to do a
congestion-based placement with create_placement then performing
PhysOpt incrementals for timing in order to keep the congestion
routable using the gates-to-placed-gates and the DC netlist! The
design was about 75% utilized with a lot of macros. I have not
done the comparison with any of the 2002 versions. I just stuck with
RTL-to-placed-gates for our most recent chip."
- [ An Anon Engineer ]
"For PhysOpt, only gates-to-placed-gates is used in my company. IMHO,
RTL-to-placed-gates is only a concept for now. We have tried some
designs and found the total quality of layout is not as good as we
expected. Not to mention there are still scan insertion, ECO issues,
and runtime issues."
- [ An Anon Engineer ]
"2:30 - 3:15 - Accuracy of Timing Predictability using Minimal Physical
Constraints (PhysOpt-MPC)
Running PhysOpt with very few constraints to get feeling of timing
after routing. They don't even give a floorplan to PhysOpt. They
compare a reference design (PhysOpt + floorplan + powerplan) with the
PhysOpt-MPC run. Timing comes similar. Floorplan very different.
Runtime is shorter for MPC, but not much shorter (10%).
Q. Did you route the design? Is it routable?
A. No, it's not routable.
3:15 - 4:00 - Re-Shape's Astro-Based High Performance SoC Design
Environment
Reshape's marketing presentation. :)
They despise estimation technologies. They are pushing their flow.
It's amazing how good their flow can be in a set of slides. Of course,
he didn't say anything about the time it takes to do simple tasks,
like ECOs, moving a PAD 1 micron, or changing the floorplan...
Wednesday March 19
Session 1 - 9:00 - 12:15 - PhysOpt Highlights in 2003.03
List of bug fixes disguised as "new features"... Not much to say here.
The presenter went through a long list of changes, and an even longer
list of new variables to tweak/control the behavior of PhysOpt. See
the proceedings.
Q. Scan-chain removal and re-stitching
A. Not yet
Q. Memory management in PhysOpt. Max design size?
A. No answer. They say they have improved, but it's not clear
Q. Buffer tree creation. Max size? runtime?
A. No answer.
Q. 20% speed-up. How much is because of better WNS?
A. No answer.
Q. remove_high_fanout_nets none/medium/high is not consistent
notation. It's confusing.
A. No answer.
Very disappointing first part of the session. There were plenty of
interesting questions, but Synopsys was not able to answer most of
them.
Second part of the session, CTS in PhysOpt Expert. Much better. Very
knowledgeable presenter. Good overview, and good answers to questions.
Basically CTS does everything CT-Gen does, but within PhysOpt. It also
has a few more features. Unfortunately PhysOpt fails once more in
efficiency and runtime. With the numbers Synopsys provided, I estimate
that CT-Gen is at least 4 times faster. One person in the audience
actually points out that they don't use PhysOpt because Astro's CTS
engine is much faster.
Q&A. Way too many questions in this session, from me and from the
audience."
- Santiago Fernandez-Gomez of Pixim, Inc.
"We did two designs with PhysOpt last year, both RTL-to-PG. On first
go-around, tried G2PG flow for comparison purposes. Also to work
around the very amateurish tool bugs, like non-functioning VHDL read-in
in early 64-bit versions of PhysOpt, intermediate file inconsistency
w/ the supposedly new and improved Presto Verilog preprocessor, etc.
Noticed no real difference in performance or otherwise, then again,
I suspect our designs weren't exceeding the 350 K instance limit
anyway. Biggest headaches were with interoperability and PDEF file
exchange between Cadence and PhysOpt. Not much of a surprise there."
- [ An Anon Engineer ]
"Our small design group has used PhysOpt RTL-to-placed-gates for some
time now. We found that it could meet timing and solve route
congestion on some designs that were just impossible when starting with
a dc_shell-generated netlist, even with a variety of wireload model
approaches. I think it's because non-placement aware synthesis makes
too many bad buffering decisions for the Astro optimization routines
to overcome. We found that the post-layout timing was very close to
what PhysOpt predicted, and that timing closure could then be completed
inside Astro without too much trouble. Since the PhysOpt timing is
reliable, we can try different floorplan approaches and judge them by
their PhysOpt congestion map and timing reports. This is considerably
quicker than place and route iterations."
- [ An Anon Engineer ]
"The gates to placed-gates flow worked out very well in our past design.
We did not use PhysOpt at the top-level, but PhysOpt seems to do a very
good job with regard to congestion and transition fixes at the module
level. We had no need to fix transition after back-annotation in
PrimeTime. More specifically there was a bug in insert_dft -physical
which tied the output pin of a register to si pins of two registers
in parallel."
- Dhivakaran Santhanam of Analog Devices
"We actually use both commands.
1) compile_physical directly from RTL
and
2) PhysOpt for gates-to-gates after compile.
It depends on the block features. Timing/placement/run time differ on
the blocks depending on the flow. We started out with PhysOpt for all
blocks on the chip, but compile_physical got better results on some of
the blocks eventually. Also flattening or not flattening and at what
stage to flatten make a big difference on the results. I would say 50%
of the blocks use PhysOpt, 50% use compile_physical.
The most important determining factor on what flow to use is the timing
results. It is good to try both flows and pick the one that gives the
best timing (obviously, if it does not break your goals severely.)"
- Gun Unsal of Intel
"Here's my top 3 wishes for PhysOpt / PhysOpt+Astro:
1) better cap and RC reporting. This would help both users to find
outliers, to compare estimators vs. later reality, and to verify
libraries. I want a function (TCL or builtin) that puts out
one long line per net (for sort / grep), with the fields:
fanout
total cap, routing cap, couple cap
net size (bounding box will do)
nr vias (main RC component in lv130c with 20-ohm vias)
max RC delay, max transition, max to-pin
driver celltype
driver name
net
sorted, and filtered with field x > mylimit. (In Astro we have
Scheme functions muBigNetInfo / muIONetInfo for {fanout size
driver net}, and that's useful, but we don't have caps nor RC
in Scheme.) Such cap-RC reports can then be plotted and diffed
through PhysOpt -- Astro phase xx -- SPEF.
For extra credit, do this for 'nets-with-buffer-trees', too.
2) separate estimators for Xcap and Gndcap, with multipliers;
Xcap is noisy and hard to predict, so I want to be able to say
Xcapestimator *= 1.5 both to reduce Xcap early and to compare
predicted vs later phases. (For that matter all the various cap
and res multipliers need to be cleaned up and verified; changing
the techfile is not clean.)
3) (longterm): a glass top into the optimization engine:
engine <--> data stream <--> text viewer
<--> graphic viewer.
To improve any complex multi-objective box, you have to *see* into it
to understand and tune it. Consider the data streaming between a
Ferrari and the pit: lots of data, with lots of people looking at it in
different ways. Imagine Ferrari engineers scrolling through megaline
text files."
- [ An Anon Engineer ]
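Wish 1 above is essentially a filtered, sorted, one-line-per-net report. A minimal Python sketch of that shape, assuming the per-net data has already been pulled out of the tool into records; the `NetInfo` fields, their units, and the sample values are hypothetical stand-ins, not anything PhysOpt or Astro actually emits:

```python
from dataclasses import dataclass


@dataclass
class NetInfo:
    # Hypothetical per-net record; field order follows the wish list.
    fanout: int
    total_cap: float
    routing_cap: float
    couple_cap: float
    bbox_size: float
    nr_vias: int
    max_rc_delay: float
    max_transition: float
    max_to_pin: str
    driver_celltype: str
    driver_name: str
    net: str


def net_report(nets, field, limit):
    """One line per net whose `field` exceeds `limit`, largest first."""
    picked = [n for n in nets if getattr(n, field) > limit]
    picked.sort(key=lambda n: getattr(n, field), reverse=True)
    return [
        f"{n.fanout} {n.total_cap:.3f} {n.routing_cap:.3f} "
        f"{n.couple_cap:.3f} {n.bbox_size:.1f} {n.nr_vias} "
        f"{n.max_rc_delay:.3f} {n.max_transition:.3f} {n.max_to_pin} "
        f"{n.driver_celltype} {n.driver_name} {n.net}"
        for n in picked
    ]


# Made-up sample data, only to exercise the report.
nets = [
    NetInfo(12, 0.250, 0.180, 0.040, 310.0, 22, 0.120, 0.350,
            "u9/A", "BUFX4", "u1", "n1"),
    NetInfo(2, 0.020, 0.010, 0.002, 15.0, 4, 0.005, 0.060,
            "u5/B", "INVX1", "u2", "n2"),
    NetInfo(40, 0.610, 0.500, 0.090, 900.0, 75, 0.410, 0.800,
            "u7/D", "NAND2X2", "u3", "n3"),
]

for line in net_report(nets, "total_cap", 0.1):
    print(line)
```

Because each net is one whitespace-separated line, the output can go straight through sort, grep, or diff to compare one flow stage against the next.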
"I both use compile_physical (from RTL to placed gates) and PhysOpt
(from gates to placed gates).
They sometimes (depending on design architecture and constraints)
achieve quite different QoRs in terms of run time, area, speed etc.
Thus I normally run both flows and pick the best.
However, the gates-to-placed-gates flow is generally the preferred way
when MPC prototyping, since one can refine the physical constraints
(but possibly also timing, area and design rule constraints) and
start a PhysOpt session without resynthesizing the whole design
starting from the RTL.
Indeed when the design contains a lot of datapath operators or very
big random logic, the run time saved can be as much as several hours
for each iteration. Of course the PhysOpt gate-to-placed gates flow
requires a gate-netlist, that can be obtained through an RTL-to-gates
logic synthesis (i.e. compile). This latter logic synthesis may
employ different kinds of wire load models (standard, custom, none at
all) depending on design, process, constraints, QoR expected and run
time. But this is another story."
- Marcello Vena of Xignal Technologies
"We use PhysOpt RTL-to-gates. It closely correlates pre-routed timing
with post-routed timing. However, during crunch times we found that
there are exceptions which give us headaches as we have to manually
upsize certain gates to improve timing."
- David Fong of S3 Graphics
"In parallel we also partially evaluated Monterey's tools. Monterey
took our design and were close to closing it in approx 4 weeks with
30 emails/calls taking place. Monterey also shrank our die size by
approx 10%. We were very impressed by this. Unfortunately we did not
have time to drive the tools ourselves before starting our next
project, but we learned enough to see some conceptual benefits in the
tool suite.
To be fair most of the Monterey area saving was due to changing our
conservative power grid (we did not have a power tool at the time; we
were able to do the same with Apollo when we evaluated AstroRail).
Monterey tools have a Tcl based command line interpreter, allowing much
quicker scripting than Scheme. In addition, their STA commands are the
same as PrimeTime and the structural querying commands are the same as
Design Compiler. You get a familiar Tcl interface for querying the
Monterey database including the timing results. You can write scripts
to query their database and act upon the results. With the Apollo
and Astro flow we had to write scripts parsing the timing reports and
netlists and generating ECOs which obviously adds significantly to the
turnaround time.
The Monterey Tcl interface also allows you to write nice DRC (zero
fanin/fanout etc) scripts to check your netlist within their single
framework. The tools run in memory like Astro and you get all the same
run time benefits from this. The extraction tool within Monterey's
tools suite does a full extraction in approx 3 minutes (compared to 60
minutes with Star-RCXT.)
The only problems we found with Monterey were the speed of the database
load and the speed and memory efficiency of the GUI. Both of these
issues are being addressed in the next release in June. One of the
side effects of working with the Monterey rep was that he also solved
some of our outstanding Apollo issues, too. Cruel irony."
- Craig Farnsworth of Cogency Semiconductors
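The zero fanin/fanout netlist check mentioned above is easy to picture as a script. A minimal sketch in Python rather than Monterey's Tcl, assuming the netlist has been reduced to a mapping of net name to its driver pins and load pins; that data layout and the sample nets are hypothetical:

```python
def dangling_nets(nets):
    """Split out nets with no driver (zero fanin) or no load (zero fanout).

    `nets` maps a net name to a (drivers, loads) pair of pin-name lists.
    This layout is a hypothetical stand-in for a real netlist database.
    """
    no_driver = sorted(name for name, (drivers, _) in nets.items() if not drivers)
    no_load = sorted(name for name, (_, loads) in nets.items() if not loads)
    return no_driver, no_load


nets = {
    "clk": (["pll/out"], ["ff1/CK", "ff2/CK"]),
    "n42": ([], ["ff1/D"]),   # zero fanin: nothing drives it
    "dbg": (["ff2/Q"], []),   # zero fanout: drives nothing
}

no_driver, no_load = dangling_nets(nets)
print("zero fanin:", no_driver)
print("zero fanout:", no_load)
```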