( ESNUG 338 Item 1 ) --------------------------------------------- [12/3/99]
From: Jon Stahl <jstahl@avici.com>
Subject: A Customer Tries The New Chip Architect Tool On 3 LSI Logic ASICs
Hi John,
I was very interested to read the FlexRoute and, more recently, the PhysOpt
reviews. A few months ago, I and others at my company didn't have much
interest in physical design tools from Synopsys. However, after a decision
to move to third party layout tools, we decided to explore all the options
out there -- not just the standard Cadence and Avanti offerings. And after
an eval of Chip Architect, we have been pleasantly surprised and have become
early adopters. I thought there might be interest in hearing a user review.
All of our designs use LSI Logic as the foundry.
We decided to look at Chip Architect because of certain key features. Like
a lot of other folks these days, we also believe that hierarchical layout is
the way to go. Chip Architect promised a natural way to do this, with the
ability to do planning not only at the gate level, but at the black box
and RTL level, too. The tight integration between placement and timing that
Chip Architect promised really got our attention.
The Chip Architect Design Flow
------------------------------
Before I critique Chip Architect itself, I felt it would be best to give a
thumbnail sketch of how its flow works. It's a challenging task to come up
with recipe book style steps for the Chip Architect flow, but here is my
attempt at a general flow. I think this is OK as a first level guideline.
1. Create Quick Timing Models (QTMs) for any soft black box blocks
(for which RTL is not available), if any, in PrimeTime. We designed
in Verilog and used VCS for simulation. We used an old copy of
VeriLint and VERA to verify the Verilog RTL.
2. Read hierarchical structural netlists into Chip Architect. These could
be a combination of black box blocks, hard macros, RTL blocks, and gate
level blocks. For memories, we used LSImemory to generate synlibs. We
read the synlibs into DC, did an update_lib, and wrote out mem.db's
for each memory macro we used. We then imported them into Chip Architect.
3. Read top-level constraints (say top_const.tcl), timing models of
IP/hard-macros, and QTMs in Chip Architect.
4. Floorplan Hierarchically in Chip Architect:
a) Size black boxes. Here is a TCL script to size a black box,
using an approximate gate count:
# Size the selected black box, based on gate count,
# assuming 30 sq. microns of area per gate.
proc Size_BB {size} {
    set side [expr {sqrt($size * 30)}]
    reshape_object -def "0 0 $side $side" [get_sel]
}
b) Manipulate physical hierarchy (flatten, merge, etc.). The hierarchy
browser window in Chip Architect is a great tool for this.
c) Perform Automatic block placement in Chip Architect
d) Do Power Bus planning, create blockages, do pin assignment. We did
all of our power design in LSI's layout editor because it was
a 4 metal layer design. Usually you only worry about power
conflicts if your power layer is the same layer as your cell
interconnect layer. I kept our power stuff in metal 3 & metal 4.
e) Coarse Routing in Chip Architect
f) Perform Std cell placements within blocks, and coarse route
5. Analyze for Timing and Congestion in Chip Architect. Use Chip
Architect's built-in PrimeTime engine for Timing Analysis. Use
congestion map utility within Chip Architect for congestion.
6. Tweak as necessary. Some alternatives (depending on the violations)
are:
a) Refine floorplan in Chip Architect (resize, move blocks, add
blockages and so on)
b) Re-run pin assignment, placement, coarse routing with higher effort
options.
c) Perform top-level route in FlexRoute.
d) Perform In-Place Optimization (gate sizing, buffer insertion) in
Chip Architect. (I couldn't get this to work!)
7. Output custom wire load models (say Chip.cwl), loads (say
Chip_setload.tcl and Chip_setresistance.tcl), and interconnect SDF
(say Chip.sdf), to be used to generate accurate synthesis budgets.
8. Perform Budgeting in PrimeTime -- Create synthesis constraints. Here
is a sample script to do budgeting in PrimeTime:
# Read netlist
read_verilog Chip_est.v
current_design Chip
# Read SDF
read_sdf ../parasitics/Chip.sdf
# Apply constraints
source Chip_setload.tcl
source Chip_setresistance.tcl
source top_const.tcl
# Allocate budget for each of the top level blocks.
allocate_budgets -level 0 -write_context -no_boundary -format dcsh
9. Run Design Compiler on the soft blocks using constraints generated from
budgeting (step 8), and the custom wire load model generated from Chip
Architect, i.e. Chip.cwl (step 7). (My understanding is that for
better results, you could use PhysOpt at this stage in place of DC.)
10. Read the DC generated netlist back into Chip Architect.
11. Perform final placement on all blocks (high effort), refine Floorplan,
do top level route. Run clock tree synthesis (currently under dev in
Chip Architect). We're looking at integrating Ultima's ClockWise tool
here because they have a useful skew solution. The Chip Architect
people are doing a zero skew tool; ClockWise can skew your design to
get additional set-up time.
12. Final Analysis and Optimization (similar to step 5). The only gotcha
here is that I had to write my own TCL repeater insertion tool because
I couldn't get Chip Architect's IPO features to work.
13. Perform final In-Place Optimization (IPO) & other fixes for violations
(like step 6). (Uh... This was the official way it was supposed to
work. It didn't. I'm just including this step to be complete.)
14. Output final floorplan, final cell placement, final netlist to a std
cell router -- in our case, LSI's FlexStream Global, Detail, and
Cleanup tools (version 1.0). (This isn't new LSI software, it's just
been renamed to FlexStream. Before this, it was LSI PD, and before
that it was CMDE. This is tried & true LSI software.)
15. Bring back the routed blocks into Chip Architect, make sure overall
chip timing is okay. We never did this step, because we used LSI's
delay predictor, "LSIdelay", to generate SDF's and then we used
PrimeTime to verify the final timing. Actually, we used Frequency
Tech's Columbus to extract the parasitics and fed that into LSIdelay
to make the SDF's.
Our major concerns with a new tool of this complexity, and from a vendor
who until recently didn't play in the P&R field, were typical:
- Was the code stable?
- Would it have the capacity to handle million gate plus designs?
- Would the placement quality of results measure up?
- Would the runtime measure up (multi-threaded?)?
Plus the additional concern of whether the timing Chip Architect predicted
would match up, within reasonable amount, with our vendor's (LSI Logic)
sign-off delay calculator.
Our Experience With 3 LSI Designs
---------------------------------
We did various amounts of testing/actual work w/ the tool on three designs,
"Larry", "Moe", and "Curly":
1.) "Larry" - 100K gates, 3 SRAMs, 100MHz (used to test our proposed flow)
Since Chip Architect does not perform detail routing, but stops at
placement and global routing, we had a problem. To use it on our
current 0.25um and larger geometry designs we would have to interface
to the LSI proprietary tools for clock insertion, routing, etc. (LSI
now uses Avanti for the 0.18um and below geometries). Chip Architect
was designed to interface to Cadence/Avanti, but had no hooks for LSI.
Using Chip Architect's TCL API, we were able to accomplish
the two-way handoff without much difficulty. We made a Perl script to
map LSI's pad placement into TCL commands to re-create it in Chip
Architect. And a TCL script was used to write out the cell
coordinates and orientations of all internal cells in LSI format.
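As a flavor of that handoff scripting, here is a minimal Tcl sketch of
the pad mapping idea. Note that the record layout (name, x, y,
orientation) and the set_object_location command name are my hypothetical
stand-ins, not the actual LSI report format or Chip Architect API:

```tcl
# Hypothetical sketch: map one record of an LSI pad-placement report
# (assumed layout: name, x, y, orientation) to a Chip Architect style
# placement command. "set_object_location" is a stand-in command name.
proc pad_to_tcl {record} {
    lassign $record name x y orient
    return "set_object_location -coord {$x $y} -orient $orient \[get_cells $name\]"
}

# e.g. pad_to_tcl {PAD_CLK 0 1200 N}
```

The real Perl script did the same thing line by line over the whole pad
report, then we just sourced the generated commands in Chip Architect.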
On this design, due to the small size, we decided not to use the
hierarchical features and just place it flat. This design did
not have difficult core timing, but had extremely tight I/O timing.
As a way to meet both setup and hold output constraints, this design
had large delay cells instantiated in the output paths, which we would
later ECO downsize as necessary.
Performing a timing driven placement was simple as we could pretty
much use the PrimeTime constraints already prepared for timing
analysis. On this design placement took 1.2hrs. (4 processors),
global routing 45 min., and timing calculation and analysis another
30 min. Our results were mostly good, as core timing was completely
met with +1.3ns of slack, but the I/O timing was off (as expected)
with -3.3ns of slack.
We then attempted to use the automatic IPO features of Chip Architect
to fix the timing problems -- with little luck. The promised Chip
Architect IPO features were cell upsizing and buffer insertion, but
the recommended flow for "best" results was to export Chip Architect
info to Synopsys Floorplan Manager (... more on this later).
Anyway, since the necessary corrections were obvious, it was easy
to use the (excellent) interactive Chip Architect TCL commands to
downsize the cells, legalize the placement, and re-time the design
(*incremental* in most cases, and very fast) ... and timing was met
in Chip Architect.
We then dumped the placement into LSI's tools, re-performed global
routing and estimated parasitic extraction, and generated SDF's using
LSI's delay calculator. After re-timing the design with PrimeTime,
the correlation was within ~5% on *most* paths. (The only real
exception to the outstanding correlation was output buffer timing.
For some reason, which at this point we haven't really researched or
explained, LSI's delay predictor shows ~900 ps slower paths than Chip
Architect. This anomaly was put on the back burner due to time
pressures and the simple work around of compensating with additional
output delay.)
2.) "Moe" - 1M gates, 100 SRAMs @ 100MHz
This was a design in progress. We were having trouble getting timing
closure. After well over a month of timing iterations -- which for us
consist of synthesis, test insertion, placement, scan reordering,
MOTIVE analysis, repeater insertion and IPO upsizing, and re-analysis
with MOTIVE -- we still had ~3K paths with as much as -2 ns of slack.
Since the design was being done flat with just careful floorplanning,
and it would have been way too much work to go back and re-implement
the design hierarchically, it was a "what the hell, let's try it" kind
of thing to throw Chip Architect, totally flat (no floorplan), at it.
It took a little bit of work to port the ram placement, power routes,
and placeblocks from the LSI database and into Chip Architect, but
after that things ran smoothly. Chip Architect completed full timing
driven placement of ~300K instances in 6 hrs (8 processors) ... with
results better than all of our careful floorplanning and timing driven
placement in LSI's tools, including post placement repeater insertion
and IPO's. End result: 250 failing paths w/ -1ns of slack or better.
The only bad thing to say about these surprising results is that Chip
Architect's IPO attempts to fix the remaining failures would
only repeatedly crash. And we haven't had the chance to port the
placement back into the LSI tools, re-calculate timing, and re-analyze
the results to make sure the timing really is this good (although we
did on "Curly" - see below).
3.) "Curly" - 750K gates and 50 SRAMs @ 100/155MHz
Here was a design just beginning in the planning and layout stages
where we could really use the capabilities of Chip Architect. It
consisted of ~20 large sub-systems, making it ideal for hierarchical
layout. Furthermore (without going into too much detail), at this
point we made an observation that got rid of one of the real pains
normally associated with hierarchical layout -- developing the lower
level timing constraints.
Our synthesis methodology consists of bottom-up compile, PrimeTime
budgeting, and re-compilation. With this in mind, we noticed that the
block level budgeted scripts that PrimeTime outputs -- of which we had
until this point used only the dcsh format, for re-compilation -- are
*almost* perfect for direct use (in the ptsh format) as block level
placement constraints in Chip Architect. Only a little filtering was
needed to remove some unnecessary commands.
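To give a flavor of that filtering, here is a minimal Tcl sketch: keep
only the constraint commands the placer cares about and drop the rest.
The keep-list below is illustrative, not the exact set we used -- the
real script was tuned to what Chip Architect actually accepted:

```tcl
# Filter a PrimeTime ptsh budget script down to the constraint
# commands on a keep-list; everything else is dropped. The keep-list
# here is illustrative only.
proc filter_ptsh {text keep} {
    set out {}
    foreach line [split $text "\n"] {
        # first whitespace-delimited word on the line is the command
        if {[regexp {^\s*(\S+)} $line -> cmd] && $cmd in $keep} {
            lappend out $line
        }
    }
    return [join $out "\n"]
}

set keep {create_clock set_input_delay set_output_delay set_load}
```

We ran each block's ptsh file through a filter like this, then sourced
the result directly as that block's placement constraints.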
So following this idea, we set up a top level floorplan, allocated
outlines for the lower blocks, and then kicked off runs that just
sourced the filtered ptsh scripts for constraints. We used GNU Make
and multiple Chip Architect licenses to run concurrently. And
runtimes for block level placement, global route, parasitic
extraction, and static timing were incredibly fast: 4 to 32 minutes
per block for blocks that ranged in size up to 90K gates.
The output of all this was placed/global-routed timing reports which
we compared side by side with the same Design Compiler reports.
And the results at the block level were very good: 14 blocks met timing,
and 6 had violations of -3 ns or better on the budgeted I/O constraints.
Then, using Chip Architect to global route the inter-block nets, we
generated a top level timing report ... and had up to -11 ns of slack.
Analysis of the failing paths immediately showed the problem: long
inter-block wires. So again we tried the Chip Architect IPO features,
first at the block level, and then at the top. And again we had no
luck. Block level runs produced weird and inconsistent results,
sometimes ending up worse than they started (?). Top level
attempts would only produce crashes.
So, rolling up our sleeves (we were committed to Chip Architect now),
we crafted a TCL script to add repeaters (using the API) along
the long wires ... and a week later had code that would produce a
design with only -0.75 ns of slack. Furthermore, since we had
overconstrained the synthesis and placement by 0.3 ns, and had 0.8 ns
of clock uncertainty for skew and PLL jitter figured in, we now had
a placement which we felt was good enough to go into route with.
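For the curious, the heart of a repeater inserter like ours is just
arithmetic: cut each long route into segments no longer than some
maximum, and drop a buffer at each cut point. Here is a simplified 1-D
Tcl sketch of that math (the real script walked the 2-D global route
and used Chip Architect API calls, omitted here, to instantiate and
legalize each buffer):

```tcl
# Given a wire length and a maximum unbuffered segment length,
# return the (1-D) positions along the wire where repeaters go,
# spaced evenly so that no segment exceeds max_seg.
proc repeater_points {wire_len max_seg} {
    # repeaters needed: one fewer than the number of segments
    set n [expr {int(ceil(double($wire_len) / $max_seg)) - 1}]
    set pts {}
    for {set i 1} {$i <= $n} {incr i} {
        lappend pts [expr {double($wire_len) * $i / ($n + 1)}]
    }
    return $pts
}
```

For example, a 3000 micron wire with a 1000 micron limit gets two
repeaters, at 1000 and 2000 microns; a wire already under the limit
gets none.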
In fact, some poking around showed that the failing paths probably
could have been fixed pretty easily with some interactive upsizing...
but this was a trial with RTL that wasn't frozen, so we moved on.
With some trepidation we took the placement into the LSI tools and
re-analyzed, and came up w/ -0.25 ns of slack (different path). This
we considered to be excellent correlation when we remembered that
a) two different global routers
b) two different parasitic extractors and
c) two different delay calculators were used.
If you have read this far, you have gathered that we ran into some problems
with Chip Architect. The worst thing is that its IPO features seem to
be pretty much useless in their current incarnation. The only good thing I
can say about them is that Synopsys seems to be aware of the situation and
has promised improvements in the next major release.
However, other than broken IPO, the few bugs we ran into have been minor and
had easy workarounds. I've been very impressed w/ the code stability, only
getting it to roll over & die when I really, really pushed it.
What has been a little disappointing, if not expected, has not been bugs but
miscellaneous annoying behavior. The worst example of this is an extremely
awkward logical vs. physical hierarchy separation. To place a design in
Chip Architect there cannot be any hierarchy, so if there is a logical
hierarchy it must be flattened. Flattening our big 300K-instance
design took ~10 hrs, which, if added to the placement duration, takes the
runtime from excellent to very poor. In addition, if you make
netlist changes to the design, there is no way to write out a logical
netlist, so you end up with a cumbersome flat netlist -- which I am still
testing to see if it breaks other tools. Also, if you happen to need to
apply attributes/constraints/etc. to internal design nets/pins/cells, you
must maintain two sets: one for the hierarchical design & one for the flat.
Finally, the most disappointing thing to me about the tool is that it
appears Synopsys might not *let* it be as good as it *could* be. They have
consciously integrated only about 70% of PrimeTime into Chip Architect,
leaving out budgeting and misc. other features. It appears as if they even
intentionally hamstrung some commands. In addition, one of their current
recommendations is to write files out of Chip Architect, use Floorplan
Manager for optimization, and import back into Chip Architect. The same
goes for budgeting and PrimeTime. Why!? I guess they need to protect their
revenue stream for PrimeTime and Floorplan Manager, but I for one (of course
I am not writing this check) would be happy to see Chip Architect take a PT
or FM license and not force me to write & read back 100's of MBs of files.
Anyway, I hope this info helps someone else who is out trying to make tool
decisions. Despite my gripes, I really like the tool and plan on using it
on all new projects. Although we haven't taped out anything with Chip
Architect yet, "Curly" is on the fast track to go and is definitely moving
faster than it would be without something like Chip Architect.
And I should mention, John, that whenever I have run into the inevitable
problems, the local (Boston) apps. and corporate engineering support for
Chip Architect has been excellent.
- Jon Stahl, Principal Engineer
Avici Systems N. Billerica, MA