( ESNUG 401 Item 3 ) --------------------------------------------- [10/17/02]
Subject: ( ESNUG 400 #3 ) Tech Details Of The LSI Logic Monterey Tape-out
> The 10 additional Monterey tape-outs came from DAC 02 #15 plus a detailed
> Monterey tape-out you'll find in next week's ESNUG post.
From: Jim Jensen <jensenja@lsil.com>
Hi John,
We first started working with Monterey at the beginning of last year. We
engaged with Monterey to develop a secondary / backup layout system at LSI.
Our primary layout system (FlexStream) runs off the Avanti Milkyway
database, and we use a combination of Avanti, LSI internal, and other third
party tools in the FlexStream system.
We've taped out one real chip using Monterey's tools. The chip is a 3 million
gate Ethernet channel switch. It contains 700K standard cells, ~80 RAMs,
and 2 clock trees - the larger one with 66K flops. Our approach with this
design was a flat methodology using Monterey's Dolphin system.
The initial floorplanning and some timing critical cell placement were done
using our FlexStream tools. The output from these tools was then taken into
Dolphin to do place and route, with the results being brought back into the
FlexStream flow for final backend checks and clean-up. Dolphin required a
lot of hardware for this design - the systems that we ran it on each had
8-12 CPUs and 19-24 GB RAM. We worked closely with Monterey over the next
few months, and in October, we were getting pretty good results out of
Dolphin. It was able to place the entire design flat and route it with only
36 routing violations and 2 antenna errors being detected by Dolphin. This
is very good considering that there are over 700,000 nets.
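To put those counts in perspective, here is a rough back-of-the-envelope
calculation in Python. It is only a sketch: it assumes exactly 700,000 nets
(the post says "over 700,000"), lumps routing violations and antenna errors
together, and the function name is made up for illustration.

```python
def violations_per_million(violations: int, nets: int) -> float:
    """Normalize a violation count to violations per one million nets."""
    if nets <= 0:
        raise ValueError("nets must be positive")
    return violations * 1_000_000 / nets


# 36 routing violations + 2 antenna errors, as reported by Dolphin.
total_violations = 36 + 2

# Assumption: exactly 700,000 nets; the real count was somewhat higher,
# so the true rate would be slightly lower than this.
rate = violations_per_million(total_violations, 700_000)
print(f"{rate:.1f} violations per million nets")
```

Under that assumption the result is roughly one violation for every 18,000
nets.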
We had to run a bunch of ECOs to fix the clock tree because the clock
constraints did not completely define the balancing requirements of the
trees. Initially we had some very unbalanced trees that we had to fix, but
eventually we got the skew down to 0.7 nsec.
Dolphin works well when all constraints and initial conditions are set up
correctly and do not change. This being a real design, though, that wasn't
the case and it did cause some issues with the Dolphin flow. The netlist
had some changes, and there were some user errors in setting up the initial
floorplan. This resulted in our re-running the entire design all the way
through detailed route about 8 to 10 times.
We ran our sign-off DRC tools on the Dolphin results, and found that the
violations detected by Dolphin correlated very well with our DRC tools. For
the most part, DRC violations that were not caught by Dolphin were due to
setup errors or boundary conditions that it would not be able to detect in
any case.
At this point, we ran up against the wall in terms of schedule. Although we
had achieved very good results with Dolphin and our team was confident that
we could complete the design with Dolphin, the overall consensus favored
finishing the design with our existing flow because we are more familiar
with its ECO capability than Dolphin's. We decided to use the results that
we had achieved so far with Dolphin, and finish the design with a series of
ECOs using our established ECO flow. We felt that this would give us the
best chance of finishing the chip quickly.
Monterey Issues
--------------
Here are some of the issues that we ran into along the way.
At the time this design was started, Monterey did not have a front-end tool
for Dolphin that would do initial floorplanning. We had to use our LSI
in-house tools, and come up with a way to transfer the data correctly into
Dolphin. For the most part this worked, but it was a detailed exercise
which led to many errors. Now Monterey has IC Wizard for floorplanning
from their acquisition of Aristo.
The flow with Dolphin is very restrictive with respect to making changes
during layout. It was very difficult to implement a change midway through
the process. Typically any change meant a restart.
User errors are going to happen. Errors occur even in an environment you
are familiar with, but with the Dolphin layout we were using tools that
were new to everyone. Besides this, the floorplanning required some
advanced in-house techniques that required special handling. On several
occasions we had to restart the Dolphin layout due to errors in the initial
floorplan. Had the Dolphin tool been more flexible, some of these may not
have required a restart. An example is a section of logic that had pre-
routes. These were showing up in Dolphin shifted by a few microns from
where they should be, which caused severe routing problems. We tried to
fix these in Dolphin after the physical optimization, but were unable
to do it successfully.
We should have been analyzing DRC errors earlier. We didn't realize until
late in the program that Dolphin was not detecting certain DRC errors.
This turned out to be caused by our not setting up the technology rules
correctly. It was not detected until we hit a point where we could
not afford any design restarts and had to fix existing problems with an
ECO approach. The end result was that we were forced into using in-house
tools to complete the routing, even though Dolphin was capable of doing it.
Over-constrained synthesis constraints caused problems with Dolphin, which
needs to see the timing constraints as they would be supplied to an STA
tool. The size of the initial timing constraint file also caused problems
for the Dolphin tools. However, Dolphin has a really useful feature called
Constraint Compression Technology that reduced the initial constraint file
by about a factor of 10.
As mentioned before, Dolphin takes a lot of hardware. We asked them about
a multi-CPU, shared memory Linux port, but so far they haven't shown us the
roadmap. However, it's not clear that Linux would help much since we are
using 64-bit Dolphin code on a multi-CPU Sun server and Linux (on Intel
CPUs) is still a 32-bit platform. The multi-threading in Dolphin really
does a good job of utilizing all available hardware resources.
We spent a lot of time on the clock trees. For the largest clock, Dolphin
ended up with a skew of between 1.2 nsec and 1.3 nsec after detailed route.
The estimate prior to detailed route was 0.5 nsec. Essentially, this is
because on sparse designs such as this one, fast paths tend to speed up.
Dolphin over-estimates interconnect delay before final placement, so once
the design was routed, the pre-route skew estimate proved too optimistic.
We were able to manually identify portions of the clock where we could use
resizing and removal of delay buffers to correct the clock skew. We got
the skew on the largest clock down to 0.7 nsec. This was not good enough,
though, and we had to use our in-house tools to massage the clock tree to
get the skew down further. Eventually the main tree was brought down to
~240 psec of skew.
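For readers less familiar with the term, skew here is the spread between
the earliest and latest clock arrival at the flops. The Python sketch below
shows that arithmetic and how one might pick out the outlier sinks worth
fixing by buffer resizing or removal. The sink names, the delay values, and
both function names are invented for illustration; this is not how Dolphin
or our in-house tools compute or report skew.

```python
def clock_skew(insertion_delays_ns):
    """Skew = latest sink arrival minus earliest sink arrival (in nsec)."""
    delays = list(insertion_delays_ns.values())
    if not delays:
        raise ValueError("no clock sinks given")
    return max(delays) - min(delays)


def outlier_sinks(insertion_delays_ns, count=1):
    """Return (earliest, latest) sink lists: candidates for skew ECOs."""
    ordered = sorted(insertion_delays_ns.items(), key=lambda kv: kv[1])
    return ordered[:count], ordered[-count:]


# Hypothetical insertion delays per flop clock pin, in nanoseconds.
delays = {"ff_a": 2.10, "ff_b": 2.45, "ff_c": 2.80, "ff_d": 2.30}

skew = clock_skew(delays)
earliest, latest = outlier_sinks(delays)
print(f"skew = {skew:.2f} nsec")
print("earliest:", earliest, "latest:", latest)
```

With these made-up numbers the skew is 0.70 nsec; adding delay ahead of the
earliest sink, or removing a delay buffer ahead of the latest one, narrows
the spread.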
At that time Dolphin didn't support "set_case_analysis" constructs. We were
able to get Monterey to implement this feature in time for us to use it.
The timing correlation between Dolphin and our sign-off delay calculator was
close, but there were still issues that had to be addressed through ECOs
even though Dolphin thought that timing was met. Some of these could be
traced to library issues or to setup errors (such as bad clock tree
definitions). Regardless, there were still setup/hold/ramptime issues that
needed to be fixed through an ECO process. These were not totally
unmanageable, but were more numerous than would have been the case with our
FlexStream tools which have gone through extensive correlation efforts.
Monterey is moving toward using OLA libraries, and this would resolve any
correlation issues. We were unable to try this feature on this design, but
what we have seen looks good.
What Needs Work
---------------
Dolphin needs a huge machine with as many processors (up to 12) and as much
memory as you can load on it.
Run times are long - about five days from reading the netlist to completing
final route.
At first, Dolphin crashed a lot. This improved as time went by.
Timing correlation before and after routing was not as good as what we were
hoping for, particularly on the clock signals.
Dolphin should have automatically done some of the things that we had to do
manually to fix the clock trees.
The reporting is not as helpful as it could be. For example, there is no
easy way to get a report of wire lengths for all nets, and we had to write
scripts to parse some of the clock reports in order to get any useful
information out of them.
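As an illustration of the kind of glue script this takes, here is a minimal
Python sketch that pulls per-net wire lengths out of a text report and
lists the longest nets. The report format ("net <name> length <microns>")
is entirely hypothetical, as are the net names and function names; it is
not Dolphin's actual report syntax, so the regular expression would need to
be adapted to whatever the tool really writes.

```python
import re

# Assumed (hypothetical) line format: "net <name> length <microns>"
LINE_RE = re.compile(r"^\s*net\s+(\S+)\s+length\s+([0-9]*\.?[0-9]+)\s*$")


def parse_wire_lengths(report_text):
    """Return {net_name: length_in_microns}, skipping unrecognized lines."""
    lengths = {}
    for line in report_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            lengths[match.group(1)] = float(match.group(2))
    return lengths


def longest_nets(lengths, count=5):
    """Return the `count` longest nets as (name, length) pairs."""
    return sorted(lengths.items(), key=lambda kv: kv[1], reverse=True)[:count]


# Made-up sample report used only to exercise the parser.
SAMPLE_REPORT = """\
# hypothetical wire length report
net clk_main length 15230.5
net rst_n length 820
net data[3] length 96.25
"""

wire_lengths = parse_wire_lengths(SAMPLE_REPORT)
for name, microns in longest_nets(wire_lengths, count=2):
    print(f"{name}: {microns:.1f} um")
```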
What Works Now
--------------
The command script for Dolphin was very compact - about 220 lines to
initialize the design (effort knob settings, variable definitions, timing
library setup, read input files, and define scan chain) and 100 lines to
run Dolphin (effort knob settings, write reports and output files, detailed
routing regions).
The routing results that we got out of Dolphin were very good: fewer than 50
routing violations out of almost a million nets.
Antenna fixing during routing worked very well in Dolphin.
The multi-threading works very well. If you have sufficient HW resources,
you can get very good run times.
Dolphin was able to abstract a .lib timing model for an ARM core.
Dolphin's ability to distill the constraint file down to only the essential
ones is very useful for multi-million gate chips.
The command scripts are very compact which makes the system easier to use
and maintain.
The Monterey R&D team was very responsive, fixing bugs overnight and, in
some cases, implementing new features as we needed them.
- Jim Jensen
LSI Logic Bloomington, MN