( ESNUG 383 Item 1 ) -------------------------------------------- [11/28/01]
Subject: ( ESNUG 376 #3 ) Four 80 MHz 0.25/0.35 Cadence PKS/SE-PKS Tapeouts
> My summary is PKS works. It has good correlation with silicon and can
> swallow large designs. Interfacing with Silicon Ensemble is a no-brainer.
> It's bleeding-edge but showing some signs of maturity: we taped out with
> version SPR4.07 but couldn't have done it using SPR4.06. Proof: the
> silicon is in manufacturing ramp up.
>
> - Geoff Smith
> Cisco Systems Toowong, Australia
From: "Ching Hsiang Yang" <yjs@sunplus.com.tw>
Hi, John,
It has been fun reading ESNUG through these years. ESNUG has been very
helpful in getting the REAL story out of each tool. Here's our Cadence
Ambit-RTL/PKS/SE 4-chip tapeout story. I am hoping we can push vendors to
provide us with better solutions. Everything was done with the Cadence tool set
except for some sub-modules on 2 chips infected by Synopsys DC.
Background :
We've been using Ambit BuildGates since *before* Ambit merged with Cadence.
And we are one of the rare companies that own both Avanti Apollo and Silicon
Ensemble for P&R. Before using PKS, we had been using Ambit BuildGates
for logic synthesis for 3 years, so migrating to PKS was quite easy for us.
We have Synopsys DC, too. It's also easy for us to translate DC scripts
into the Ambit environment, either manually or with Ambit's translator. (The
translator converts DC's "write_script" output into Ambit code. Not
everything gets translated, but it's pretty close.)
Phase I :
Before committing to PKS, we tried it out by re-designing our previous
tapeouts as test cases. These designs had been done with a conventional
IPO-ECO flow in either SE or Apollo. (The four combinations were:
Ambit+Apollo, Ambit+SE, DC+SE, and DC+Apollo.) It was definitely NOT push-
button work. When we started to evaluate PKS, our local (Taiwan) Cadence AE
was also new to PKS. Working together, it took us almost 6 months to bring
it up. It was painful to get it working at first because we were running
PKS without Cadence HQ support. We didn't get much attention then simply
because we are not as big as other Cadence customers. We were so small
that Cadence even voided a PKS bug we had reported in their PCR system!
Then we figured that if we couldn't be big, we'd try to be first. With a
flow this new, you cannot succeed until you get support from the Cadence
R&D core team.
All but one of the re-run chips achieved one-pass timing closure with PKS.
The timing reported by PKS does NOT match our golden flow (Avanti
Star-RCXT + Celestry MDC), but it's reasonably close (within 0.5 nsec). We
did not bother to get them closer because the two flows use different
timing engines.
Phase II :
After suffering through Phase I, we decided to tape out our first PKS chip
in April. After that, we had the confidence to use it as the standard flow
on our next 3 chips over the following 6 months. Described below is our
1st PKS tapeout case (0.25 um). The flow of our other 3 chips (1 at 0.25 um
and 2 at 0.35 um) was exactly the same except that our 1st run was with
PKS 3.0 -- we are using PKS 4.0 now.
Chip profile : over 40 hard macros (Analog, RAMs, ROMs, PLLs)
Components : around 200 K instances
Internal Clock Rate: several clocks; main clock was 81 MHz
Process: major fab in Taiwan, 0.25/0.35 um Artisan std cell, 1P5M process
Simulation : Cadence NC-Verilog 3.2
DFT : Syntest TurboScan
Synthesis : some sub-blocks were synthesized with Synopsys DC, but most
were generated with Ambit BuildGates.
Final STA : Ambit BuildGates + PrimeTime (each has its own good and
bad sides)
Formal verification : Verplex LEC
FloorPlan and Power Plan : Cadence SE Ultra
Chip Implementation (Physical) : Ambit PKS + SE Ultra (Cadence SP&R)
RC Extraction : Avanti Star-RCXT
Delay Calculation : Celestry MDC
Reoptimization after routing : Ambit PKS (with many iterations!)
Physical Verification : Mentor Calibre + Cadence Dracula for DRC/ERC/LVS
Layout polygon editing after P&R : Novas Laker
While running this PKS flow, we encountered these problems:
(1) Different timing engines produce different timing data. While we saw
positive slack in SE-PKS, it became negative slack after extraction
and delay calculation. This is basically caused by the different
extraction technology files used (HyperExtract/Ambit in SE-PKS versus
Star-RCXT+MDC). The difference is sometimes as large as 0.5 nsec.
We overcame this by over-constraining PKS, and this worked well
throughout the whole flow. How much you should over-constrain the
design can be judged from 2 or 3 iterations, and this value can be
re-used in other projects with the same technology. We have seen good
correlation across several projects since then.
(2) Within the Cadence system (PKS, after routing), the timing was very
close. Basically, the timing reported by PKS is trustworthy as long as
you don't have highly congested areas in your placement. If you use PKS,
be sure to resolve these congested areas; otherwise they can produce
mismatched timing after routing that requires massive search & repair
activity.
(3) Bad logic from Ambit BuildGates. Yes, Ambit-RTL produced bad logic in
our case. We caught the bugs with Verplex. The problem did not stop us
from using Ambit, because DC is too slow and produces larger area. It's a
trade-off: we would rather work around the bugs than delay the chip and
accept a larger area.
(4) Power Plan in SE-PKS is lousy compared to Avanti Apollo. Cadence has
said they will improve it, but they don't know when.
(5) Floorplanning cannot be done in PKS. Even the new version of PKS still
cannot do I/O planning. We have to use SE to create our floorplan and then
output DEF to PKS.
(6) CTGen clock tree synthesis is too slow. In our case, we had to wait
6-7 hours for a single run to complete on a Sun Ultra 60. (Using a Sun
Blade 1000 at 750 MHz can cut the time to 3-4 hours.) CT-PKS in PKS 4.0
is even slower!! Again, we are waiting for Cadence to improve it, but we
have now started to evaluate Celestry ClockWise.
(7) We had to do some in-place cell size-ups in PKS by hand. For some
unknown reason, PKS simply refused to size up these cells to get better
timing.
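Item (1) above describes picking an over-constraint value from 2 or 3 PKS-vs-sign-off iterations. A minimal Python sketch of that bookkeeping follows, assuming the margin is applied by tightening the clock period handed to PKS and that per-path slacks from both flows have already been collected. The function names, the data layout, the guard band parameter, and the slack numbers in the demo are all hypothetical; they are not Cadence, Avanti or Celestry interfaces.

```python
# Hypothetical sketch: pick a PKS over-constraint margin from a few
# PKS-vs-sign-off iterations. Function names, data layout, and the
# clock-period approach are illustrative assumptions, not tool APIs.

def overconstraint_margin(iterations, guard_band_ns=0.0):
    """Return how much (ns) to tighten the constraint given to PKS.

    `iterations` holds one list per iteration; each list contains
    (pks_slack_ns, signoff_slack_ns) pairs for the same timing paths.
    The margin is the worst PKS optimism seen (PKS slack minus
    sign-off slack), floored at zero, plus an optional guard band.
    """
    worst = 0.0
    for paths in iterations:
        for pks_slack, signoff_slack in paths:
            worst = max(worst, pks_slack - signoff_slack)
    return worst + guard_band_ns


def tightened_period_ns(target_mhz, margin_ns):
    """Clock period (ns) to hand PKS so sign-off still meets target_mhz."""
    return 1000.0 / target_mhz - margin_ns


if __name__ == "__main__":
    # Made-up slack pairs from two iterations (ns), for illustration only.
    history = [
        [(0.20, -0.25), (0.05, -0.10)],
        [(0.30, -0.15), (0.10, 0.00)],
    ]
    margin = overconstraint_margin(history)
    period = tightened_period_ns(81.0, margin)
    print(f"margin {margin:.2f} ns, constrain PKS at {period:.2f} ns "
          f"({1000.0 / period:.1f} MHz)")
```

For example, with the 81 MHz main clock (about 12.35 ns) and the 0.5 nsec worst-case gap mentioned in item (1), tightened_period_ns(81.0, 0.5) gives about 11.85 ns. Taking the worst observed gap rather than an average is a deliberately conservative choice.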
Overall, we have used SE-PKS to tape out 4 chips since April 2001. One chip
is in the sample delivery stage and two others are silicon verified. The
4th has just had its wafers out and is under system board test. I would
say PKS is pretty stable now. We usually spend our time trying to better
constrain designs for PKS.
One key benefit of using Ambit-RTL over Synopsys DC is its runtime and
built-in STA capability. We don't need to switch back and forth between DC
and PrimeTime to do optimization and STA. Within <pks-shell>, we can go from
RTL synthesis all the way down to a routed DEF that is ready for Avanti
Star-RCXT extraction and delay calculation (Celestry MDC).
I think the main issue we need Cadence to improve is runtime. CTGen and
CT-PKS are slow, and PKS is kind of slow, too. To run PKS, we were forced to
buy more expensive machines like SUN Blade 1000s and HP C3700s. (You may
be interested to know that the HP machine is much faster than the SUN.)
Linux versions of Ambit/PKS, or even of the whole SP&R tool set, are another
thing we keep requesting. We have tested Synopsys DC on Linux: DC runs 1.3X
faster on a Linux PC (AMD 1.2 GHz, ASUS motherboard, 1.5 GB memory) than on
a SUN Blade 1000 (750 MHz), so I think it is reasonable to expect that
Ambit or PKS would run faster on Linux too.
Power planning capability is weak in SE-PKS. Cadence should improve that.
- Ching Hsiang Yang
Sunplus Technology HsinChu, Taiwan