( ESNUG 435 Item 3 ) -------------------------------------------- [12/08/04]
Subject: ( ESNUG 433 #1 ) Two Hands-On Users Report On Sierra Pinnacle
> What do you hear about Sierra Design Automation and their claims about
> Pinnacle handling far more gates overnight than Synopsys PhysOpt or
> Magma Blast Fusion? The numbers claimed are 10 M gates vs 2-3 M gates
> for Synopsys and Magma. What's the truth here?
>
> - Paul Wick
> Seligman Mutual Funds Palo Alto, CA
From: Kazuyuki Dei <kdei=user domain=fma.fujitsu spot calm>
Hi John,
We have been using Pinnacle for a few months now. We have run 9 designs
through Pinnacle, and are seeing very good results: high capacity, fast
runtime, and better quality results. Ours is a typical ASIC netlist
handoff model where we receive the netlist and SDC from our customers
and we take it through the physical implementation flow through P&R
(Cadence SoC Encounter and NanoRoute), extraction (Simplex Fire&Ice
and in-house), VoltageStorm for IR drop analysis, and PrimeTime for
final timing analysis with our internal delay calculator.
The following highlights our most recent design run through Pinnacle.
Design -- 1.4 M placeable instances run flat in Pinnacle, @ 200 MHz
Capacity -- 2.6 GB memory footprint in a 32-bit Linux machine to perform
placement, optimization, routing and timing.
We even loaded the entire routed database back into Pinnacle and did a full
post-route optimization on the design in a 32-bit Linux box as well. In
addition, during our initial trials we loaded and placed 2.3 M instances
(around 10 M equivalent gates) in less than 3.7 GB. Therefore, to answer
Paul's question, Pinnacle's capacity might be close to 10 M gates
in a 32-bit Linux box for our designs, which is pretty impressive.
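As a rough sanity check, the capacity numbers above can be turned into
per-instance figures. The sketch below uses only the values quoted in the
text; the derived ratios are a back-of-envelope illustration, not a
Sierra-published specification.

```python
# Back-of-envelope check of the capacity figures reported above.
# Inputs come from the text; the per-instance ratios are derived
# illustrations only.

instances = 2.3e6   # instances loaded and placed in the initial trial
eq_gates = 10e6     # stated equivalent gate count
mem_gb = 3.7        # peak memory footprint, GB

gates_per_instance = eq_gates / instances
bytes_per_instance = mem_gb * 2**30 / instances

print(f"{gates_per_instance:.1f} equivalent gates per instance")  # -> 4.3
print(f"{bytes_per_instance / 1024:.1f} KB of memory per instance")  # -> 1.7
```

At roughly 1.7 KB per placeable instance, a 32-bit box's ~3-4 GB of usable
address space is indeed consistent with a capacity in the 2-2.5 M instance
(about 10 M gate) range.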
Speed -- 18 hours from netlist to detailed, optimized placement, including
full global routing done on a 3.2 GHz, 32-bit Pentium processor. Also,
because of the way Pinnacle performs optimization, it can tell you after
about 5-6 hours roughly what the final quality of results will be. Time
to an answer
is very important to us as each iteration of the design in our existing
Cadence flow takes days.
Better quality results -- from the customer netlist handoff of this
design, we had very good timing results from Pinnacle within a few
iterations of the tool in a 2 day period. Our SoC Encounter flow did
not match the QoR of Pinnacle; it took weeks to get to its best
result and required a 64-bit machine to run the design.
Gotchas in Pinnacle:
One issue on this design was that the Clock Tree Synthesis engine for
Pinnacle was not yet released. This meant we had to go in and out of
the tool for this, which is kind of clunky. We have a couple of other
issues with Pinnacle which required workarounds:
a. Pinnacle doesn't support SDF, which is our traditional
customer deliverable.
b. Gate placement in Pinnacle doesn't check pin blockages for
pre-routes immediately above the pins, so we occasionally
have a few blocked pins.
c. Its power router is not full-featured; it does not support
conformal ring structures or elaborate power structures.
Currently we are using SoC Encounter for our power creation.
d. We initially had problems with Pinnacle modifying certain
logic that we wished to remain untouched. These are soft
modules that will later be swapped with hard IP, but for
which we did not yet have LEF/.lib models. We worked around
this by setting dont_modify properties on these hierarchies,
nets, and cells so that the Pinnacle-generated netlist would
go through the rest of our flow.
e. We also had a problem with placement of critical cells relative
to I/O timing. Our I/O timing constraints were not 100% complete,
so to achieve good I/O timing we had to write a custom flow to
pre-place flops for the interfaces, to make sure Pinnacle met timing.
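Gotcha (b) above -- pins occasionally blocked by pre-routes directly over
them -- is the kind of thing that can be screened for with a small script
between placement and routing. Below is a minimal sketch of such a check,
assuming shapes are simple axis-aligned rectangles; the DEF/LEF parsing
that would feed it is omitted and all names and coordinates are
hypothetical.

```python
# Simplified post-placement screen for gotcha (b): flag any cell pin
# whose access area is fully covered by a pre-route shape on the layer
# above it.  Rectangles are (x1, y1, x2, y2) tuples; all data here is
# hypothetical illustration, not real design data.

def covers(blockage, pin):
    """True if the blockage rectangle fully covers the pin rectangle."""
    bx1, by1, bx2, by2 = blockage
    px1, py1, px2, py2 = pin
    return bx1 <= px1 and by1 <= py1 and bx2 >= px2 and by2 >= py2

def blocked_pins(pins, preroutes):
    """pins: {name: rect}; preroutes: rects on the layer above the pins."""
    return [name for name, rect in pins.items()
            if any(covers(b, rect) for b in preroutes)]

# Toy example: one pin sits under a power strap, one is clear.
pins = {"u1/A": (10, 10, 12, 12), "u2/B": (40, 10, 42, 12)}
straps = [(8, 0, 14, 50)]          # vertical pre-route strap
print(blocked_pins(pins, straps))  # -> ['u1/A']
```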
Pinnacle fits well into our existing Cadence tool flow as it uses the
standard data formats (Verilog/LEF/DEF/.lib/SDC/Tcl). Also, Pinnacle has
a flexible
architecture; we are running both standard cell and Structured ASIC
designs in the tool. In fact, Sierra was able to support our Structured
ASIC designs within 24 hours of receiving their first look at design data
with our architecture.
- Kazuyuki Dei
Fujitsu Microelectronics Sunnyvale, CA
---- ---- ---- ---- ---- ---- ----
From: [ Chicken Little ]
Hi John,
I must be anon for political reasons.
We recently did an eval using Sierra Pinnacle. Tools used:
Floorplan: SOC Encounter
Placement: PhysOpt(10%)/Pinnacle(90%)
CTS: SOC Encounter(20%)/Pinnacle(80%)
Route: NanoRoute
SI Check: internal
Formal Verification: Synopsys Formality
STA: PrimeTime
IR-drop: Simplex VoltageStorm
DRC/LVS: Mentor Calibre
The design we used for this eval was a 0.15 um 4.0 M gate (about
950,000 instances) with a core frequency of 367 MHz. The main
purpose of this trial was to look at capacity and turnaround time;
it was also important not to sacrifice QoR for fast run times.
For the eval we ended up testing Pinnacle in several modes.
1. Rapid turn around time on small design.
We took one 80 k instance block from our design and ran it through
Pinnacle. It took Pinnacle less than 30 minutes. We ran this block
at 400 MHz, and Pinnacle had no trouble meeting this. This block had
a few long paths; Pinnacle had no trouble optimizing them. We gave
the same constraints and floorplan to another tool. Its run time was
1 hour, and it was unable to meet timing on these long paths; we were
left with over 300 ps of negative slack.
2. Large Flat Design (~950K instances)
We gave the whole design to Pinnacle and it placed and optimized
it in less than 4 hours. We ran it on a 2.8 GHz Linux box; the
memory usage was only 2.2 GB. The QoR was the same as the smaller
block's. During the eval we ran 3 different netlists through Pinnacle
that varied in size from 750 k, to 1.1 M, to the final 950 k
instances. Through the design cycle we also worked through
missing constraints and other floorplan experiments. We were
happy to see that the run times were always less than 4 hours to
get feedback on new design changes.
3. Top-level design with physical hierarchy
We read in a fully routed DEF and Verilog for five sub blocks and
placed and optimized the top level logic (about 30K cells). No
timing models needed to be created for these sub blocks. The
Pinnacle timer would use the routed data from the sub blocks to
calculate the timing between the top-level and the sub blocks.
One of the sub blocks was used more than once; Pinnacle doesn't
require you to uniquify in this mode even though there are multiple
instances of the same sub block. This run took about 30 minutes.
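A timer can skip uniquification if it stores the routed sub-block's
timing once per master and keys only the per-instance context (such as
boundary arrival times) on the full instance path. The sketch below is
purely illustrative of that idea, with made-up names and delays; it is
not a description of Pinnacle's actual data model.

```python
# Sketch of why uniquification can be unnecessary: timing arcs are
# stored once per master; each instance contributes only its own
# context, keyed by full hierarchical path.  Entirely illustrative.

master_arcs = {  # delay through each master, stored once (ns)
    "subblk": {("in", "out"): 1.20},
}

instances = {    # two instances of the same master, different contexts
    "top/u_sub0": {"master": "subblk", "in_arrival": 0.30},
    "top/u_sub1": {"master": "subblk", "in_arrival": 0.55},
}

def out_arrival(path):
    inst = instances[path]
    delay = master_arcs[inst["master"]][("in", "out")]
    return inst["in_arrival"] + delay

for path in instances:
    # each instance gets its own timing without duplicating the master
    print(path, round(out_arrival(path), 2))
```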
4. Power and CTS
We had missed our power budget on this design and found the problem
was with the clock tree. We had to redo the clock tree to reduce
the power. By this time Pinnacle's CTS tool had matured enough to give
it a test drive. We were able to reduce the buffer count and swap
out all registers for low power versions to save about 2 watts of
power. The skew was less than 150 ps, which is what we were seeing
with the Cadence and Magma CTS tools we had used previously on this
design. We found a problem late in our design flow, and this is where
the rapid turnaround time of Pinnacle really saved us time in the
schedule. We had our first test results from Pinnacle CTS in just
a few hours. Within 2 days we had resolved our power issue, which
included re-running initial placement and optimization to swap the
registers to lower power flops, inserting CTS, closing setup and hold
time violations, and getting back to our verification flows.
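The saving here comes from the standard dynamic-power relation
P = a * C * V^2 * f: fewer clock buffers and lower-power flops reduce the
switched capacitance C on a net that toggles every cycle. The numbers in
the sketch below are invented for illustration (the supply voltage and
capacitances are not from the report); only the formula and the 367 MHz
clock are taken as given.

```python
# Toy dynamic-power estimate for a clock network, P = a * C * V^2 * f,
# illustrating where a "fewer buffers + low-power flops" saving comes
# from.  All capacitance and voltage numbers are made up; real CTS
# power analysis is far more involved.

def dyn_power(alpha, cap_f, vdd, freq_hz):
    return alpha * cap_f * vdd**2 * freq_hz

f = 367e6   # core clock frequency from the eval design
vdd = 1.5   # hypothetical supply for a 0.15 um process

# Clock nets toggle every cycle, so activity factor is 1.0.
before = dyn_power(1.0, 4.0e-9, vdd, f)   # 4.0 nF switched cap (invented)
after  = dyn_power(1.0, 1.6e-9, vdd, f)   # cap after buffer/flop swaps

print(f"saving: {before - after:.2f} W")  # -> saving: 1.98 W
```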
5. Hold time fixing
For our 0.15 um technology we have 8 corners to run for hold fixing.
Pinnacle was able to run 6 of these. The other 2 are OCV corners
and require CRPR.
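For readers unfamiliar with CRPR (clock reconvergence pessimism removal):
under OCV, a hold check derates the launch clock path early and the
capture clock path late, but the segment of the clock tree shared by both
paths cannot be early and late at once, so its derate difference is
credited back to the slack. A minimal numeric sketch, with illustrative
delays only:

```python
# Minimal illustration of CRPR for a hold check under OCV.  The shared
# clock-tree segment is double-counted by the early/late derates, and
# the credit removes that pessimism.  All numbers are illustrative.

common = 1.00        # shared clock-tree segment delay (ns)
launch_only = 0.40   # launch-only clock segment
capture_only = 0.35  # capture-only clock segment
data = 0.25          # data path delay
hold = 0.05          # hold requirement at the capture flop
early, late = 0.90, 1.10  # OCV derate factors

# Worst-case hold: launch clock early (data arrives soon),
# capture clock late (hold window extends).
launch_arrival = (common + launch_only) * early + data
capture_required = (common + capture_only) * late + hold

slack_no_crpr = launch_arrival - capture_required  # false violation
crpr_credit = common * (late - early)              # shared-segment credit
slack_crpr = slack_no_crpr + crpr_credit           # real slack

print(round(slack_no_crpr, 3), round(slack_crpr, 3))  # -> -0.025 0.175
```

Without the credit this path reports a hold violation; with it, the path
passes, which is why corners needing CRPR must run in a timer that
supports it.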
We were already committed to doing a hierarchical design so that is the
version we taped out. In the end Pinnacle did about 90% of placement
and 80% of the clock tree. The parts just came out of the fab this week.
The parts are passing test except for an at-speed test that needs tweaking.
- [ Chicken Little ]