( ESNUG 545 Item 1 ) -------------------------------------------- [11/20/14]
Subject: User goes full CDNS EDI/Tempus/GigaOpt/GigaPlace/CCopt/ECO/synth
> In our eval we used a 13 M cell inst design. This represented a typical
> size for us at 28 nm, but we expect designs to get ~3X to ~5X larger at
> the next node. In our EDI/PT flow we typically use 4 CPUs for analyses.
> To be a fair comparison, we only used 4 CPUs to evaluate Tempus -- even
> though it claims it can run on any number of CPUs. We found:
>
> - single scenario w/o SI 4 CPUs: runtime = 74 mins; memory = 15.36 GB
> - single scenario w SI 4 CPUs: runtime = 89 mins; memory = 16.21 GB
>
> The Cadence marketing guy claimed Tempus on one machine gets throughput
> of 15-20 M cells/hour using 8 CPUs. We found 10 M per hour with 4 CPUs.
> Scaling to 8 CPUs makes the Cadence claim reasonable.
>
> The most important thing for us was that EDI/Tempus cut our analysis
> runtimes in half vs. our EDI/PT flow -- and this was with just 4 CPUs.
>
> - from http://www.deepchip.com/items/0537-09.html
From: [ Ling-Ling from Drawn Together ]
Hi, John,
Please keep me anonymous.
While that earlier guy benchmarked EDI/PrimeTime vs. EDI/Tempus, I want to
share what using an EDI/Tempus/GigaOpt/GigaPlace/CCopt/ECO/RC flow is like.
We've deployed EDI with Tempus as our mainstream P&R/STA environment.
Yes, we drank the Kool Aid and went 100% CDNS backend.
To get decent 28 nm PPA for "jelly bean" blocks is fairly straightforward.
The harder stuff -- things like ARM and DSP cores, high-bandwidth memory
subsystems, etc. -- had previously been finely tuned in our ICC1/PrimeTime
flow. It took some time to get equivalent or better PPA from similar
effort with EDI/Tempus.
Cadence has been hungry to prove their mettle in this market. At this
point, from what we've seen, EDI produces equal or better results than
ICC1 -- and Cadence continues to be very aggressive in their EDI/synth
and timing closure R&D.
What we found:
- GigaOpt assigns critical wires to upper-level metal or NDRs early
in the flow, and EDI maintains those assignments all the way through.
(Yep, lots of things there to get right: avoid over-subscription,
understand congestion, promote/demote properly, etc.)
We saw multiple paths with 1000 um of wires assigned to upper-level
metals being driven by large buffers and hitting 50-150 psec delays
from placement through to route.
For us, on most paths in chips below 28 nm, wire routing and wire
optimization are more important than gate optimization. (A rough Tcl
sketch of the kind of NDR/layer constraints involved appears after
this list.)
- The only gap here is that CDNS RTL Compiler synthesis doesn't yet
usefully understand how to use NDRs to solve timing problems when
mapping/structuring/etc. We think this capability will be a killer
when it's available.
- The new EDI GigaPlace global placer works much better out-of-the-box
with much less investment in regions and bounds. With ICC1, we needed
a large number of regions to attain QoR -- especially in some of our
high-performance cores and memory subsystems.
- In our architectures, pipeline registers are required to transport
signals at high speed across our chip. We were pleased that the new
EDI naturally handles placement of pipeline registers. It's not perfect,
but our designers have been freed from having to create bounds for
nearly every pipeline register required for transport.
- For CTS, the native Azuro CCopt worked well. It did CTS combining
pre-route, layer-aware optimization with useful skew. The previous
standalone external Azuro executable could not get an accurate picture
of which routes were promoted to higher layers. The new native
integration of the Azuro engine solves that problem. (A rough sketch
of driving native CCopt appears after this list.)
- CDNS RTL Compiler synthesis worked well with the new EDI when we
exposed LVT cells early to both tools in the flow.
True WNS paths got optimized better and avoided creating bottlenecks,
and we could then use leakage recovery aggressively later in the flow
to control Vt utilization. (A sketch of this ordering appears after
this list.)
We could close one of our highest-performance memory subsystems using
only 5% LVT and 13% SVT -- and we did this before the Tempus-ECO
capabilities were available, which shows the netlist structure wasn't
adversely impacted by early exposure of low-Vt devices.
- The new Tempus-ECO gives predictability. Using it has been positive.
Previously, we used both internally developed ECO tooling and
PrimeTime-ECO. With both old flows, ECO predictability was iffy in
highly dense regions. We were forced to limit what types of
optimization tricks could be deployed, and both old ECO flows often
put us into "ECO churn".
Because Tempus-ECO is physically aware, it gives us predictability.
It's more than just reading LEF. From what we've seen, it does full
placement legalization and checking. It does aggressive optimization
tricks -- more than just sizing and buffer insertion. While the full
suite of ECO tricks is not yet there (e.g., wire/layer assignment,
complex remapping, etc.), what is there is a leap forward vs. what we
could do in the past with either our custom tooling or PT-ECO.
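To make the NDR/layer point above concrete, here is a minimal sketch of
the kind of constraints GigaOpt manages automatically. This is purely
illustrative -- approximate EDI/Encounter Tcl from memory, with a made-up
net name, not anyone's production script; exact command options vary by
release.

    # Define a 2x-width / 2x-spacing non-default rule for critical nets.
    add_ndr -name NDR_2W2S -width_multiplier 2 -spacing_multiplier 2

    # Hand-constrain one long net the way GigaOpt would automatically:
    # attach the NDR and bias routing toward the upper metal layers.
    # ("u_core/crit_net_17" is a made-up example net.)
    setAttribute -net u_core/crit_net_17 \
                 -non_default_rule NDR_2W2S \
                 -bottom_preferred_routing_layer 5 \
                 -top_preferred_routing_layer 6

The point of the letter is that GigaOpt makes these assignments at
placement time and EDI honors them through routing, so you are not
writing constraints like these by hand for every critical path.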
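On the CCopt point, here is a rough sketch of what driving the native
engine inside EDI 14.x looks like -- again illustrative only, with
approximate command and property names; this is not the author's flow.

    # Generate a clock tree spec from the constraints already loaded,
    # then source it into the session.
    create_ccopt_clock_tree_spec -file ccopt.spec
    source ccopt.spec

    # Targets are properties on the shared database, not switches passed
    # to a standalone Azuro binary.
    set_ccopt_property target_max_trans 100ps
    set_ccopt_property target_skew 50ps

    # Build and optimize the clock trees with useful skew, on the same
    # database that already knows which routes were promoted.
    ccopt_design

Because the engine runs on the implementation database itself, it can
see the layer promotions described above -- which is exactly the gap the
old standalone executable had.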
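The multi-Vt ordering above also deserves a sketch, since the sequencing
is the whole trick: expose LVT early so true WNS paths map onto it, then
recover leakage late, rather than blocking LVT up front. Illustrative
only, with approximate EDI Tcl command names; not the author's script.

    # 1. Do NOT dont_use the LVT library at the start; let synthesis and
    #    early optimization map the true WNS paths onto it.

    # 2. After timing closure, run leakage recovery so non-critical cells
    #    are swapped back down and Vt utilization stays under control.
    setOptMode -leakagePowerEffort high
    optDesign -postRoute

    # 3. If a hard cap on LVT usage is still needed, individual cells can
    #    be fenced off at this late stage, e.g.:
    #        setDontUse <some_lvt_cell_name> true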
This is from using EDI/Tempus 13.2 and 14.1. I suspect others using the
more current 14.2 rev, which came out last week, are seeing even more.
- [ Ling-Ling from Drawn Together ]
---- ---- ---- ---- ---- ---- ----
Related Articles
A user's first hands-on benchmark of EDI/PrimeTime vs. EDI/Tempus
GlobalFoundries shows CDNS Tempus-timed 28nm ARM Cortex A12 chip
Cadence Tempus vs. Synopsys PrimeTime as #1 hot tool at DAC'13