( ESNUG 582 Item 1 ) ---------------------------------------------- [03/29/18]
Subject: After 16nm benchmark, 7nm user swaps out DC-Graphical for Genus-RTL
AN RTL SYNTHESIS TIPPING POINT?: Roughly 17 years ago, a new business
book came out, "The Tipping Point", that emphasized how little things
can have a large impact on future events. It was sort of like the
old Sci-Fi "The Butterfly Effect" story, but for business. ...
What makes this interesting is that since Innovus has been eating ICC2's
lunch in PnR over the past few years, we might now be seeing a tipping
point happening in the RTL synthesis market, too. From the many user
comments below, it appears that Cadence Genus RTL, when paired with
Innovus PnR, is now becoming a credible threat to Aart's Design Compiler
monopoly.
- from http://www.deepchip.com/items/dac17-04.html
From: [ "Oh, No! Godzilla!" ]
Hi, John,
What you said here in your DAC'17 "Best of" report is half true and half
wrong.
My group switched from ICC/ICC2 over to Innovus for 16nm PnR, but we kept
DC Graphical for our RTL synthesis. I'll write more about that later.
What I wanted to point out is where you're half wrong in your RTL synthesis
tipping point observation, John.
WHY WE HAD TO LEAVE DC-GRAPHICAL
This "tipping point" started happening at 16nm and is getting worse at 7nm,
but not because of bad ICC2 or better Innovus software. It started because
device physics fundamentally changes as we move below 16nm.
The need for physical synthesis has been around since 90nm, but it was
given to us in small, manageable, incremental changes. Below is a chart
from a couple of years ago, in which ITRS predicts wire interconnect RC
delay vs. transistor gate delay.
Editor's Note: BEOL is a fab term for "Back End of Line". It means
"interconnect" to a chip designer. It's the contact, routing, via, and
dielectric process layers. - John
At N16 a wire delay is 1,000X slower than a gate delay. Note that from 16nm
to 7nm there is an ADDITIONAL order of magnitude difference (now 10,000X!)
between wire delay and transistor delay. This extra jump in wire delay has
changed the classic physical synthesis problem from mostly predicting long
wires and high-fanout delays -- to now also accounting for small stuff like
via cuts for local interconnect. Even very short distance, low-fanout wires
are now a problem at 7nm.
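The following sketch shows why short local wires and via cuts start to matter. It uses a first-order Elmore delay estimate. The model assumes a uniform wire with hypothetical per-micron resistance and capacitance, driven through a lumped stack of via cuts at the driver end. All parameter values are placeholders, not 16nm or 7nm process data.

```python
def elmore_wire_delay_ps(length_um, r_ohm_per_um, c_ff_per_um,
                         r_via_ohm=0.0, n_vias=0, c_load_ff=0.0):
    """First-order Elmore delay of a via stack followed by a uniform RC wire.

    All parameters are hypothetical inputs; ohm * fF = 1e-15 s = 1e-3 ps.
    """
    r_wire = r_ohm_per_um * length_um
    c_wire = c_ff_per_um * length_um
    r_vias = r_via_ohm * n_vias
    # Via resistance (lumped at the driver end) must charge everything downstream.
    t_vias = r_vias * (c_wire + c_load_ff)
    # Distributed wire: 0.5 * R * C for its own capacitance, full R for the load.
    t_wire = 0.5 * r_wire * c_wire + r_wire * c_load_ff
    return (t_vias + t_wire) * 1e-3


# Doubling the length quadruples the wire's self-delay...
print(elmore_wire_delay_ps(10, 1.0, 0.2), elmore_wire_delay_ps(20, 1.0, 0.2))
# ...while two via cuts add a term that shrinks only linearly with length.
print(elmore_wire_delay_ps(10, 1.0, 0.2, r_via_ohm=50.0, n_vias=2))
```

The wire's own RC delay falls off with the square of its length. The via term falls off only linearly, and the load term not at all. On short local routes, the via cuts can therefore become a large share of the total delay once via resistance climbs.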
It creates chicken-and-egg questions for our RTL synthesis tools.
"Exactly what is the optimal routing topology for logic mapping
and structuring now?"
That's a tough question, and it has no good answer unless your RTL
synthesis is tightly coupled to your PnR.
OUR FIRST LOOK AT CADENCE GENUS-RTL STANDALONE
At 28nm we were an all-Synopsys house: VCS, PrimeTime, DC-Graphical, ICC,
StarRC, and IC Validator, plus Mentor Calibre for golden sign-off. That
worked, but 28nm was the last planar node.
At 16nm, when FinFETs came in, we tried ICC/ICC2 and things began to break
in our benchmarks. We also heard lots of bad ICC2 stories. (See ESNUG 552 #6)
So after our internal 16nm FinFET benchmark, we swapped out ICC/ICC2 and
put Innovus in the PnR socket in our flow.
Since we were already in Innovus by then, we were curious and decided to
also take a first look at Genus RTL with a 16nm benchmark.
Now the question became: "how does Genus-RTL Physical handle our blocks?"
THE BIG BLOCK TEST
We focused Genus Physical on a monster 7.8M cell, 16nm, 750 MHz block
nicknamed "Godzilla".
It's technically possible to divide "Godzilla" into 3 blocks, but it would
be unnatural -- with a lot of interconnect between the sub-partitions. And
one of the blocks would still need to be 4M cells, which is pretty big.
The challenge here was to get a good physical synthesis result in under
1.5 days in order to have predictable PPA through to PnR. Our rule of
thumb is that if physical synthesis takes more than 2 days for any design,
you take bad schedule hits if you have trouble converging within 2 or 3
runs. For example, our DC-Graphical runs were taking MUCH longer than
2 days and we never saw them converge.
Genus-RTL Physical does partition-based synthesis with both distributed
and multi-threaded optimization. It automatically breaks up your design
into partitions, carving up cones of logic to minimize reconvergent paths
between partitions. It does not use top-down timing budgeting, which often
leads to sub-optimal results. (In contrast, DC-Graphical is only
multi-threaded and hits a speed improvement wall between 8 and 16 CPU
cores.)
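The sketch below is a toy illustration of "carving up cones of logic". It models the netlist as a hypothetical map from each node to its fan-in drivers. This is not Genus's algorithm, which the author does not describe. The greedy rule is our own stand-in: put each endpoint's fan-in cone into the partition that already shares most of its logic, and break ties toward the smallest partition. Keeping overlapping cones together is what keeps reconvergent logic from straddling a partition boundary.

```python
def fanin_cone(node, fanins):
    """Return the set of nodes in the transitive fan-in cone of `node`."""
    cone, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n in cone:
            continue
        cone.add(n)
        stack.extend(fanins.get(n, ()))
    return cone


def partition_by_cones(endpoints, fanins, k):
    """Hypothetical greedy cone partitioner: returns {node: partition index}."""
    parts = [set() for _ in range(k)]
    owner = {}
    # Seed with the largest cones first.
    cones = sorted((fanin_cone(e, fanins) for e in endpoints), key=len, reverse=True)
    for cone in cones:
        # Prefer the partition sharing the most logic; tie-break on smallest size.
        best = min(range(k), key=lambda i: (-len(cone & parts[i]), len(parts[i])))
        # Nodes shared by several cones stay where they were first placed.
        for n in [n for n in cone if n not in owner]:
            owner[n] = best
            parts[best].add(n)
    return owner


def cut_size(owner, fanins):
    """Count driver->load edges whose two ends land in different partitions."""
    return sum(1 for load, drivers in fanins.items()
               for d in drivers if owner.get(d) != owner.get(load))


FANINS = {"q1": ["a", "b"], "a": ["in1"], "b": ["in2"], "q2": ["c"], "c": ["in3"]}
OWNER = partition_by_cones(["q1", "q2"], FANINS, k=2)
print(OWNER, "cut edges:", cut_size(OWNER, FANINS))
```

A production partitioner would also balance partition sizes explicitly and weigh timing criticality. Here, balance comes only from the tie-break.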
     Compute
(machines x cores)   Total Cores   Runtime
------------------   -----------   ----------
      1 x 8                8       51.0 hours
      2 x 8               16       40.0 hours
      3 x 8               24       31.5 hours
With a total of 24 cores, Genus-RTL Physical achieved a throughput of about
4 hours per million cells and came in under our 1.5-day runtime limit --
with 4.5 hours to spare.
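The throughput and headroom figures can be checked directly against the runtime table. This is a minimal sketch. The "speedup" and "efficiency" columns are our own derived metrics, measured against the 8-core run, not numbers the author reported.

```python
def summarize_runs(runs, cells_millions, limit_hours):
    """Derive throughput, headroom and scaling from (total cores, hours) pairs.

    The first entry is the baseline for speedup and parallel efficiency.
    """
    base_cores, base_hours = runs[0]
    rows = []
    for cores, hours in runs:
        speedup = base_hours / hours
        rows.append({
            "cores": cores,
            "hours": hours,
            "hours_per_mcell": hours / cells_millions,
            "headroom_hours": limit_hours - hours,
            "speedup": speedup,
            "efficiency": speedup / (cores / base_cores),
        })
    return rows


# Values from the "Godzilla" runtime table (7.8M cells, 1.5-day limit).
RUNS = [(8, 51.0), (16, 40.0), (24, 31.5)]
ROWS = summarize_runs(RUNS, cells_millions=7.8, limit_hours=1.5 * 24)
for row in ROWS:
    print(f"{row['cores']:>2} cores: {row['hours_per_mcell']:.2f} h/Mcell, "
          f"headroom {row['headroom_hours']:+.1f} h, "
          f"speedup {row['speedup']:.2f}x, efficiency {row['efficiency']:.0%}")
```

At 24 cores, this reproduces the roughly 4.04 hours per million cells and the 4.5 hours of headroom quoted above. The derived efficiency at 24 cores is about 54% of linear scaling.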
It also did a good job of predicting the Innovus pre-CTS placement
optimization results.
                             Genus Physical   Innovus pre-CTS place opto
                             --------------   --------------------------
Worst Negative Slack (WNS)     -240 psec             -195 psec
Total Negative Slack (TNS)     -120 nsec              -90 nsec
Failing End Points (FEP)          2,140                 1,835
The WNS, TNS, and FEP numbers matched up well. Furthermore, 71 of the top
100 Genus FEPs were also in the Innovus top 100 FEPs. So Genus Physical was
working on the right optimization problems.
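The WNS/TNS/FEP summary and the top-100 overlap are simple to compute from per-endpoint slack reports. The sketch below assumes each tool's report has already been parsed into a dict mapping endpoint name to slack. The parsing step, the endpoint names, and the function names are all hypothetical.

```python
def summarize_slack(report):
    """Return (WNS, TNS, failing endpoint count) from {endpoint: slack}."""
    failing = [s for s in report.values() if s < 0]
    wns = min(failing) if failing else 0
    return wns, sum(failing), len(failing)


def top_n_overlap(report_a, report_b, n=100):
    """Fraction of report_a's n worst endpoints also in report_b's n worst."""
    worst_a = sorted(report_a, key=report_a.get)[:n]
    worst_b = set(sorted(report_b, key=report_b.get)[:n])
    hits = sum(1 for ep in worst_a if ep in worst_b)
    return hits / len(worst_a) if worst_a else 0.0


# Hypothetical slacks in picoseconds; more negative is worse.
GENUS = {"r1/D": -240, "r2/D": -200, "r3/D": -150, "r4/D": -10}
INNOVUS = {"r1/D": -195, "r3/D": -180, "r5/D": -170, "r2/D": -5}
print(summarize_slack(GENUS), summarize_slack(INNOVUS))
print(top_n_overlap(GENUS, INNOVUS, n=3))
```

With the real reports and `n=100`, the author's result corresponds to an overlap of 0.71.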
BENCHMARKING GENUS PHYSICAL VS. DC-GRAPHICAL
Since Genus Physical passed our "Godzilla" benchmark, we decided to see what
would happen if we swapped it in place of DC-Graphical inside our Innovus
PnR flow. The rest of the flow remained the same -- Cadence timing signoff
backend (Tempus, Quantus) and Mentor ATPG/scan test and Calibre physical
verification.
This time our design was "Marx_Bros", a 2.5+ GHz 16nm custom
high-performance CPU plus sub-system, which was broken into 5 physical
blocks ranging from 80,000 instances to 1.5M instances.
Marx_Bros --+--- Chico
            |--- Harpo
            |--- Groucho
            |--- Gummo
            `--- Zeppo
Our clock network is a hybrid mesh with local cluster connections for
minimal skew. Each block had to meet the 2.5+ GHz target frequency and be
hold-violation clean and routing clean (low DRCs); among results meeting
those requirements, minimal standard cell area was the winning criterion.
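This selection rule works as a gate followed by a minimization. Every candidate must first pass the frequency, hold, and routing checks. Only then does standard cell area decide the winner. The sketch below is minimal. The `BlockResult` record is hypothetical, and so is the `max_drcs` threshold, which stands in for "low DRCs".

```python
from dataclasses import dataclass


@dataclass
class BlockResult:
    """Hypothetical summary of one flow's result for one block."""
    flow: str
    fmax_ghz: float
    hold_violations: int
    drc_count: int
    cell_area: float


def pick_winner(results, target_ghz=2.5, max_drcs=0):
    """Return the passing result with the smallest cell area, or None."""
    passing = [r for r in results
               if r.fmax_ghz >= target_ghz
               and r.hold_violations == 0
               and r.drc_count <= max_drcs]
    return min(passing, key=lambda r: r.cell_area, default=None)


CANDIDATES = [
    BlockResult("flow_a", 2.6, 0, 0, 97.68),
    BlockResult("flow_b", 2.6, 0, 0, 95.14),
    BlockResult("flow_c", 2.4, 0, 0, 90.00),  # smallest, but misses frequency
]
print(pick_winner(CANDIDATES))
```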
Our previous DCG-Innovus flow had significant path slack differences between
physical synthesis and PnR. We were able to meet our performance goals, but
we suspected the area may be sub-optimal as a result.
We set up the Genus-Innovus flow to use the highest-accuracy *extreme*
physical optimization. We set non-default rules (NDRs) on critical nets in
Genus Physical to lower their resistance. Then we inserted test and passed
the netlist and NDRs to Innovus for PnR. This recipe gave us good critical
path delay predictability through pre-CTS (i.e., after placement
optimization).
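A widening NDR lowers resistance because, to first order, a wire's resistance is R = rho * L / (W * T). Widening a critical net therefore cuts its resistance proportionally. The sketch below is minimal. The resistivity and dimensions are hypothetical placeholders, not foundry data. The sketch also ignores the extra capacitance that a wider wire brings.

```python
def wire_resistance_ohm(length_um, width_um, thickness_um, rho_ohm_um=0.02):
    """First-order resistance of a rectangular wire: R = rho * L / (W * T).

    rho_ohm_um is a hypothetical effective resistivity in ohm*um; narrow
    wires in real processes have higher effective resistivity than bulk copper.
    """
    if min(length_um, width_um, thickness_um) <= 0:
        raise ValueError("dimensions must be positive")
    return rho_ohm_um * length_um / (width_um * thickness_um)


DEFAULT_RULE = wire_resistance_ohm(length_um=100, width_um=0.02, thickness_um=0.04)
DOUBLE_WIDTH = wire_resistance_ohm(length_um=100, width_um=0.04, thickness_um=0.04)
print(f"default: {DEFAULT_RULE:.0f} ohm, 2x-width NDR: {DOUBLE_WIDTH:.0f} ohm")
```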
             DC-Graphical +   Genus Physical +   Genus area
CPU block    Innovus area     Innovus area       reduction
---------    ------------     ----------------   ----------
Groucho          97.68            95.14             2.6%
Harpo            85.32            83.86             1.7%
Chico            71.21            68.29             4.1%
Zeppo            62.93            62.80             0.2%
Gummo            46.88            45.43             3.1%
The data above is after placement, routing, timing ECO closure, and DRC
clean-up.
Genus Physical got a 2.34% average area reduction on our Marx_Bros CPU.
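The 2.34% figure can be reproduced from the table. The simple mean of the five per-block reductions and the area-weighted reduction both round to 2.34%. The check below uses only the numbers in the table.

```python
# (block, DC-Graphical + Innovus area, Genus Physical + Innovus area)
AREAS = [
    ("Groucho", 97.68, 95.14),
    ("Harpo", 85.32, 83.86),
    ("Chico", 71.21, 68.29),
    ("Zeppo", 62.93, 62.80),
    ("Gummo", 46.88, 45.43),
]


def percent_reduction(before, after):
    """Percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100.0


PER_BLOCK = {name: percent_reduction(dcg, genus) for name, dcg, genus in AREAS}
SIMPLE_MEAN = sum(PER_BLOCK.values()) / len(PER_BLOCK)
AREA_WEIGHTED = percent_reduction(sum(a for _, a, _ in AREAS),
                                  sum(b for _, _, b in AREAS))

for name, pct in PER_BLOCK.items():
    print(f"{name:<8} {pct:.1f}%")
print(f"simple mean {SIMPLE_MEAN:.2f}%, area-weighted {AREA_WEIGHTED:.2f}%")
```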
THE GENUS-SPYGLASS GOTCHA
We are updating our 7nm Genus-Innovus flow to do Multi-Mode/Multi-Corner
(MMMC) timing constraints along with SOCV delay variation for improved
timing accuracy. Genus Physical supports both, but we have run into some
trouble with constraint debug.
In our early design phase we see a lot of churn in the RTL and SDC timing
constraints. We are in a hurry to see the PPA results through physical
synthesis, and we expect that if there is a basic problem with the
constraints, Genus will give a clear alert at elaboration so we know how to
fix it.
Genus isn't cutting it right now. We spent a lot of time trying to figure
out why there were big timing differences run-to-run, only to find the
culprit was a simple constraint error. Our workaround is to push everything
(both RTL and SDC) through the Synopsys Spyglass linter every time before
going into Genus. Having to go through any other tool is a pain. CDNS R&D
needs to port the Tempus checks into Genus Physical.
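Some "simple constraint errors" are mechanical enough to catch with a quick pre-check. The sketch below is hypothetical and is not how Spyglass or Tempus check constraints. It flags only two basic mistakes: a `create_clock` with a missing or non-positive period, and a `get_clocks` reference to a clock not yet defined. It assumes one SDC command per line. The clock and port names are made up.

```python
import re

# Hypothetical, minimal patterns; a real flow would use a full SDC/Tcl parser.
CLOCK_NAME = re.compile(r"-name\s+\{?([^\s\}]+)")
CLOCK_PERIOD = re.compile(r"-period\s+(\S+)")
CLOCK_REF = re.compile(r"\[get_clocks\s+\{?\s*([^\s\}\]]+)")


def lint_sdc(text):
    """Return (line number, message) pairs for a few basic SDC mistakes.

    Clocks created without -name are not tracked by this sketch.
    """
    problems = []
    defined = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("create_clock"):
            period = CLOCK_PERIOD.search(line)
            try:
                ok = period is not None and float(period.group(1)) > 0
            except ValueError:
                ok = False
            if not ok:
                problems.append((lineno, "create_clock needs a positive -period"))
            name = CLOCK_NAME.search(line)
            if name:
                defined.add(name.group(1))
        # SDC runs top to bottom, so a clock must exist before it is referenced.
        for ref in CLOCK_REF.findall(line):
            if ref not in defined:
                problems.append((lineno, f"clock '{ref}' used before create_clock"))
    return problems


SAMPLE = """\
create_clock -name core_clk -period 0.40 [get_ports clk]
set_input_delay 0.10 -clock [get_clocks core_clk] [get_ports din]
set_output_delay 0.10 -clock [get_clocks core_clkk] [get_ports dout]
create_clock -name aux_clk -period 0 [get_ports clk2]
"""
for lineno, msg in lint_sdc(SAMPLE):
    print(f"line {lineno}: {msg}")
```

In the sample, the 0.40 ns period corresponds to 2.5 GHz. The misspelled `core_clkk` and the zero period are the kinds of slips that can silently change timing run-to-run.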
WE WENT ALL-GENUS SYNTHESIS AT 7NM
In the end, based on the "Godzilla" and "Marx_Bros" results, we decided to
swap out DC-Graphical for Genus-RTL in some of our 16nm design flows.
Then, after a couple of successful 16nm projects built up our confidence,
we switched 100% over to Genus-Innovus at 7nm.
We haven't taped out at 7nm yet, John, but when we do, I'll try to
remember to do a follow-up on this.
- [ "Oh, No! Godzilla!" ]
---- ---- ---- ---- ---- ---- ----
Related Articles
Genus RTL synthesis gaining traction vs. DC is #4 of Best of 2017
CDNS Genus vs. MENT Oasys vs. SNPS DC Graphical synth at DAC'15
ICC2 patch rev, Innovus penetration, and the 10nm layout problem
Aart's SUE RIVALS policy backfires horribly on core SNPS patents