( ESNUG 570 Item 1 ) -------------------------------------------- [04/04/17]
Subject: How Samsung uses Oasys-RTL inside their IC Compiler II flow
Mentor Oasys-RTL -- is Wally's answer to Aart and
Anirudh in synthesis. Oasys-RTL does crazy fast RTL
synthesis, floorplanning, design partitioning, congestion analysis and
pre-CTS opto. 3 hours to synth & floorplan a 2 million inst chip
using 4G of machine memory. Can do 6 M inst flat. (ESNUG 560 #6)
Synthesized and floorplanned a 5.2 M inst customer chip flat, 16nm
TSMC, in 7 hours 48 minutes on a single thread. Does "place first"
where RTL is synthesized into a virtual physical partition and
optimized at RTL level. "Lets us do various What-If's that were
impossible before!"
- Cooley's DAC'16 Cheesy Must See List
From: [ John Cooley of DeepChip.com ]
Seeing that the Mentor U2U'17 is happening this morning, plus the fact that
MENT Sierra/Oasys has made the news lately (ESNUG 568 #1), I figured this
would be a great time to publish my Sierra/Oasys U2U notes from last year.
(Just under the deadline, I'd say!)
FOR THE RECORD: the Samsung engineer only talked about "the incumbent tool";
at no point did he mention any Synopsys tool by name. So what follows is
what my brain heard him say.
The Samsung guy started by describing how Synopsys DC-Topo wasn't working
well enough for them inside their IC Compiler II environment. Basically
DC-Topo was making low-quality floorplans for ICC2.
The old Samsung all-SNPS floorplanning flow:
1.) run DC-Topo on RTL
2.) drop your gate netlist into ICC2
3.) build the floorplan in ICC2 in gates
4.) then send the DEF floorplan back to DC-Topo for physical synthesis
5.) then take the gates and DEF back into ICC2 for PnR
6.) iterate back to 1.) until timing/routing converges
On 3 chips (1 at 28nm and 2 at 14nm) they saw these problems:
- DC-Topo RTL synthesis runtimes were getting longer (up to 1 week for
just one iteration) because it takes back and forth iterations between
RTL synthesis and ICC2 PnR.
- For RTL floorplanning, DC-Topo/DC-Graphical/DC-Ultra are fairly weak.
Even the SNPS guys will admit that in gate floorplanning CDNS Encounter
beats them and for RTL FP, Oasys beats them. Generating a production
quality floorplan using SNPS-only tools involved multiple iterations
between frontend and backend teams (around 6 weeks).
There was talk of chasing/creating/moving "seed" floorplans around.
Notice in the pic how their old DC-Topo/ICC2 floorplanning takes 6 weeks to
converge on a floorplan. The three ways Samsung evaled Oasys-RTL:
- Quality of results - Correlation within 10% timing and 5% area of ICC2
- Runtime/Capacity - at least 3M instance blocks and faster than DC-Topo
- Ease of use - had to work with ICC2.
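The quality-of-results criterion above can be read as a simple tolerance check. Here is a minimal Python sketch, assuming "within 10% timing and 5% area" means relative difference against the ICC2 number. That reading, the function names, and the sample values are all illustrative assumptions, not anything Samsung showed.

```python
def within_tolerance(estimate: float, reference: float, tol: float) -> bool:
    """True if estimate differs from reference by at most tol (a fraction)."""
    return abs(estimate - reference) <= tol * abs(reference)


def passes_qor(est_timing: float, ref_timing: float,
               est_area: float, ref_area: float,
               timing_tol: float = 0.10, area_tol: float = 0.05) -> bool:
    """Apply the two correlation limits: 10% on timing, 5% on area."""
    return (within_tolerance(est_timing, ref_timing, timing_tol)
            and within_tolerance(est_area, ref_area, area_tol))


# Hypothetical numbers: RTL-level estimate vs. post-PnR reference.
good = passes_qor(est_timing=1.04, ref_timing=1.00,
                  est_area=1.03e6, ref_area=1.00e6)
bad = passes_qor(est_timing=1.15, ref_timing=1.00,
                 est_area=1.03e6, ref_area=1.00e6)
print(good, bad)
```

The first call is inside both limits; the second misses on timing alone, which is enough to fail the check.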
WARNING: Oasys-RTL marketing claimed top level partitioning and budgeting;
full physical RTL synthesis; and ability to generate floorplans during RTL
synthesis. Samsung only tested Oasys's RTL level floorplanning flow
which had:
- automated RTL level floorplan creation,
- congestion and timing analysis (based on its own floorplan),
- and how its timing and congestion correlated with Aart's ICC2 PnR.
One thing that surprised me was that Oasys-RTL can apparently run multiple
recipes at the same time, like:
    50% util, 0.8 aspect, 600 MHz, 0.8 Vdd lib, high-Vt/std-Vt/no low Vt's
    55% util, 0.8 aspect, 650 MHz, 1.0 Vdd lib, use all Vt's
    60% util, 1.0 aspect, 700 MHz, 1.1 Vdd lib, use all Vt's
all at the same time. It was unclear if this was an Oasys feature, or if
this was Samsung using LSF farms running multiple licenses of Oasys-RTL.
(I'm thinking it was an Oasys feature.)
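To make the "multiple recipes at once" idea concrete, here is a rough Python sketch of the three recipes above as data, dispatched concurrently. It does NOT call Oasys-RTL; `run_recipe` is a hypothetical placeholder that only formats a label, since the talk did not show how the tool was actually launched (built-in feature or LSF jobs).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    utilization: float      # target placement utilization (fraction)
    aspect_ratio: float     # floorplan aspect ratio
    clock_mhz: int          # target clock frequency
    vdd_lib: float          # library supply voltage
    vt_classes: tuple       # threshold-voltage cell classes allowed


# The three recipes quoted in the notes above.
RECIPES = [
    Recipe(0.50, 0.8, 600, 0.8, ("high", "std")),
    Recipe(0.55, 0.8, 650, 1.0, ("high", "std", "low")),
    Recipe(0.60, 1.0, 700, 1.1, ("high", "std", "low")),
]


def run_recipe(r: Recipe) -> str:
    # Hypothetical stand-in for one synthesis/floorplan run.
    return (f"util={r.utilization:.0%} aspect={r.aspect_ratio} "
            f"clk={r.clock_mhz}MHz vdd={r.vdd_lib}V vt={'/'.join(r.vt_classes)}")


def run_all(recipes):
    # One worker per recipe; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=len(recipes)) as pool:
        return list(pool.map(run_recipe, recipes))


results = run_all(RECIPES)
for line in results:
    print(line)
```

Whether the parallelism lives inside the tool or in a job scheduler, the recipe list is the same; only `run_recipe` would change.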
Another thing the Samsung guy seemed to like was the cross probing between
RTL & gates in Oasys. It showed where the RTL problems popped up.
3rd Party STA == PrimeTime
3rd Party RTL Synthesis == DC-Topographical
3rd Party P&R == ICC/ICC2
The Samsung guy described Oasys as "plug and play" (at least with ICC/ICC2)
and said it helped them get to their final ICC2 PnR production results quicker.
Again lots of talk about "seed" floorplans.
Even though the Samsung guy's Oasys/ICC2 flow diagram was confusing at
first, I pretty much figured out it was:
1. Feed your Verilog RTL into Oasys-RTL for floorplanning
2. Oasys generates a seed DEF floorplan, which goes to DC-Topo for physical synthesis
3. Then gates and DEF go to ICC2 for PnR
This flow took 2 weeks, compared to 6 weeks for the all-SNPS flow.
---- ---- ---- ---- ---- ---- ----
Overall Samsung found using Oasys-RTL on 3 chips:
Design 1 - 14nm, 2 million instance block. Checking basic floorplan
functionality. Oasys delivered a first cut "seed" floorplan with
macro placement in 6 hours -- 3x faster than DC-Topo.
Design 2 - 14nm, GPU core with 3 million instances. Timing correlation
was the criterion. Samsung saw timing correlation within 4% of ICC2.
8 hours runtime.
Design 3 - 28nm, 3.8 million instances, 400-macro design. Here they
tested its floorplan generation; finding timing and congestion hotspots
in RTL; and then timing correlation with P&R. This run completed in
12 hours, delivering 5% correlation with ICC2 PnR and 5% less area from
floorplan exploration.
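As a quick cross-check, the three reported results can be lined up against the eval targets from earlier (timing correlation within 10%, blocks of at least 3M instances). A small Python sketch; the field names are my own, and Design 1 has no correlation number because none was reported for it.

```python
# Figures as reported in the notes above; None means "not reported".
REPORTED = [
    {"name": "Design 1", "node_nm": 14, "instances_m": 2.0,
     "runtime_h": 6, "timing_corr_pct": None},
    {"name": "Design 2", "node_nm": 14, "instances_m": 3.0,
     "runtime_h": 8, "timing_corr_pct": 4.0},
    {"name": "Design 3", "node_nm": 28, "instances_m": 3.8,
     "runtime_h": 12, "timing_corr_pct": 5.0},
]

TIMING_TARGET_PCT = 10.0   # eval target: within 10% of ICC2 timing
CAPACITY_TARGET_M = 3.0    # eval target: at least 3M instance blocks


def summarize(designs):
    """Flag each design against the capacity and timing targets."""
    rows = []
    for d in designs:
        corr = d["timing_corr_pct"]
        rows.append({
            "name": d["name"],
            "meets_capacity_target": d["instances_m"] >= CAPACITY_TARGET_M,
            # Leave as None when no correlation figure was reported.
            "meets_timing_target": None if corr is None else corr <= TIMING_TARGET_PCT,
        })
    return rows


SUMMARY = summarize(REPORTED)
for row in SUMMARY:
    print(row)
```

Designs 2 and 3 are the ones that exercise both targets; Design 1 was a basic floorplan functionality check on a smaller block.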
On one of their key projects (he wouldn't say which) Samsung cut their
project cycle time by 1 month using Oasys in their ICC2 flow. Oasys is now
part of Samsung's internal design flow methodology.
- John Cooley
DeepChip.com Holliston, MA
---- ---- ---- ---- ---- ---- ----