( ESNUG 482 Item 8 ) -------------------------------------------- [06/30/09]
From: Joel Lach <joel.lach=user domain=3par bought palm>
Subject: GlassBoxes, ECOs, benchmarks, Talus Design (Blast Create II)
Hi, John,
Instead of Synopsys Design Compiler, we use Magma Talus Design (aka Blast
Create II) to do our RTL-to-gates synthesis on our data storage ASICs. We
use it to do our frontend design and then hand off the netlist to a fabless
SoC company, Open Silicon, to do our backend P&R.
I wanted a hierarchical frontend synthesis flow, so I initially built a
custom flow using .lib/.lef. It ran fast because those models are very lean
on information. Unfortunately, it had maintenance/support issues, plus I
needed to do multiple steps by hand during the project and would always be
making several iterations of last minute tweaks to cell sizes in our netlist
to get a clean handoff.
After much discussion with Magma, they recently added GlassBox abstraction
as a standard part of Talus Design (Blast Create II); it used to be
available only in their Hydra hierarchical floorplanner. GlassBoxes
offer several advantages to us in terms of runtime, timing accuracy, memory
requirements and closure with the back end group.
Talus Design (Blast Create II) benchmarks
Our design was 5 M gates in 130 nm TSMC at 250 MHz. The top consists of 6
different top level blocks instantiated a total of 26 times. The largest of
the top level blocks had its own sub-hierarchy of 9 instances of 4 modules,
which I refer to as A, B, C, D.
    Module  Instances  Sequential  Combinational  Complexity  Challenge
    A       6           8 K         42 K          high        medium
    B       1           8 K         30 K          low         medium
    C       1          60 K        120 K          medium      high
    D       1          16 K         24 K          low         medium
Here are the main runtime vs. accuracy choices available for hierarchical
synthesis with the June 2009 release of Talus Design. I also have benchmark
data on each option.
1. Sized netlist. The "-size" option is a new option to "fix time" that
generates a netlist without a physical floorplan. We hand off the
sized netlist because our backend vendor, Open Silicon, requires a
large timing margin. The command for generating a sized netlist is:
    fix time $m $l -size -effort low -timing_effort high
2. Force Keep - allows you to mark the modules that you want left alone,
and Talus Design will not touch the internals of those models. This
saves runtime because Talus Design just looks at top level timing
paths and ignores the internal paths of the design. Since the models are
still loaded into memory, they still have some impact on runtime and
memory.
The times it took to synthesize the modules for the sized netlist and
force keep options are:
    Module  Time    Memory
    A       40 min  260 MB
    B       30 min  260 MB
    C       30 min  830 MB
    D       30 min  260 MB
    Total time: 2 hrs, 10 min.
3. GlassBox - structural. This option pretty much takes the original
model and removes the information from it that is not required to
close top level timing. It trims out the guts and leaves the shell,
for example the boundary flip-flops and logic. GlassBox is a lighter
weight version of the original model. The more information the
abstract can exclude, the faster Talus Design (Blast Create II) can
close timing on top level paths and the smaller the memory footprint.
When the command finishes, it reports what percentage of the model was
pruned away, which I share below.
GlassBox creation with -construction structural:
    Module  Size reduction  Creation time
    A       49%              5 sec
    B       60%             10 sec
    C       46%             40 sec
    D       33%             10 sec
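As a rough illustration of what "trimming out the guts and leaving the
shell" means, here is a minimal Python sketch on a toy netlist. It is not
Magma's algorithm. The graph format and the function name
structural_abstract are hypothetical. The sketch keeps every cell on a path
from an input port to the first flip-flop it meets, and every cell from an
output port back to the last flip-flop before it, and drops the
register-to-register logic in between.

```python
from collections import deque

def structural_abstract(fanout, flops, inputs, outputs):
    """Return the set of nodes kept in a toy interface-only abstract.

    fanout  : dict mapping each node to the list of nodes it drives
    flops   : set of nodes that are flip-flops
    inputs  : list of input port names
    outputs : list of output port names
    """
    # Build the reverse graph so we can also walk backwards from outputs.
    fanin = {}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin.setdefault(dst, []).append(src)

    def walk(starts, edges):
        # Breadth-first walk that stops at the first flip-flop on each path.
        seen = set(starts)
        queue = deque(starts)
        while queue:
            node = queue.popleft()
            if node in flops:
                continue  # keep the boundary flop, but do not go past it
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    return walk(inputs, fanout) | walk(outputs, fanin)

# Toy module: in -> g1 -> ff1 -> g2 -> ff2 -> g3 -> ff3 -> g4 -> out
fanout = {
    "in": ["g1"], "g1": ["ff1"], "ff1": ["g2"], "g2": ["ff2"],
    "ff2": ["g3"], "g3": ["ff3"], "ff3": ["g4"], "g4": ["out"],
}
flops = {"ff1", "ff2", "ff3"}
kept = structural_abstract(fanout, flops, ["in"], ["out"])
dropped = (set(fanout) | {"out"}) - kept
print("kept:   ", sorted(kept))
print("dropped:", sorted(dropped))
```

In this toy case the internal cells g2, ff2 and g3 are dropped, while the
boundary flops ff1 and ff3 and the logic between them and the ports are kept.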
The concepts of timing abstractions and hierarchical synthesis may seem
complicated, but they are actually easy to use. We only need to run a single
command to generate a GlassBox, which surprisingly only takes about 30 secs.
The only inputs to Talus Design (Blast Create II) are the model (either
constrained or unconstrained) and the logical library. It is so quick, I
just run the GlassBox creation for all of the down models as a part of every
top level synthesis.
4. GlassBox - slack_pruned. When you generate a GlassBox the default
abstraction mode is "structural" and does not take timing into account.
You can get much leaner GlassBox models with the slack_pruned option.
GlassBox creation with -construction slack_pruned:
    Module  Size reduction  Creation time
    A       75%              5 sec
    B       90%             10 sec
    C       90%             40 sec
    D       55%             10 sec
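What matters for top level runtime and memory is the fraction of each model
that remains, not the fraction removed. The short Python calculation below
puts the structural and slack_pruned numbers side by side. It uses only the
percentages reported in the two tables; the variable names are illustrative.

```python
# Size reduction reported by GlassBox creation, in percent of the original model.
structural = {"A": 49, "B": 60, "C": 46, "D": 33}
slack_pruned = {"A": 75, "B": 90, "C": 90, "D": 55}

remaining = {}
for module in sorted(structural):
    left_structural = 100 - structural[module]   # percent of model still present
    left_pruned = 100 - slack_pruned[module]
    ratio = left_structural / left_pruned        # how much leaner slack_pruned is
    remaining[module] = (left_structural, left_pruned, ratio)
    print(f"{module}: {left_structural}% -> {left_pruned}% remaining "
          f"({ratio:.1f}x smaller)")
```

For module B, for example, the structural abstract keeps 40% of the model
and the slack_pruned abstract keeps 10%, so the slack_pruned GlassBox is 4x
smaller.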
If you haven't done hierarchical design before, please note: GlassBoxes will
only be as correct as your constraints. If you constrain all your pins as
false paths and don't declare any clocks, you will have a very lean model,
but good luck explaining it later when the design fails static timing
analysis due to overly aggressive pruning. Also, if the constraints are not
consistent with each other, the top level constraints may try to access
nodes in the module that no longer exist in the GlassBox.
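One way to catch the last problem (top level constraints that refer to nodes
missing from the GlassBox) is to compare the names the constraints refer to
against the names that survive in the abstract. The Python sketch below is
only an illustration: the pin names, the two input sets, and the helper
missing_targets are all hypothetical, and how you extract those lists from
your constraint files and from the abstract depends on your flow.

```python
def missing_targets(constraint_targets, abstract_nodes):
    """Return the constraint targets that do not exist in the abstract, sorted."""
    return sorted(set(constraint_targets) - set(abstract_nodes))

# Hypothetical names: nodes left in a pruned abstract of one block...
abstract_nodes = {"blk/clk", "blk/din_reg/D", "blk/dout_reg/Q"}
# ...and nodes referenced by the top level constraints.
constraint_targets = ["blk/clk", "blk/dout_reg/Q", "blk/core/fsm_reg/Q"]

for name in missing_targets(constraint_targets, abstract_nodes):
    print("constraint refers to pruned node:", name)
```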
The basic Magma command for generating a GlassBox abstract is:
    run prepare GlassBox abstract $m1 -modeling timing \
        -construction slack_pruned
GlassBox also has a delay_cached option which is supposed to optimize the
model even further, but I haven't played with that option. The basic set-up
is simple, and Magma offers a variety of commands if you choose to customize
the GlassBoxes for your design.
To use GlassBox with other tools or to further debug your abstracts, use the
"export verilog netlist" command on the model.
Performance results for different Talus Design options:
    Method                            Hierarchy    Time       Memory  Margin
    fix cell flat                     flat         12.5 hrs   3 GB    12%
    fix cell with 'force keep'        pseudo-hier  16 hrs     3.1 GB  -200%*
    fix time -sized flat              flat         5 hrs      2.5 GB  25%
    fix time -sized force keep        pseudo-hier  1.3 hrs**  2.3 GB  25%
    fix time -glassbox structural     true hier    50 min     2 GB    25%
    fix time -glassbox slack_pruned   true hier    25 min**   1.1 GB  25%
    fix time -sized with custom flow  true hier    10 min**   700 MB  25%
* Fix cell and Force Keep had such a terrible result because we constrained
the tool to only use the provided cells in the down models, rendering this
approach invalid.
** For these methods, you also need to add the time to synthesize each
individual block, which I show above. The sub-modules only need to be
run once, and only need to be rerun if the boundary timing changes
significantly, which allows many quick top level iterations.
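To make the ** footnote concrete, here is a small Python calculation of
total runtime over several top level iterations, using only the numbers
reported above. It is a sketch under stated assumptions: the blocks are
synthesized once (2 hrs 10 min in total) and never rerun, the few seconds of
GlassBox creation are ignored, only the flat run and the rows marked ** are
included, and the iteration count of 5 is an arbitrary choice for
illustration.

```python
BLOCK_SYNTH_MIN = 40 + 30 + 30 + 30   # modules A-D, run once: 130 min

# method name -> (top level minutes per run, needs one-time block synthesis?)
methods = {
    "fix time -sized flat":             (5 * 60, False),
    "fix time -sized force keep":       (78, True),    # 1.3 hrs
    "fix time -glassbox slack_pruned":  (25, True),
    "fix time -sized with custom flow": (10, True),
}

def total_minutes(top_min, needs_blocks, iterations):
    """One-time block synthesis plus `iterations` top level runs."""
    one_time = BLOCK_SYNTH_MIN if needs_blocks else 0
    return one_time + top_min * iterations

ITERATIONS = 5  # arbitrary, for illustration only
totals = {name: total_minutes(top, blocks, ITERATIONS)
          for name, (top, blocks) in methods.items()}
for name, minutes in totals.items():
    print(f"{name:34s} {minutes:5d} min for {ITERATIONS} runs")
```

Under these assumptions, five slack_pruned GlassBox iterations cost 255 min
(4 hrs 15 min) including block synthesis, against 1500 min (25 hrs) for five
flat runs.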
Note:
1. Using GlassBox abstracts yielded the best overall results in terms
of runtime/accuracy.
2. With a combined Force Keep and Sized netlist approach, we got a
significant performance improvement at the expense of memory
utilization over a flat netlist.
3. I posted "custom flow" results to show that the .lib representation
is the leanest possible timing model you can create, which represents
the best you can achieve performance wise. Now that GlassBox abstracts
are available in Talus Design, we don't plan to continue with our
custom flow. The GlassBox runtime speed was fast enough plus it is
completely automated.
Our ECO process with Magma
Near the end of the design, things are converging: the cell names are not
changing much, and logical optimizations are not happening as aggressively.
Each ECO becomes a project in itself. When an ECO gets too large, the tough
decisions to scrap a block, restart, or purge features come into play more
often than we would like. In our last project I took advantage of the fact
that both the frontend and backend design tool chains were Magma.
This is what we did for ECOs:
1. The designer edits the netlist as needed with some naming
conventions and constraints.
2. I wrote a Perl script that does a diff between the modified
netlist and the unmodified version and expresses the changes as Magma
commands like "data create cell", "data attach", "data detach".
3. On ECO day, our back-end vendor (Open Silicon) sends us a current
Volcano of the block and we do one final test on the modified
netlist with Talus Design (Blast Create II) to make sure the design
didn't diverge too far from before we sent over the ECO script.
4. Our backend vendor (Open Silicon) runs our script to do any fixes
as a result of the new connections and cells.
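The diff in step 2 can be sketched as follows. This is not the Perl script
described above. It is a minimal Python illustration on a toy netlist
representation (a dict from instance name to cell type and pin-to-net
connections), and the helper name eco_ops, the cell names, and the "eco_"
prefix are hypothetical. It handles only added cells and changed
connections; deleted or resized cells are ignored. It emits abstract
operations rather than real tool commands, because the argument syntax of
"data create cell", "data attach" and "data detach" is not given here.

```python
def eco_ops(old, new):
    """Diff two toy netlists and return a list of abstract ECO operations.

    Each netlist maps instance name -> (cell_type, {pin: net}).
    """
    ops = []
    for inst in sorted(new):
        cell_type, pins = new[inst]
        old_pins = {}
        if inst not in old:
            ops.append(("create_cell", inst, cell_type))
        else:
            old_pins = old[inst][1]
        for pin in sorted(pins):
            if old_pins.get(pin) != pins[pin]:
                if pin in old_pins:
                    # The pin existed before: disconnect it from the old net.
                    ops.append(("detach", inst, pin, old_pins[pin]))
                ops.append(("attach", inst, pin, pins[pin]))
    return ops

# Toy ECO: insert an inverter in front of pin B of u1.
old = {"u1": ("AND2", {"A": "n1", "B": "n2", "Z": "n3"})}
new = {
    "u1": ("AND2", {"A": "n1", "B": "eco_n1", "Z": "n3"}),
    "eco_u1": ("INV", {"A": "n2", "Z": "eco_n1"}),
}
ops = eco_ops(old, new)
for op in ops:
    print(op)
```

Sorting the instances and pins keeps the generated operation list stable
from run to run, which makes successive ECO scripts easy to compare.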
We got so confident in this process that we were delivering 200 gate level
patches right up until about a month before tapeout. All this was possible
only because Talus Design and Talus Vortex shared the same database. We
intentionally took on the ECO work in the front end because it gave us
certain liberties with the volumes of ECOs that Open Silicon was willing to
accept after they learned how seamless this process was.
As a result of our exploration of ECOs, Open Silicon now recommends that its
customers have a Vortex or Hydra license in house. There is a Magma
command with similar functionality called "run eco diff". Here are
some key pieces of the scripting we used to do ECOs:
#### Set up original and ECO models
import volcano $volcano -object /work
import netlist $eco_netlist -verilog -verbose -lib work_eco
set m /work/${active_design}/${active_design}
set m_eco /work_eco/${active_design}/${active_design}
run bind physical $m_eco $l
config hierarchy separator /
data legalize names $m_eco
force maintain $m_eco -hier
data flatten $m_eco
create_domain_on_eco_model $m_eco
#### Determine what changed
run eco diff $m $m_eco \
-eco_file ./input/changes.mtcl \
-ignore_cell_type \
"pad_power row_filler pad_filler pad_corner" \
-override_kept
#### Implement the change
fix eco $m $l
Room for improvement in Blast Create II:
- As a user I like portable and standard formats, and I would like to
see Magma add "export lib" and "export lef" commands to Talus Design.
Today they are part of the place and route tools.
- It is acceptable that Talus Design does not have a complete Hydra
level of functionality, but it would be a big value-add for "physical
synthesis" to have some form of basic physical information at the
top level. For example, just a simple x and y geometry and pin
location information retained in the GlassBox model.
- A word of caution about using GlassBoxes: They are only abstractions,
so it is important to check against the real models occasionally and
for signoff, and make sure the model isn't hiding any issues or
leaving any information out. Compression of information always
carries some risks and tools can always have bugs.
My first experience with GlassBox models was indirect, as the physical
designers were using them. Top level timing closure is generally very risky
because of the enormous amount of time it can take to get your first useful
result. We were getting top level timing feedback within 2 weeks after
minimal physical design was complete for all of the top level modules as
opposed to the 2-3 months (or during tapeout) which I have experienced on
previous projects.
With GlassBoxes inside Talus Design, we were able to do a complete design
restart (RTL-to-GDS) on a module that was instanced 14 times on our top
level 3 weeks before tapeout with only a 2 day schedule hit.
Additionally, having an all-Volcano ECO process significantly decreased our
turnaround time with fixes. On previous projects it would generally take a
week to get a fix into the current database; on our recent project I was
getting 1 day turnarounds.
- Joel Lach
3Par, Inc. Fremont, CA