( ESNUG 482 Item 8 ) -------------------------------------------- [06/30/09]

From: Joel Lach <joel.lach=user domain=3par bought palm>
Subject: GlassBoxes, ECOs, benchmarks, Talus Design (Blast Create II)

Hi, John,

Instead of Synopsys Design Compiler, we use Magma Talus Design (aka Blast
Create II) to do our RTL-to-gates synthesis on our data storage ASICs.  We
use it to do our frontend design and then hand off the netlist to a fabless
SoC company, Open Silicon, to do our backend P&R.

I wanted a hierarchical frontend synthesis flow, so I initially built a
custom flow using .lib/.lef.  It ran fast because those models are very lean
on information.  Unfortunately, it had maintenance/support issues, plus I
needed to do multiple steps by hand during the project and always ended up
making several iterations of last-minute tweaks to cell sizes in our netlist
to get a clean handoff.

After much discussion with Magma, they recently made GlassBox abstraction,
which used to be available only in their Hydra hierarchical floorplanner, a
standard part of Talus Design (Blast Create II).  GlassBoxes offer several
advantages to us in terms of runtime, timing accuracy, memory requirements
and closure with the back end group.


Talus Design (Blast Create II) benchmarks

Our design was 5 M gates in 130 nm TSMC at 250 MHz.  The top level consists
of 6 different blocks instantiated a total of 26 times.  The largest of these
top level blocks had its own sub-hierarchy of 9 instances of 4 modules, which
I refer to as A, B, C and D.

  Module  Instances   Sequential   Combinational   Complexity   Challenge
    A         6           8 K          42 K           high       medium
    B         1           8 K          30 K            low       medium
    C         1          60 K         120 K          medium       high
    D         1          16 K          24 K            low       medium

Here are the main runtime vs. accuracy choices available for hierarchical
synthesis with the June 2009 release of Talus Design.  I also have benchmark
data on each option.

  1. Sized netlist.  The "-size" option is a new option to "fix time" that
     generates a netlist without a physical floorplan.  We hand off the
     sized netlist because our backend vendor, Open Silicon, requires a
     large timing margin.  The command for generating a sized netlist is:

          fix time $m $l -size -effort low -timing_effort high

  2. Force Keep - lets you mark the modules that you want left alone,
     and Talus Design will not touch the internals of those modules.  This
     saves runtime because Talus Design only looks at top level timing
     paths and ignores the internal paths of the design.  Since the models
     are still loaded into memory, they still have some impact on runtime
     and memory.

     The times it took to synthesize each module for the sized netlist and
     Force Keep options are:

                    Module    Time     Memory

                      A      40 min    260 MB
                      B      30 min    260 MB
                      C      30 min    830 MB
                      D      30 min    260 MB

     Total time 2 hrs, 10 min.

  3. GlassBox - structural.  This option pretty much takes the original
     model and removes the information from it that is not required to
     close top level timing.  It trims out the guts and leaves the shell,
     for example the boundary flip-flops and logic.  GlassBox is a lighter
     weight version of the original model.  The more information the
     abstract can exclude, the faster Talus Design (Blast Create II) can
     close timing on top level paths and the smaller the memory footprint.
     At the end of its run, the command reports what percentage of the
     model was pruned away; I share those numbers below.

     GlassBox creation with -construction structural:

                Module       size reduction   creation time
                  A               49%              5 sec
                  B               60%             10 sec
                  C               46%             40 sec
                  D               33%             10 sec

The concepts of timing abstractions and hierarchical synthesis may seem
complicated, but they are actually easy to use.  We only need to run a single
command to generate a GlassBox, which surprisingly takes only about 30 secs.
The only inputs Talus Design (Blast Create II) needs are the model, either
constrained or unconstrained, and the logical library.  It is so quick that
I just run GlassBox creation for all of the down models as part of every
top level synthesis.
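The idea behind a structural abstract can be illustrated with a toy Python sketch.  Python is used here only for illustration, and Magma's actual pruning rules are not described in this post.  The sketch assumes a hypothetical netlist given as a fanout map plus a node-type map.  It keeps everything on input-port-to-first-flop and last-flop-to-output-port paths and drops register-to-register logic that never touches a port.  Real abstraction tools typically also keep clock and side-load logic, which this sketch ignores.

```python
from collections import deque


def structural_abstract(fanout, kinds):
    """Toy structural abstraction (hypothetical, not Magma's algorithm).

    fanout: node -> list of nodes it drives.
    kinds:  node -> "in", "out", "ff" or "comb".
    Returns the set of nodes on input->first-flop and last-flop->output
    paths; register-to-register logic that never reaches a port is dropped.
    """
    # Build the reverse graph so we can trace backward from output ports.
    fanin = {node: [] for node in kinds}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin[dst].append(src)

    def walk(starts, edges):
        seen, queue = set(starts), deque(starts)
        while queue:
            node = queue.popleft()
            if kinds[node] == "ff":
                continue  # keep the boundary flop, but stop tracing past it
            for nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    inputs = [n for n, k in kinds.items() if k == "in"]
    outputs = [n for n, k in kinds.items() if k == "out"]
    return walk(inputs, fanout) | walk(outputs, fanin)


# Hypothetical module: in1 -> g1 -> ff1 -> g2 -> ff2 -> g3 -> ff3 -> g4 -> out1
kinds = {"in1": "in", "g1": "comb", "ff1": "ff", "g2": "comb", "ff2": "ff",
         "g3": "comb", "ff3": "ff", "g4": "comb", "out1": "out"}
fanout = {"in1": ["g1"], "g1": ["ff1"], "ff1": ["g2"], "g2": ["ff2"],
          "ff2": ["g3"], "g3": ["ff3"], "ff3": ["g4"], "g4": ["out1"]}

kept = structural_abstract(fanout, kinds)
reduction = 1 - len(kept) / len(kinds)
print(sorted(kept), f"size reduction: {reduction:.0%}")
```

Here the internal g2/ff2/g3 chain is pruned, giving roughly a 33% reduction.  That is the same kind of figure as the "size reduction" column above, though the real tool's accounting may differ.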

  4. GlassBox - slack_pruned.  When you generate a GlassBox, the default
     abstraction mode is "structural", which does not take timing into
     account.  You can get much leaner GlassBox models with the slack_pruned
     option.

     GlassBox creation with -construction slack_pruned:

                Module       size reduction   creation time
                  A               75%              5 sec
                  B               90%             10 sec
                  C               90%             40 sec
                  D               55%             10 sec

If you haven't done hierarchical design before, please note: GlassBoxes will
only be as correct as your constraints.  If you constrain all your pins as
false paths and don't declare any clocks you will have a very lean model,
but good luck explaining it later when the design fails static timing
analysis due to overly aggressive pruning.  Also, if the top level and module
constraints are not consistent with each other, the top level constraints may
try to access nodes in the module that don't exist in the GlassBox.
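A slack-pruned abstract goes one step further and keeps only the boundary logic whose timing actually matters.  The toy Python sketch below assumes a hypothetical list of boundary paths, each with a precomputed slack, and a hypothetical margin_ns threshold; it is not Magma's algorithm.  It also shows the constraint trap described above: a port with no clock or constraint ends up with infinite slack, so its logic silently disappears from the abstract.

```python
import math


def slack_pruned(boundary_paths, margin_ns):
    """Toy slack-based pruning (hypothetical, not Magma's algorithm).

    boundary_paths: list of (cells_on_path, slack_ns) tuples.
    Keeps only cells on paths whose slack is below margin_ns.  A cell
    shared with a kept path survives even if another of its paths is pruned.
    """
    kept = set()
    for cells, slack_ns in boundary_paths:
        if slack_ns < margin_ns:
            kept.update(cells)
    return kept


paths = [
    (("in1", "g1", "ff1"), 0.3),       # tight input path: kept
    (("ff3", "g4", "out1"), 2.1),      # plenty of slack: pruned
    (("in2", "g5", "ff4"), math.inf),  # no clock/constraint on in2: always pruned
]

print(sorted(slack_pruned(paths, margin_ns=1.0)))
```

Even with an infinite margin, the unconstrained in2 path never makes it into the abstract.  That is why the abstract can only be as correct as the constraints it was built from.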

The basic Magma command for generating a GlassBox abstract is:

     run prepare GlassBox abstract $m1 -modeling timing \
                                       -construction slack_pruned

GlassBox also has a delay_cached mode which is supposed to optimize the
model even further, but I haven't played with that option.  The basic set-up
is simple, and Magma offers a variety of commands if you choose to customize
the GlassBoxes for your design.

To use a GlassBox with other tools or to debug your abstracts further, use
the "export verilog netlist" command on the model.


Performance results for different Talus Design options:

  Method                             Hierarchy        Time    Memory  Margin
  fix cell flat                      flat         12.5 hrs      3 GB    12%
  fix cell with 'force keep'         pseudo-hier    16 hrs    3.1 GB  -200%*
  fix time -sized flat               flat            5 hrs    2.5 GB    25%
  fix time -sized force keep         pseudo-hier   1.3 hrs**  2.3 GB    25%
  fix time -glassbox structural      true hier      50 min      2 GB    25%
  fix time -glassbox slack_pruned    true hier      25 min**  1.1 GB    25%
  fix time -sized with custom flow   true hier      10 min**  700 MB    25%

* Fix cell with Force Keep gave such a terrible result because we had
  constrained the tool to use only the cells already provided in the down
  models, which rendered this approach invalid.

** For these methods, you also need to add the time to synthesize each
   individual block, which I show above.  The sub-modules only need to be
   run once, and then again only if their boundary timing changes
   significantly, which allows many quick top level iterations.
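To see what the ** footnote means in practice, here is a back-of-the-envelope Python sketch.  It makes several assumptions.  The 2 hr 10 min of block synthesis from the per-module table is paid once up front, and the blocks are never re-run across iterations.  GlassBox creation time (under a minute per module in the tables above) is ignored.  The function name is hypothetical.

```python
# Per-module block synthesis times (minutes) for modules A-D, from the table above.
BLOCK_SYNTH_MIN = 40 + 30 + 30 + 30  # 130 min = 2 hrs 10 min


def total_minutes(top_level_min, iterations, needs_block_synth):
    """Wall-clock minutes for `iterations` top level runs.

    Assumes blocks are synthesized once up front and never re-run, which
    only holds while their boundary timing stays stable.
    """
    block_min = BLOCK_SYNTH_MIN if needs_block_synth else 0
    return block_min + iterations * top_level_min


# Top level runtimes from the results table, in minutes.
flows = {
    "fix time -sized flat": (300, False),           # 5 hrs
    "fix time -sized force keep": (78, True),       # 1.3 hrs
    "fix time -glassbox slack_pruned": (25, True),  # 25 min
}

for name, (top_min, needs_blocks) in flows.items():
    one = total_minutes(top_min, 1, needs_blocks)
    five = total_minutes(top_min, 5, needs_blocks)
    print(f"{name:34s} 1 run: {one:5d} min   5 runs: {five:5d} min")
```

Under these assumptions, the slack_pruned flow costs 155 minutes for the first run and 255 minutes for five runs.  The flat sized flow costs 300 and 1500 minutes.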

Note:

 1. Using GlassBox abstracts yielded the best overall results in terms
    of runtime/accuracy.

 2. With a combined Force Keep and Sized netlist approach, we got a
    significant performance improvement at the expense of memory
    utilization over a flat netlist.

 3. I posted the "custom flow" results to show that the .lib representation
    is the leanest possible timing model you can create, and so represents
    the best you can achieve performance-wise.  Now that GlassBox abstracts
    are available in Talus Design, we don't plan to continue with our
    custom flow.  GlassBox runtime was fast enough, and it is completely
    automated.
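A .lib-style model is so lean because, at the top level, a block reduces to a few numbers per boundary pin.  The minimal Python sketch below makes simplifying assumptions.  It uses one clock-to-output delay per output pin and one setup requirement per input pin.  Real Liberty models use lookup tables indexed by input slew and output load, and clock skew and uncertainty also matter; this sketch ignores all of that.  It uses the 250 MHz clock from the benchmark, and the delay values in the example are made up.

```python
# Design clock from the benchmark above: 250 MHz -> 4.0 ns period.
PERIOD_NS = 1000.0 / 250


def top_level_slack(clk_to_out_ns, top_logic_ns, setup_ns, period_ns=PERIOD_NS):
    """Setup slack of a block-to-block path using only boundary numbers.

    clk_to_out_ns: launch block's clock-to-output delay at its output pin.
    top_logic_ns:  delay of the top level logic and wiring between blocks.
    setup_ns:      capture block's setup requirement at its input pin.
    Ignores clock skew, uncertainty, and slew/load dependence.
    """
    arrival = clk_to_out_ns + top_logic_ns
    required = period_ns - setup_ns
    return required - arrival


slack = top_level_slack(clk_to_out_ns=1.2, top_logic_ns=1.5, setup_ns=0.8)
print(f"slack = {slack:.2f} ns")  # prints "slack = 0.50 ns"
```

None of the block's internal gates appear in this calculation.  That lets a .lib flow run so fast, but it also means the model cannot see anything inside the block.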


Our ECO process with Magma

Near the end of the design, things are converging: cell names are not
changing much, and logical optimizations are not happening as aggressively.
Each ECO becomes a project in itself.  When an ECO gets too large, the tough
decisions to scrap a block, restart, or purge features come into play more
often than we would like.  In our last project I took advantage of the fact
that both our frontend and backend tool chains were Magma.

This is what we did for ECOs:

 1. The designer edits the netlist as needed with some naming
    conventions and constraints.

 2. I wrote a Perl script that diffs the modified netlist against the
    unmodified version and expresses the changes as Magma commands like
    "data create cell", "data attach", "data detach".

 3. On ECO day, our backend vendor (Open Silicon) sends us a current
    Volcano of the block and we do one final test on the modified
    netlist with Talus Design (Blast Create II) to make sure the design
    hasn't diverged too far from the version we built the ECO script on.

 4. Our backend vendor (Open Silicon) runs our script to do any fixes
    as a result of the new connections and cells.
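The core of the netlist-diff step (step 2) can be sketched in Python.  This is not the author's Perl script.  The sketch assumes a hypothetical in-memory netlist format that maps each cell name to its cell type and a pin-to-net dictionary, and it emits abstract operations.  Translating them into the exact argument syntax of "data create cell", "data attach" and "data detach" is left out because that syntax isn't shown here.  The swap_type and delete_cell operations and the cell type names are hypothetical.

```python
def netlist_diff(old, new):
    """Return abstract ECO operations that turn netlist `old` into `new`.

    Each netlist is a hypothetical dict: cell name -> (cell type, {pin: net}).
    """
    ops = []
    # 1. Cells that exist only in the modified netlist.
    for cell in sorted(set(new) - set(old)):
        ops.append(("create_cell", cell, new[cell][0]))
    # 2. Cells present in both netlists whose type changed (e.g. a resize).
    for cell in sorted(set(new) & set(old)):
        if old[cell][0] != new[cell][0]:
            ops.append(("swap_type", cell, new[cell][0]))
    # 3. Pin connection changes, detaching the old net before attaching the new.
    for cell in sorted(set(old) | set(new)):
        old_pins = old[cell][1] if cell in old else {}
        new_pins = new[cell][1] if cell in new else {}
        for pin in sorted(set(old_pins) | set(new_pins)):
            before, after = old_pins.get(pin), new_pins.get(pin)
            if before == after:
                continue
            if before is not None:
                ops.append(("detach", cell, pin, before))
            if after is not None:
                ops.append(("attach", cell, pin, after))
    # 4. Cells that were removed from the modified netlist.
    for cell in sorted(set(old) - set(new)):
        ops.append(("delete_cell", cell))
    return ops


original = {
    "u1": ("INV_X1", {"A": "n1", "Y": "n2"}),
    "u2": ("BUF_X1", {"A": "n2", "Y": "n3"}),
}
modified = {
    "u1": ("INV_X2", {"A": "n1", "Y": "n2"}),              # upsized
    "u3": ("AND2_X1", {"A": "n2", "B": "en", "Y": "n3"}),  # replaces u2
}
for op in netlist_diff(original, modified):
    print(op)
```

Sorting every cell and pin set keeps the emitted script deterministic.  That makes two ECO scripts easy to compare when the same edit is rerun.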

We got so confident in this process that we were delivering 200 gate level
patches right up until about a month before tapeout.  All this was possible
only because Talus Design and Talus Vortex shared the same database.  We
intentionally took on the ECO work in the front end because it gave us
more latitude in the volume of ECOs that Open Silicon was willing to
accept once they learned how seamless this process was.

As a result of our exploration of ECOs, Open Silicon now recommends that its
customers have a Vortex or Hydra license in house.  Magma also has a command
with similar functionality called "run eco diff".  Here are some key pieces
of the scripting we used to do ECOs:

  #### Set up original and ECO models
  import volcano $volcano -object /work
  import netlist $eco_netlist -verilog -verbose -lib work_eco
  set m /work/${active_design}/${active_design}
  set m_eco /work_eco/${active_design}/${active_design}
  run bind physical $m_eco $l
  config hierarchy separator /
  data legalize names $m_eco
  force maintain $m_eco -heir
  data flatten $m_eco
  create_domain_on_eco_model $m_eco

  #### Determine what changed
  run eco diff $m $m_eco \
                    -eco_file ./input/changes.mtcl \
                    -ignore_cell_type \
                     "pad_power row_filler pad_filler pad_corner" \
                    -override_kept

  #### Implement the change
  fix eco $m $l


Room for improvement in Blast Create II:

 - As a user I like portable and standard formats, and I would like to
   see Magma add "export lib" and "export lef" commands to Talus Design.
   Currently they are only part of the place and route tools.

 - It is acceptable that Talus Design does not have a complete Hydra
   level of functionality, but it would be a big value-add for "physical
   synthesis" to have some form of basic physical information at the
   top level.  For example, just simple x and y geometry and pin
   location information retained in the GlassBox model.

 - A word of caution about using GlassBoxes: they are only abstractions,
   so it is important to check against the real models occasionally and
   for signoff, to make sure the model isn't hiding any issues or
   leaving any information out.  Compression of information always
   carries some risks, and tools can always have bugs.

My first experience with GlassBox models was indirect, as the physical
designers were using them.  Top level timing closure is generally very risky
because of the enormous amount of time it can take to get your first useful
result.  We were getting top level timing feedback within 2 weeks after
minimal physical design was complete for all of the top level modules, as
opposed to the 2-3 months (or not until tapeout) that I have experienced on
previous projects.

With GlassBoxes inside Talus Design, we were able to do a complete design
restart (RTL-to-GDS) on a module that was instanced 14 times at our top
level, 3 weeks before tapeout, with only a 2 day schedule hit.

Additionally, having an all-Volcano ECO process significantly decreased our
turnaround time for fixes.  On previous projects it would generally take a
week to get a fix into the current database; on our recent project I was
getting 1-day turnarounds.

    - Joel Lach
      3Par, Inc.                                 Fremont, CA
Copyright 1991-2024 John Cooley.  All Rights Reserved.