( ESNUG 338 Item 1 ) --------------------------------------------- [12/3/99]

From: Jon Stahl <jstahl@avici.com>
Subject: A Customer Tries The New Chip Architect Tool On 3 LSI Logic ASICs

Hi John,

I was very interested to read the FlexRoute and, more recently, the PhysOpt
reviews.  A few months ago, I and others at my company didn't have much
interest in physical design tools from Synopsys.  However, after a decision
to move to third-party layout tools, we decided to explore all the options
out there -- not just the standard Cadence and Avanti offerings.  And after
an eval of Chip Architect, we have been pleasantly surprised and have become
early adopters.  I thought there might be interest in hearing a user review.
All of our designs use LSI Logic as the foundry.

We decided to look at Chip Architect because of certain key features.  Like
a lot of other folks these days, we believe that hierarchical layout is the
way to go.  Chip Architect promised a natural way to do this, with planning
possible not only at the gate level but also at the black box and RTL level.
The tight integration between placement and timing that Chip Architect
promised really got our attention.


The Chip Architect Design Flow
------------------------------

Before I critique Chip Architect itself, I felt it would be best to give a
thumbnail sketch of how its flow works.  It's a challenging task to come up
with recipe-book style steps for the Chip Architect flow, but here is my
attempt at a general flow.  I think this is OK as a first-level guideline.

  1. Create Quick Timing Models (QTMs) in PrimeTime for any soft black box
     blocks for which RTL is not available.  We designed in Verilog and
     used VCS for simulation.  We used an old copy of VeriLint and VERA to
     verify the Verilog RTL.
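
     A minimal sketch of the kind of QTM script we mean is below.  The
     commands are from PrimeTime's QTM package; the library name, port
     names, delay values, and exact options here are assumptions, so
     check them against your PrimeTime release:

       # Sketch only: a quick timing model for a soft block "bb_core".
       # Library name, port names, and values are placeholders.
       create_qtm_model bb_core
       set_qtm_technology -library lsi_10k   ;# placeholder library name
       create_qtm_port -type clock  { clk }
       create_qtm_port -type input  { din* }
       create_qtm_port -type output { dout* }
       # clock-to-output delay and input setup, in library time units
       create_qtm_delay_arc      -from clk  -to dout* -value 2.5
       create_qtm_constraint_arc -setup -from din* -to clk -value 1.0
       save_qtm_model -output bb_core -format {lib db}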

  2. Read hierarchical structural netlists into Chip Architect.  These
     could be a combination of black box blocks, hard macros, RTL blocks,
     and gate-level blocks.  For memories, we used LSImemory to generate
     synlibs.  We read the synlibs into DC, did an update_lib, and wrote
     out mem.db's for each mem macro we used, then imported them into
     Chip Architect.
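
     For reference, a minimal dc_shell (Tcl mode) sketch of a synlib-to-.db
     conversion; "mem1" is a placeholder macro name, and the library name
     inside your synlib may differ:

       read_lib mem1.lib                        ;# the LSImemory synlib
       write_lib mem1 -format db -output mem1.db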

  3. Read top-level constraints (say top_const.tcl), timing models for
     IP/hard macros, and the QTMs into Chip Architect.

  4. Floorplan Hierarchically in Chip Architect:

     a) Size black boxes.  Here is a TCL script to size a black box,
        using an approximate gate count:

     # Size the selected black box based on a gate count,
     # assuming 30 square microns of area per gate.

     proc Size_BB {size} {
         set edge [expr sqrt($size*30)]
         reshape_object -def "0 0 $edge $edge" [get_sel]
     }
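
     For example, to size the currently selected black box for a block
     estimated at roughly 50K gates (the count here is just an
     illustration):

     Size_BB 50000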

     b) Manipulate physical hierarchy (flatten, merge, etc.).  The
        hierarchy browser window in Chip Architect is a great tool for this.

     c) Perform Automatic block placement in Chip Architect

     d) Do power bus planning, create blockages, and assign pins.  We did
        all of our power design in LSI's layout editor because it was a
        4 metal layer design.  Usually you only worry about power conflicts
        if your power routing shares a layer with your cell interconnect.
        I kept our power stuff in metal 3 & metal 4.

     e) Coarse Routing in Chip Architect

     f) Perform Std cell placements within blocks, and coarse route
   
  5. Analyze for Timing and Congestion in Chip Architect.  Use Chip
     Architect's built-in PrimeTime engine for Timing Analysis.  Use
     congestion map utility within Chip Architect for congestion.

  6. Tweak as necessary.  Some alternatives (depending on the violations)
     are:

     a) Refine floorplan in Chip Architect (resize, move blocks, add
        blockages and so on)

     b) Re-run pin assignment, placement, coarse routing with higher effort
        options.

     c) Perform top-level route in FlexRoute.

     d) Perform In-Place Optimization (gate sizing, buffer insertion) in
        Chip Architect.  (I couldn't get this to work!)

  7. Output custom wire load models (say Chip.cwl), loads (say
     Chip_setload.tcl and Chip_setresistance.tcl), and interconnect SDF
     (say Chip.sdf), which budgeting will use to generate accurate
     synthesis budgets.

  8. Perform Budgeting in PrimeTime -- Create synthesis constraints.  Here
     is a sample script to do budgeting in PrimeTime:

       # Read netlist
       read_verilog Chip_est.v
       current_design Chip
       # Read SDF
       read_sdf ../parasitics/Chip.sdf
       # Apply constraints
       source Chip_setload.tcl
       source Chip_setresistance.tcl
       source top_const.tcl
       # Allocate budget for each of the top level blocks.
       allocate_budgets -level 0 -write_context -no_boundary -format dcsh 

  9. Run Design Compiler on the soft blocks using constraints generated
     from budgeting (step 8) and the custom wire load model generated from
     Chip Architect, i.e. Chip.cwl (step 7).  (My understanding is that for
     better results, you could use PhysOpt at this stage in place of DC.)
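
     A rough dc_shell (Tcl mode) sketch of this step is below.  The block
     and file names are placeholders, it assumes the budget constraints
     were written in Tcl format (step 8 above used -format dcsh), and it
     assumes the custom wire load model from Chip.cwl has already been
     merged into the target library:

       read_verilog blockA.v
       current_design blockA
       link
       source blockA_budget.tcl               ;# budgeted constraints
       set_wire_load_model -name Chip_cwl     ;# custom wire load model
       compile -map_effort medium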

 10. Read the DC generated netlist back into Chip Architect.

 11. Perform final placement on all blocks (high effort), refine Floorplan,
     do top level route.  Run clock tree synthesis (currently under dev in
     Chip Architect).  We're looking at integrating Ultima's ClockWise tool
     here because they have a useful skew solution.  The Chip Architect
     people are doing a zero skew tool; ClockWise can skew your design to
     get additional set-up time.

 12. Final Analysis and Optimization (similar to step 5).  The only gotcha
     here is that I had to write my own TCL repeater insertion tool because
     I couldn't get Chip Architect's IPO features to work.

 13. Perform final In-Place Optimization (IPO) & other fixes for violations
     (like step 6).  (Uh...  This was the official way it was supposed to
     work.  It didn't.  I'm just including this step to be complete.)

 14. Output final floorplan, final cell placement, final netlist to a std
     cell router -- in our case, LSI's FlexStream Global, Detail, and
     Cleanup tools (version 1.0).  (This isn't new LSI software, it's just
     been renamed to FlexStream.  Before this, it was LSI PD, before that
     it was CMDE.  This is tried & true LSI software.)

 15. Bring back the routed blocks into Chip Architect, make sure overall
     chip timing is okay.  We never did this step, because we used LSI's
     delay predictor, "LSIdelay", to generate SDF's and then we used
     PrimeTime to verify the final timing.  Actually, we used Frequency
     Tech's Columbus to extract the parasitics and fed that into LSIdelay
     to make the SDF's.
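
     A rough sketch of what that final PrimeTime check might look like
     (the netlist and SDF file names are placeholders):

       read_verilog Chip_routed.v
       link_design Chip
       read_sdf ../signoff/Chip_lsidelay.sdf   ;# SDF from LSIdelay
       source top_const.tcl
       report_timing -max_paths 100 -slack_lesser_than 0.0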


Our major concerns with a new tool of this complexity, and from a vendor
who until recently didn't play in the P&R field, were typical:

  - Was the code stable?
  - Would it have the capacity to handle million-gate-plus designs?
  - Would the placement quality of results measure up?
  - Would the runtime measure up (is it multi-threaded)?

Plus the additional concern of whether the timing Chip Architect predicted
would match up, within a reasonable amount, with our vendor's (LSI Logic)
sign-off delay calculator.


Our Experience With 3 LSI Designs
---------------------------------

We did various amounts of testing/actual work w/ the tool on three designs,
"Larry", "Moe", and "Curly":

 1.) "Larry" - 100K gates, 3 SRAMs, 100MHz (used to test our proposed flow)

      Since Chip Architect does not perform detail routing, but stops at
      placement and global routing, we had a problem.  To use it on our
      current 0.25um and larger geometry designs we would have to interface
      to the LSI proprietary tools for clock insertion, routing, etc.  (LSI
      now uses Avanti for the 0.18um and below geometries).  Chip Architect
      was designed to interface to Cadence/Avanti, but had no hooks for LSI.
   
      Using Chip Architect's TCL API interface, we were able to accomplish
      the two-way handoff without much difficulty.  We made a Perl script
      to map LSI's pad placement into TCL commands to re-create it in Chip
      Architect.  And a TCL script was used to write out the cell
      coordinates and orientations of all internal cells in LSI format.
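
      To give a flavor of the handoff (the real scripts are specific to
      LSI's file formats), here is a stripped-down Tcl sketch.  The input
      file format and the set_cell_location command are assumptions --
      substitute whatever placement command your Chip Architect release
      actually provides:

        # Read "cellname x y orientation" lines and emit placement
        # commands.  set_cell_location is a placeholder name.
        set fp [open pads.txt r]
        while {[gets $fp line] >= 0} {
            foreach {cell x y orient} $line {
                set_cell_location -coordinates "$x $y" \
                                  -orientation $orient $cell
            }
        }
        close $fp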
   
      On this design, due to the small size, we decided not to use the
      hierarchical features and just place it flat.  This design did
      not have difficult core timing, but had extremely tight I/O timing.
      As a way to meet both setup and hold output constraints, this design
      had large delay cells instantiated in the output paths which we
      would later ECO-downsize as necessary.
   
      Performing a timing driven placement was simple, as we could pretty
      much use the PrimeTime constraints already prepared for timing
      analysis.  On this design placement took 1.2 hrs (4 processors),
      global routing 45 min., and timing calculation and analysis another
      30 min.  Our results were mostly good, as core timing was completely
      met with +1.3 ns of slack, but the I/O timing was off (as expected)
      with -3.3 ns of slack.
   
      We then attempted to use the automatic IPO features of Chip Architect
      to fix the timing problems -- with little luck.  The promised Chip
      Architect IPO features were cell upsizing and buffer insertion, but
      the recommended flow for "best" results was to export Chip Architect
      info to Synopsys Floorplan Manager (... more on this later).
   
      Anyway, since the necessary corrections were obvious, it was easy
      to use the (excellent) interactive Chip Architect TCL commands to
      downsize the cells, legalize the placement, and re-time the design
      (*incremental* in most cases, and very fast) ... and timing was met
      in Chip Architect.
   
      We then dumped the placement into LSI's tools, re-performed global
      routing and estimated parasitic extraction, and generated SDF's using
      LSI's delay calculator.  After re-timing the design with PrimeTime,
      the correlation was within ~5% on *most* paths.  (The only real
      exception to the outstanding correlation was output buffer timing.
      For some reason, which at this point we haven't really researched or
      explained, LSI's delay predictor shows ~900 ps slower paths than Chip
      Architect.  This anomaly was put on the back burner due to time
      pressures and the simple workaround of compensating with additional
      output delay.)

 2.) "Moe" - 1M gates, 100 SRAMs @ 100MHz

      This was a design in progress.  We were having trouble getting timing
      closure.  After well over a month of timing iterations -- which for
      us consists of synthesis, test insertion, placement, scan reordering,
      MOTIVE analysis, repeater insertion and IPO upsizing, and re-analysis
      with MOTIVE -- we still had ~3K paths with as much as -2 ns of slack.

      Since the design was being done flat with just careful floorplanning,
      and it would have been way too much work to go back and re-implement
      the design hierarchically, it was a "what the hell, let's try it"
      kind of thing to throw Chip Architect at it totally flat (no
      floorplan).

      It took a little bit of work to port the ram placement, power routes,
      and placeblocks from the LSI database and into Chip Architect, but
      after that things ran smoothly.  Chip Architect completed full timing
      driven placement of ~300K instances in 6 hrs (8 processors) ... with
      results better than all of our careful floorplanning and timing driven
      placement in LSI's tools, including post placement repeater insertion
      and IPO's.  End result: 250 failing paths w/ -1ns of slack or better.

      The only bad thing to say about these surprising results is that Chip
      Architect IPO attempts to fix the remaining failures would only
      repeatedly crash.  And we haven't had the chance to port the
      placement back into the LSI tools, re-calculate timing, and re-analyze
      the results to make sure the timing really is this good (although we
      did on "Curly" - see below).
   
 3.) "Curly" - 750K gates and 50 SRAMs @ 100/155MHz

      Here was a design just beginning in the planning and layout stages
      where we could really use the capabilities of Chip Architect.  It
      consisted of ~20 large sub-systems, making it ideal for hierarchical
      layout.  Furthermore (without going into too much detail), at this
      point we made an observation that got rid of one of the real pains
      normally associated with hierarchical layout -- developing the lower
      level timing constraints.

      Our synthesis methodology consists of bottom-up compile, PrimeTime
      budgeting, and re-compilation.  With this in mind, we noticed that
      the block level budgeted scripts that PrimeTime outputs -- of which
      we had until this point used only the dcsh format, for re-compilation
      -- are *almost* perfect for direct use (the ptsh format) as block
      level placement constraints in Chip Architect.  Only a little
      filtering was needed to remove some unnecessary stuff.
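
      A sketch of that filtering idea in plain Tcl is below; the drop list
      here is only an example, so adjust it to whatever your budget
      scripts actually contain:

        # Copy a budgeted ptsh script, dropping command lines that the
        # placement run does not want.
        set drop {set_wire_load_model set_wire_load_mode set_operating_conditions}
        set in  [open blockA_budget.ptsh r]
        set out [open blockA_place.tcl  w]
        while {[gets $in line] >= 0} {
            set cmd [lindex [split [string trim $line]] 0]
            if {[lsearch -exact $drop $cmd] < 0} { puts $out $line }
        }
        close $in
        close $out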

      So following this idea, we set up a top level floorplan, allocated
      outlines for the lower blocks, and then kicked off runs that just
      sourced the filtered ptsh scripts for constraints.  We used GNU Make
      and multiple Chip Architect licenses to run concurrently.  And
      runtimes for block level placement, global route, parasitic
      extraction, and static timing were incredibly fast: 4 to 32 minutes
      per block for blocks that ranged in size up to 90K gates.
   
      The output of all this was placed/global-routed timing reports which
      we compared side by side with the same Design Compiler reports.

      And the results at the block level were very good: 14 met timing, and
      6 with -3 ns or better of negative slack violations on the budgeted
      I/O constraints.  Then, using Chip Architect to global route the
      inter-block nets, we generated a top level timing report ... and had
      up to -11 ns of slack.

      Analysis of the failing paths immediately showed the problem: long
      inter-block wires.  So again we tried the Chip Architect IPO features,
      first at the block level, and then at the top.  And again we had no
      luck.  Block level runs produced weird and inconsistent results,
      sometimes ending up worse than they started (?).  Top level
      attempts would only produce crashes.

      So, rolling up our sleeves (we were committed to Chip Architect now),
      we crafted a TCL script (using the API) to add repeaters along
      the long wires ... and a week later had code that would produce a
      design with only -0.75 ns of slack.  Furthermore, since we had
      overconstrained the synthesis and placement by 0.3 ns, and had 0.8 ns
      of clock uncertainty for skew and PLL jitter figured in, we now had
      a placement which we felt was good enough to go into route with.
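
      One way to sketch the geometric piece of such a script -- computing
      evenly spaced repeater locations along a long wire -- is below.  This
      is an illustration, not our actual code; the netlist edits and
      placement calls that consume these points are Chip Architect API
      specifics, omitted here:

        # Evenly spaced repeater locations between a driver at (x1,y1)
        # and a load at (x2,y2), one repeater per max_len microns of
        # Manhattan distance.
        proc repeater_points {x1 y1 x2 y2 max_len} {
            set dist [expr {abs($x2-$x1) + abs($y2-$y1)}]
            set n    [expr {int($dist / $max_len)}]
            set pts  {}
            for {set i 1} {$i <= $n} {incr i} {
                set f [expr {double($i) / ($n + 1)}]
                lappend pts [list [expr {$x1 + $f*($x2-$x1)}] \
                                  [expr {$y1 + $f*($y2-$y1)}]]
            }
            return $pts
        }
        # e.g. repeater_points 0 0 3000 1200 1500  returns two candidate
        # points, at (1000, 400) and (2000, 800)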

      In fact, some poking around showed that the failing paths probably
      could have been fixed pretty easily with some interactive upsizing...
      but this was a trial with RTL that wasn't frozen, so we moved on.

      With some trepidation we took the placement into the LSI tools and
      re-analyzed, and came up w/ -0.25 ns of slack (different path).  This
      we considered to be excellent correlation when we remembered that

         a) two different global routers
         b) two different parasitic extractors and
         c) two different delay calculators were used.


If you have read this far, you have gathered that we ran into some problems
with Chip Architect.  The worst thing is that its IPO features seem to
be pretty much useless in their current incarnation.  The only good thing I
can say about them is that Synopsys seems to be aware of the situation and
has promised improvements in the next major release.

However, other than broken IPO, the few bugs we ran into have been minor and
had easy workarounds.  I've been very impressed w/ the code stability, only
getting it to roll over & die when I really, really pushed it.

What has been a little disappointing, though not unexpected, has not been
bugs but miscellaneous annoying behavior.  The worst example of this is an
extremely awkward logical vs. physical hierarchy separation.  To place a
design in Chip Architect there cannot be any hierarchy, so if there is a
logical hierarchy it must be flattened.  Flattening our big 300K-instance
design took ~10 hrs, which, if added to the placement duration, makes the
runtime go from excellent to very poor.  In addition, if you make netlist
changes to the design, there is no way to write out a logical netlist, so
you end up with a cumbersome flat netlist -- which I am still testing to
see if it breaks other tools.  Also, if you happen to need to apply
attributes/constraints/etc. to internal design nets/pins/cells, you must
maintain two sets: one for the hierarchical design & one for the flat.

Finally, the most disappointing thing to me about the tool is that it
appears Synopsys might not *let* it be as good as it *could* be.  They have
consciously integrated only about 70% of PrimeTime into Chip Architect,
leaving out budgeting and misc. other features.  It appears as if they even
intentionally hamstrung some commands.  In addition, one of their current
recommendations is to write files out of Chip Architect, use Floorplan
Manager for optimization, and import the results back into Chip Architect.
The same goes for budgeting and PrimeTime.  Why!?  I guess they need to
protect their revenue stream for PrimeTime and Floorplan Manager, but I for
one (of course I am not writing this check) would be happy to see Chip
Architect take a PT or FM license and not force me to write out & read
back in 100's of MBs of files.

Anyway, I hope this info helps someone else who is out trying to make tool
decisions.  Despite my gripes, I really like the tool and plan on using it
on all new projects.  Although we haven't taped out anything with Chip
Architect yet, "Curly" is on the fast track to go and is definitely moving
faster than it would be without something like Chip Architect.

And I should mention, John, that whenever I have run into the inevitable
problems, the local (Boston) apps. and corporate engineering support for
Chip Architect has been excellent.

    - Jon Stahl, Principal Engineer
      Avici Systems                             N. Billerica, MA



