

( ESNUG 364 Item 14 ) -------------------------------------------- [02/01/01]

Subject: ( ESNUG 363 #11 )  Cadence PBOPT, Synopsys LBO, & FlexRoute Tricks

> As we get a good idea of the block sizes and shapes we floorplan the top
> level, with around 15 blocks & the I/O cells.   We ship the top level off
> to the third party & they insert clocks at the top level.  So far so good.
>
> Now they try to fix the timing on the long nets between blocks using
> QPOPT, and the results are horrendous.  At this point we would've already
> generated timing models for each of the blocks, and they are inputs to
> the QPOPT runs.  In many different attempts at top level timing
> optimization, QPOPT has not been able to put in an appropriate number of
> buffers/repeaters to achieve reasonable timing.  I did some experiments
> with long nets and various numbers of buffers and found that I should be
> able to go 5 mm in about 1.2 nS even with a less than optimum repeater
> scheme.  QPOPT isn't even getting close.  When we talked to Cadence R&D
> about this they basically said that QPOPT isn't intended to do this type
> of optimization.
>
> So my question is, how are other people doing the timing optimization
> (buffer and repeater insertion) at the top level for a hierarchical
> physical flow?
>
>    - Chris Simon
>      General Dynamics Information Systems       Minneapolis, MN


From: [ Intel Inside ]

John,

As usual, please keep me anonymous.

One challenge of any design flow is to build a methodology that works around
any weaknesses that your tools may have.  This is usually not enough.  You
need to do other things to help make your tools/methodologies have less work
to do.  For example, it's standard practice to either flop all signal inputs
or outputs at partition boundaries to get the best synthesis results and to
help top level timing issues.  I don't know if you did this for your design.
You can take this a step further and have all partition inputs and outputs
flopped.  Having no combinational logic between flop boundaries at partition
edges may have some implications on your design, but will surely give your
signals more time to traverse the top level real estate.  You can even go
further and duplicate your output flops to ensure that all signal outputs
have a fanout of one input.  This really helps solve top level timing
issues.  Naturally, these have implications for your RTL, but it's food for
thought.  If you can make the tools have an easier problem to solve, less
hand effort will be required.

    - [ Intel Inside ]

         ----    ----    ----    ----    ----    ----   ----

From: "Lee Keep" <Lee_Keep@eur.3com.com>

Hi John,

I read Chris' article on ESNUG with interest as I have been working in the
area of hierarchical physical design timing closure for a few years now.
I have a few questions for my own clarification and some suggestions that
may help.  You may well have tried some of these already - but here goes
anyway....

 1) Timing budgets for inter-block paths

    You say that your block level timing is pretty much OK - but how did you
    allocate timing budgets for paths that cross between blocks during the
    synthesis phase?  Did the timing budget include any allowance for top
    level interconnects based on your 1.2 ns per 5 mm observation?

 2) Top level routing

    Did you attempt any form of top level routing prior to closing the
    timing of the sub-blocks?  Maybe a methodology where you complete
    the routing of your top level floorplan (black box sub-blocks), followed
    by a parasitic extraction and propagation of the values to your
    sub-blocks could help.  Even a top level global route may provide a
    better starting point for your sub-block constraints during synthesis.
    That way, you pass some of the top-level effort down into the
    sub-blocks, which you know are easier to close.

 3) Sub-block I/O buffering

    I personally recommend a strategy where you insert buffers, connected to
    all signal I/O ports within each of your sub-blocks.  We use a dc_shell
    script to do this post synthesis.  These buffers should be given
    priority during sub-block placement to ensure they are placed in a cell
    row as close to the I/O port as possible.  We use Avanti P&R and their
    TDF constraint format that allows these weightings to be applied.  By
    choosing a sensible naming convention for these buffers you can also
    highlight them post-placement to ensure they've gone in the correct
    location.  It's been a while since I used SEDSM but I seem to remember
    something similar can be achieved.  Ensure you 'dont touch' these buffers
    during any subsequent optimisation passes as QPOPT may well try to
    remove them at the sub-block level.

    This approach has helped us minimize the number of repeaters required in
    the top level layout - but some of our longest top level nets still
    needed some manual work.

 4) Repeater insertion / ECO placement

    You indicate that QPOPT can't insert enough buffers to do the job.  How
    densely placed is your logic?  I've seen placement utilisations so high
    that they prevent the optimiser from inserting the number of buffers it
    wants.  However, I expect it's more to do with the tool running out of
    steam.  I also remember hearing a Cadence get-out clause that these
    tools could only provide incremental timing improvements of 5-10% -- and
    those were the days of PBOpt.  Seems this benchmark probably holds true
    today.

    How happy are you that your repeaters are being placed in a sensible
    location by QPOPT?  We use Synopsys LBO to fix our timing-broken netlist
    (as opposed to a layout-engine based optimiser such as QPOPT or Saturn).
    We found that LBO was able to add a sufficient number of repeaters, but
    when it came to the ECO placement -- they were going in the wrong place
    -- sometimes causing the timing of a particular path to get even worse.
    The suggested location in our PDEF was not honoured by the ECO placer,
    requiring some manual placement work.

 5) Sub-block pin optimisation

    When you floorplan the top level, are you doing the pin optimisation of
    sub-block interfaces or is the third party?  Just wondering if sub-block
    port locations are as optimal as possible in your floorplan?  I guess
    with 15 blocks you are constrained in many directions when it comes to
    this so finding the optimal solution is tough.
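
The budgeting question in item 1 can be made concrete with a little
arithmetic.  What follows is only a minimal Python sketch, not anything from
Chris' flow or mine: the function name, the 50/50 split between driver and
receiver blocks, and the clock period in the usage line are all assumptions
for illustration.  The only number taken from the thread is the 1.2 ns per
5 mm repeatered-wire observation.

```python
# Illustrative sketch only.  The 1.2 ns per 5 mm figure is from the quoted
# post; every other name and number here is a hypothetical assumption.
NS_PER_MM = 1.2 / 5.0  # repeatered-wire delay per mm implied by the post


def interblock_budget(clock_period_ns, route_length_mm, driver_share=0.5):
    """Reserve top-level wire delay first, then split what is left of the
    clock period between the driving block and the receiving block."""
    wire_ns = route_length_mm * NS_PER_MM
    remaining_ns = clock_period_ns - wire_ns
    if remaining_ns <= 0:
        # The wire alone uses up the cycle: register the path or re-floorplan.
        raise ValueError("top-level route exceeds the clock period")
    return {
        "wire_ns": wire_ns,
        "driver_ns": remaining_ns * driver_share,
        "receiver_ns": remaining_ns * (1.0 - driver_share),
    }


if __name__ == "__main__":
    # Hypothetical 5 ns clock and a 5 mm top-level route.
    for name, value in interblock_budget(5.0, 5.0).items():
        print(f"{name}: {value:.2f} ns")
```

The driver and receiver numbers would then become the output and input
delay constraints for the two sub-blocks during synthesis, so that the
top-level allowance is already paid for before anyone runs an optimiser.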

I don't think there's a magic solution here that will save the day yet - the
best you can achieve with many of these tools is to get the amount of manual
repeater tweaks into the tens rather than hundreds/thousands.

BTW, what clock speeds and process geometry are we talking here?

    - Lee Keep
      3Com                                       UK

         ----    ----    ----    ----    ----    ----   ----

From: [ A Synopsys FlexRoute AE ]

John,

I am a member of the Synopsys FlexRoute CAE team, and have been working on
top-level repeater/buffer insertion within FlexRoute for the last 6 months
or so.  This capability just became available in our latest (Rev1.5)
FlexRoute release as of January 26, 2001.  We have done extensive in-house
testing, and are confident in our algorithms, but I must admit that no
customer has used it on a production design yet. 

FlexRoute is a gridless router, designed specifically as a top-level router
in a hierarchical system.  We knew all along that repeater insertion was
critical, and have been working on it for some time.

There are two basic modes, timing-driven and length-based, each of which
I will describe briefly.

Timing-Driven Repeater Insertion
--------------------------------

1. Requires a TBEF (Timing Based Exchange Format) constraint file that
   contains the following info for each top-level net:

   a. driver cell name, and hierarchical RC tree representation from inside
      the hierarchical block to the top-level pin of connection on that 
      block (usually on the edge of the block, but not required).
   b. receiver cell name(s), and hierarchical RC info as described above,
      plus a section that describes the arrival time budget and required
      input slew rate.

   Note: This is an ASCII format which can be easily generated with Perl
         scripts, etc.  Future FlexRoute versions will derive this info
         directly from a design .db and/or PrimeTime STAMP/ILM models.

2. Requires a .db timing library database for the standard cells to be
   used for repeater insertion, and the driver and receiver cells.

3. The most useful option we have found from customer feedback is a rise 
   time (the same as a Max. Transition DRC check in DC/PC) optimization. 
   The user specifies a list of inverters and buffers which can be used, 
   and FlexRoute will insert the inverters or buffers as appropriate (will
   not change signal polarity of course).

4. The end result is a set of legal, non-overlapping repeater locations,
   based on defined DEF ROW/SITE locations.  No placement legalization step
   is required.

5. We have tested this on a variety of net types on large designs, including
   200 pin reset/scan_enable type nets, which get reasonable solutions of
   10-20 buffers, with all receiving pins meeting the rise time spec.

Length-Based Repeater Insertion
-------------------------------

1. All that is required is specification of a single inverter cell and 
   single buffer cell.

2. The design team must select these cells, and a specified "length" which
   will meet their rise time or other timing goals.

3. This is obviously very fast, and shows good promise for top-level
   repeater insertion.

4. The end result is the same as in Timing-Driven Repeater Insertion,
   legal, non-overlapping repeater locations.
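
The idea behind the length-based mode can be sketched in a few lines of
Python.  This is NOT FlexRoute's algorithm or command set; it is a
simplified illustration that assumes a single two-pin net measured along its
route, ignores obstacles and ROW/SITE snapping, and uses hypothetical cell
names ("INV", "BUF") and function names throughout.

```python
import math


def repeater_positions(net_length_mm, max_segment_mm):
    """Evenly space repeaters so that no unbuffered segment is longer than
    max_segment_mm.  Returns distances from the driver, in mm."""
    if net_length_mm <= 0 or max_segment_mm <= 0:
        raise ValueError("lengths must be positive")
    segments = math.ceil(net_length_mm / max_segment_mm)
    step = net_length_mm / segments
    # One repeater at each internal segment boundary.
    return [step * i for i in range(1, segments)]


def choose_cells(count, inverter="INV", buffer="BUF"):
    """Use inverters in pairs so signal polarity is preserved; if the count
    is odd, the last repeater is a non-inverting buffer."""
    cells = [inverter] * (count - count % 2)
    if count % 2:
        cells.append(buffer)
    return cells


if __name__ == "__main__":
    # Hypothetical 5 mm net with a 2 mm maximum unbuffered length.
    spots = repeater_positions(5.0, 2.0)
    for cell, pos in zip(choose_cells(len(spots)), spots):
        print(f"{cell} at {pos:.2f} mm from the driver")
```

A real tool would walk the routed topology, handle multi-pin nets, and snap
each location to a legal site; the sketch only shows why a single "length"
plus one inverter and one buffer cell is enough input for this mode.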

One of the main advantages of both of these algorithms is that they are
based on FlexRoute coarse or detailed route net topologies, which fully take
into account all routing obstacles, as opposed to other techniques that may
use simplistic Steiner estimates for routes.  The end result is a set of
buffer locations that take into account routability, with stable and
predictable results that lead to timing closure.

    - [ A Synopsys FlexRoute AE ]

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley. All Rights Reserved.


   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)