( ESNUG 370 Item 6 ) -------------------------------------------- [05/17/01]

Subject: ( ESNUG 368 #12 )  PhysOpt, Power Compiler, & Avanti/Cadence CTS

> I would be very interested to get feedback on how good the hooks are
> between Power Compiler and Physical Compiler in the following two
> instances (particularly for designs well over 100 Mhz):
>
>  1. With integrated clock gating cells being available
>
>  2. Without integrated clock gating cells being available.  In this case,
>     Synopsys claims localized grouping within PhysOpt will ensure the
>     clock gating elements would stay together in layout.  Has anyone had
>     any experience with this in a *real* tapeout flow?  Has the CTS tools
>     you used with PhysOpt / Power Compiler handled skew management well?
>
> Additionally, how's the interaction if the clock gating cells have
> observability/controllability ports?
>
>     - Neel Das
>       Corrent Corporation


From: [ A Dallas PhysOpt AE ]

Hello John,

I'm a PhysOpt AE in Dallas.  I can't answer Neel's question about Power
Compiler, but I do know all about using PhysOpt with Cadence and Avanti CTS
tools.  Actually I've used CTS tools from Avanti and Cadence as well as
other customers' proprietary CTS tools.  My specific flow steps may need to
be tuned for each customer's environment, but they're a good starting point.

Loading PhysOpt Data Into Clock Tree Synthesis:
-----------------------------------------------
This step is similar to a Design Compiler flow, as you still have a Verilog
netlist and timing constraints.  Whatever flow you traditionally use to pass
or convert the Synopsys timing constraints and clock descriptions for your
CTS tool will still apply for PhysOpt.  The major addition to the flow is
that the netlist is already placed.  Your placement data can be passed to
your CTS tool through a DEF file or PDEF file.

Running Clock Tree Synthesis:
-----------------------------
How you run your CTS tool really does not change in a PhysOpt flow.  You're
simply starting from a placed netlist out of Physical Compiler.  The CTS,
route, and extraction flows will be the same.  After these steps are
complete you will need to generate all of the normal back-annotation data
(SDF, set_load, etc.).  After this, most customers will look at their timing
results to see if any post-CTS optimization is needed.  If it is, you can
do it in PhysOpt.  (Before you start optimizing the netlist in PhysOpt,
spend a little time looking at your timing results.  In some cases, you will
see that there are quite a few paths which need optimization and the need
for PhysOpt will be obvious.  You may find, however, that the design only
has a handful of paths which are missing timing.  If you look at these paths
and see that simply sizing up a cell will fix the problem, don't bother with
PhysOpt -- instead just do the resizing inside your Cadence/Avanti P&R tool.
It's stupid to go through PhysOpt if all you have to do is hand tweak a
few cells in your P&R environment.)


Running PhysOpt After Clock Tree Synthesis:
-------------------------------------------
Here are the steps needed for PhysOpt to optimize a post-CTS netlist.

 1. Create an updated netlist for PhysOpt.  The updated netlist needs to
    contain the actual clock elements (buffers) added during the CTS
    process.  Without this data, PhysOpt could choose to upsize a cell or
    add a buffer and not realize that a CTS buffer was being overlapped.

    Some tools "collapse" the CTS tree before handing the netlist back to
    PhysOpt.  I have seen this most often with customers using an Avanti
    flow.  If this is the flow you are using, see the section entitled
    "Optimizing Designs with Collapsed Clock Trees" (below).

 2. Create an updated PDEF file for PhysOpt.  The updated PDEF file
    contains the locations of the original cells and the newly added
    clock elements.

 3. Use check_legality in PhysOpt to make sure the clock elements (buffers)
    added during CTS have legalized locations.  If the clock elements are
    not in legalized locations, resolve this problem before proceeding. 

 4. Fix the placement of all sequential cells (flip-flops and latches) using
    the set_dont_touch_placement command.  THIS IS A VERY CRITICAL STEP!
    Your clock tree was generated in the CTS tool based on the existing
    locations of the sequential cells in the design.  By fixing the
    placement of your sequential cells, you ensure they are not moved by
    Physical Compiler.  If you let PhysOpt move the sequential cells around,
    you could INVALIDATE your CTS results.

 5. Fix the placement of the clock elements to make sure that the entire
    clock tree (clock buffers and sequential cells) remains unchanged in
    PhysOpt.  Use the set_dont_touch_placement and set_dont_touch_network
    commands to ensure the clock buffers are unchanged.

 6. Reapply your timing and DRC constraints.

 7. Change your clocks from ideal clocks to propagated clocks.  Use the
    set_propagated_clock command.

 8. Back-annotate the Post-CTS SDF delays and the set_load data.  (Sometimes
    this can be a major pain in the ass with data formats between tools.  At
    the end of this e-mail I have included a few hints and scripts to help
    eliminate some issues seen in the past.)

 9. Once your design is annotated and the proper dont_touch_placement
    assignments are in place, use "physopt -incremental -post_route" to
    optimize your timing.

Once the design has been optimized, you will need to cart your design
database into your (Avanti/Cadence) backend tools for completion.  Each
ASIC vendor will have their own backend ECO flow.  Here's a sample script
for running PhysOpt after Cadence/Avanti CTS:

 psyn_shell-t> source lib_setup.tcl
 psyn_shell-t> read_verilog post_cts_design.v
 psyn_shell-t> current_design TOP
 psyn_shell-t> read_pdef post_cts_design.pdef
 psyn_shell-t> check_legality
 psyn_shell-t> set_dont_touch_placement [all_registers ]
 psyn_shell-t> set_dont_touch_placement [all_fanout -clock_tree -only_cells]
 psyn_shell-t> set_dont_touch_network [all_clocks ]
 psyn_shell-t> source design_constraints.tcl
 psyn_shell-t> set_propagated_clock [all_clocks ]
 psyn_shell-t> read_sdf post_route_extracted_delay.sdf
 psyn_shell-t> source post_route_extracted_load.tcl
 psyn_shell-t> report_timing
 psyn_shell-t> physopt -incremental -post_route


Optimizing Designs with Collapsed Clock Trees:
----------------------------------------------
Here is how the flow worked for one customer.  In the original netlist, our
customer would instantiate a CTS macro cell (i.e. "CTS00B") which acted
as a placeholder for his CTS tree.  During CTS, this macro cell would be
"expanded" and individual gate-level buffers used to implement the clock
trees.  After CTS, all of the actual gate-level clock tree buffers would
be "collapsed" back into the CTS macro cell.  His actual clock tree delay
was represented by a unique SDF delay from the CTS00B cell to each of the
clock pins in the design.  The gate-level clock buffers did, however, exist
in the PDEF file.  When PhysOpt read the PDEF file, it would see these clock
buffers as "physical-only" cells.  PhysOpt would respect the placement and
size of these cells during any placement optimizations and ensure there
were no overlaps.

Two additional commands are needed in the flow to handle this type of
clock tree synthesis.

  1) When the PDEF file is read in step 2, it is important to use the
     "-allow_physical_cells" option with the read_pdef command.

  2) When setting the various dont_touch_placement commands in step 5, be
     sure to also apply set_dont_touch_placement to the CTS macro cells:

         set_dont_touch_placement \
              [ get_cells "*" -hier -filter "@ref_name == CTS00B" ]


Getting Avanti/Cadence P&R Backannotation Data Into PhysOpt:
------------------------------------------------------------

 1) SDF Version

    Certain versions of the Cadence tools produce SDF version 2.0.  The
    Synopsys SDF readers only accept SDF v1.2 or v2.1, so the file version
    has to be changed to allow the SDF file to be read in.  This can be
    easily done using a UNIX sed command:

      unix> mv file temp
      unix> sed -e '1,5s/2\.0/2.1/' temp > file

    For example:

      unix> mv TOP.sdf1 temp
      unix> sed -e '1,5s/2\.0/2.1/' temp > TOP.sdf1

 2) SDF backslashes

    The SDF that the Cadence tools produce may contain backslashes to
    'escape' forward slashes (the hierarchy delimiter) because the
    original DEF input to their flow contained full pathnames.  This happens
    when the input DEF describes cells that have logical hierarchy, so
    their full pathnames are present.  Within the Cadence environment, the
    full pathnames are treated as 'flat names', i.e. the slash is treated
    like an ordinary character in the cell's name.  So, when your SDF is
    written out, the cell names will have the form:

          top\/block1\/instanceA

    When this SDF is back-annotated into the Synopsys environment, PhysOpt
    gets confused because it tries to search for an instance at the top
    level that has the name specified in the SDF.  What is required is to
    remove the backslashes (which 'escape' the hierarchy delimiter) so that
    the slashes revert to being hierarchy delimiters and the cell names
    then match what is in the Synopsys DB.  This can be easily done using a
    UNIX sed command:

      unix> mv file temp
      unix> sed -e 's|\\/|/|g' temp > file

    For example:

      unix> mv TOP.sdf1 temp
      unix> sed -e 's|\\/|/|g' temp > TOP.sdf1

 3) setload data format

    Some tools are unable to create 'setload' data using the Tcl format
    needed by PhysOpt.  These tools provide the data using the older Design
    Compiler format:

             set_load -su value find(net, net_name )

    For PhysOpt (as well as PrimeTime), the script has to be translated to
    Tcl format via 'transcript' (found at .../syn/bin/transcript).  Here's
    an example to translate 'file1' into 'file2':

      unix> setenv SYNOPSYS_DIR /remote/dtg670/image/1999.05
      unix> ${SYNOPSYS_DIR}/sparcOS5/syn/bin/transcript -r \
            ${SYNOPSYS_DIR} file1 file2

    'file2' would then be used to apply the 'set_load' commands in the
    Physical Compiler shell.  This file has the format:

      set_load -subtract_pin_load  value [get_nets { net_name }]
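
The three clean-ups above could also be gathered into one small script.
What follows is only a minimal Python sketch, not anything shipped by
Synopsys or Cadence.  The function names are hypothetical, the version
rewrite assumes the header carries an SDFVERSION entry within its first
five lines, and the set_load translation handles only the single one-line
form shown above, so it is not a replacement for 'transcript'.

```python
import re


def fix_sdf_version(text, old="2.0", new="2.1", header_lines=5):
    """Change the SDF version, touching only SDFVERSION lines in the header."""
    lines = text.split("\n")
    for i in range(min(header_lines, len(lines))):
        if "SDFVERSION" in lines[i]:
            lines[i] = lines[i].replace(old, new, 1)
    return "\n".join(lines)


def unescape_hierarchy(text):
    """Turn every escaped delimiter (backslash + slash) into a plain slash."""
    return text.replace("\\/", "/")


# Assumption: only the one-line form  set_load -su <value> find(net, <name>)
# is recognized; anything else is passed through untouched.
_OLD_SET_LOAD = re.compile(
    r'^\s*set_load\s+-su\w*\s+(\S+)\s+find\(\s*net\s*,\s*"?([^\s")]+)"?\s*\)\s*;?\s*$'
)


def translate_set_load(line):
    """Rewrite one old-style set_load line in Tcl form; pass others through."""
    m = _OLD_SET_LOAD.match(line)
    if m is None:
        return line
    value, net = m.groups()
    return "set_load -subtract_pin_load %s [get_nets { %s }]" % (value, net)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sdf = '(DELAYFILE\n(SDFVERSION "2.0")\n(INSTANCE top\\/block1\\/instanceA)\n'
    print(unescape_hierarchy(fix_sdf_version(sdf)))
    print(translate_set_load("set_load -su 0.05 find(net, top/n1 )"))
```

Restricting the version rewrite to the SDFVERSION line keeps it from
touching a date or design name that happens to contain "2.0".  Check the
translated set_load file against a few nets by hand before trusting it.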


I hope this fully answered Neel's questions concerning how to use PhysOpt
with either the Avanti or Cadence CTS tools, John.

    - [ A Dallas PhysOpt AE ]

         ----    ----    ----    ----    ----    ----   ----

From: Rajesh Pathak <rpathak@cadence.com>

Hi John,

Recently I came across an item in ESNUG asking about experiences with
Power Compiler.  I used Power Compiler extensively while working for
Texas Instruments (Houston).  It's been more than 2 years and the numbers
are hazy, but the ballpark figures might be of interest to readers.  Power
is an optimization constraint, and the tool's order of priority is
1) Timing, 2) Power, 3) Area.  DC traditionally does 1) Timing, 2) Area,
3) Power.

There are two heuristics that Power Compiler uses to reduce power:

  1. Taking advantage of the positive slack on non-critical paths, Power
     Compiler trades path delay for lower-drive cells.  Lower-drive cells
     mean lower power.

  2. The other technique is based on the switching activity of each net in
     the design.  This is captured during a typical simulation run and
     annotated onto each net as an additional attribute through a standard
     file format (called SAIF, I believe).  A net dissipates power in
     proportion to the product of its switching activity (the average
     number of 0-->1 or 1-->0 transitions per unit time) and its
     capacitance.

     If a low-activity net sits on a low-capacitance pin while a
     high-activity net sits on a higher-capacitance pin, the two are
     swapped so that the low-activity net sees the higher capacitance and
     the high-activity net sees the lower capacitance.  For example, if
     cell foo is instantiated with n1 (switching activity 0.2) on input A
     (3 pF) and n2 (switching activity 0.4) on input B (5 pF), then

                    foo I1(.A(n1), .B(n2), .Z(n3));

     will become

                    foo I1(.A(n2), .B(n1), .Z(n3));

     The result is a reduction of cumulative power.
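
The arithmetic behind this swap can be checked with a short sketch.  It
assumes, as the text above does, that each pin's contribution is
proportional to switching activity times pin capacitance, and it ignores
voltage, frequency, and wire capacitance.  The function name is made up for
illustration; this is not how Power Compiler itself computes power.

```python
def relative_power(activity, cap_pf):
    """Sum of activity * capacitance over the pins; a relative figure only."""
    return sum(activity[pin] * cap_pf[pin] for pin in cap_pf)


cap_pf = {"A": 3.0, "B": 5.0}  # pin capacitances from the example

before = relative_power({"A": 0.2, "B": 0.4}, cap_pf)  # n1 on A, n2 on B
after = relative_power({"A": 0.4, "B": 0.2}, cap_pf)   # n2 on A, n1 on B

print("before swap: %.1f  after swap: %.1f" % (before, after))
```

For these numbers the figure drops from 2.6 to 2.2, roughly 15% lower for
this one cell.  The swap only helps while the busier net really is the one
that ends up on the smaller pin.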

For most of the modules that I worked on, I found a reduction of 1% to 5%.
Of course this was a 40 MHz chip with each of the modules consuming power in
the range of 10 mW to 100 mW.  One of the troubling things is that a
switching activity file is just a snapshot of one application.  Another
application might result in a different switching activity file, and Power
Compiler can then act in a malicious fashion.  In the above example, an
application with switching activities of n1=0.4 and n2=0.1 will see
*increased* power consumption after the swap.  This happens more often than
most people would like to believe.  A savings of 1-2% could equally swing in
the other direction, and in my experience it should not count as savings at
all.  I found that one can push the boundary by revisiting the synthesis
methodology accepted for non-low-power applications.  Some of the useful
techniques that really help involve adding more cells to the library
repository:

   1. by removing dont_touch 
   2. custom cells
   3. artificially created "virtual cells" in .lib format.

Another technique is to tighten max rise/fall limit to decrease the short
circuit power consumption.

I am not sure if these observations can be extrapolated to high-speed
designs.  I did this work on a very low frequency (40 MHz) design.

    - Rajesh Pathak
      Cadence Design Systems                     Houston, TX