( ESNUG 351 Item 1 ) ---------------------------------------------- [5/4/00]

From: Jay McDougal 
Subject: One Engineer's Comparison Of Cadence PKS 3.0.13 To "PKS-II" 3.0.15

Hi John,

I don't have the time to give you a detailed description of my use of PKS.
However, I will state some basic info from my use to date that may be
interesting/useful to the ESNUG readers.  First of all, I will refer
to several versions of PKS.  PKS-II is really a marketing name for versions
of PKS after 3.0.13.  PKS-II is not a new/separate product with its own
separate license file; it's just revs of PKS (after 3.0.13) with the latest
patches and features.  My latest work has been with 3.0.15.

I have run PKS myself with support from Cadence during my initial training.
At first, Cadence took one of our designs (a 200 MHz, 170 Kgate, 0.25um
ColdFire-4 processor core) and ran it through PKS for us in July of '99.
Their results looked encouraging, so I decided to run PKS in house to
reproduce their results and try it on a few other ARM/ColdFire processor
cores.  Synopsys PhysOpt was still not available on HP-UX in this timeframe
(Oct 99), and none of the other physical synthesis tools were offered by
their vendors except in a "we will run it for you" (Taxicab) model.

I have been running PKS off & on since then on ARM & ColdFire cores.

None of the 4 ARM/ColdFire cores I have completed with PKS have been
released into a production ASIC.  We expect the first ASICs that use these
cores, produced using PKS, to release this fall.  Here are the basic things
I have seen in my evaluation/use of PKS.


The Good Things:

  1) I had little/no issue with correlation of the PKS timing engine
     and the timing engine used in Silicon Ensemble (SE).

  2) For designs that are lightly congested, PKS is able to give me better
     timing in a single pass than we had achieved in 6-10 IPO/ECO iterations
     on the same design using Synopsys with SE.  Compared to a single-pass
     SE flow with QPopt/PBopt, results were mixed: some savings in cell
     area and about the same timing.  The end results after routing were
     within 2-3 percent of the PKS estimates.

  3) Getting our library physical and timing info into PKS was easy, if you
     have an SE and Synopsys background/history.  Just take a Synopsys .lib
     and compile it to make your PKS .alf file.

  4) In cases where I was really pushing the clock frequency, PKS run times
     were dramatically longer than standard RTL-with-wireload synthesis
     runs.  We use both Ambit RTL and DC synthesis here in Corvallis.  The
     200 MHz, 170 Kgate, 0.25um ColdFire-4 processor core would take about
     14 hours using wireload models with Ambit RTL or Synopsys DC (didn't
     matter which we used.)  But since it's wireloads, I know it would
     really be 160 MHz or less instead of the promised 200 MHz, unless
     I did lots of timing driven place & route, PBopt, etc.  A PKS run
     on the same design at 192 MHz takes 21 hours (1.5X standard synthesis.)
     To get up to my target 200 MHz, PKS takes 42 hours (3X standard
     synthesis.)  However, most of this time is spent getting the last few
     tenths of a nanosecond that we were never able to get with our regular
     synthesis/place/route flow.  Also, this extra time is insignificant
     compared to the time required to run even a single IPO/ECO loop.
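
The run-time figures in item 4 can be checked with a few lines of Python.
This is only a sketch of the arithmetic: the variable names are made up for
illustration, and the only inputs are the numbers quoted above.

```python
# Run times quoted in item 4, in hours.
BASELINE_WIRELOAD_HOURS = 14.0  # Ambit RTL or Synopsys DC with wireload models

# Target clock frequency (MHz) -> PKS run time (hours), as quoted in item 4.
pks_runs = {192: 21.0, 200: 42.0}


def slowdown(pks_hours, baseline_hours=BASELINE_WIRELOAD_HOURS):
    """Ratio of a PKS run time to the standard wireload synthesis run time."""
    return pks_hours / baseline_hours


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


ratios = {mhz: slowdown(hours) for mhz, hours in pks_runs.items()}

# Extra run time spent going from the 192 MHz result to the 200 MHz target.
extra_hours = pks_runs[200] - pks_runs[192]

# Clock period recovered by that extra run time.
period_gain_ns = period_ns(192) - period_ns(200)

for mhz in sorted(ratios):
    print(f"{mhz} MHz: {pks_runs[mhz]:.0f} h, {ratios[mhz]:.1f}X wireload synthesis")
print(f"extra hours for the last 8 MHz: {extra_hours:.0f}")
print(f"clock period gained: {period_gain_ns:.3f} ns")
```

The difference between a 192 MHz and a 200 MHz clock period is roughly
0.2 ns, which is what the extra 21 hours of PKS run time buys.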


The Not So Good Things:

  1) PKS (3.0.13 and prior) did not accurately account for pre-placed cells,
     layer obstructions, and power rings.  It would place things on top of
     these pre-placed parts and obstructions.  There are ways to work around
     this, but it caused difficulty and inaccuracy in the PKS congestion
     estimates.

     The latest releases of PKS (3.0.15) have resolved this.  Pre-routes,
     obstructions, etc. are handled well now; it sees them.  PKS does an
     initial place with its own placer.  These placements typically overlap
     each other in bins.  Run Qplace to clean the bins.  Typically I'll have
     to iterate between PKS and Qplace twice.  After the second Qplace
     call, you then call WarpRoute, and you might then iterate between
     PKS and WarpRoute a few times.  If you're not congested, you don't
     need to iterate much because WarpRoute is very Steiner.

  2) With medium to highly congested designs, PKS routability estimates
     were not accurate.  In the worst cases, PKS was unable to optimize the
     design without growing our original floorplan even though we knew it
     was routable in SE.  In other cases, it led to non-convergence because
     there was up to a 25 percent difference between PKS timing estimates
     and final WarpRoute back-annotated timing after routing.  This was
     mainly due to over-congested areas that had to be routed around.  PKS
     did not do a good job of predicting the routing/timing in these areas
     and the additional RC delay trashed timing.

     This is partially addressed with PKS versions 3.0.15 and beyond.  The
     routing estimates for congested designs are still off.  The global
     route portion of the Warp router has been integrated into PKS, but the
     final route part of WarpRoute is not integrated into PKS yet.  These
     enhancements allowed me to achieve timing convergence with most
     congested designs.  However, with very congested designs it required
     3-4 iterations between calling the integrated Warp global router and
     doing incremental timing/placement optimizations.

     My suspicion is that this type of convergence may not always be
     possible and that convergence is not guaranteed.  The potential
     problem is that after each incremental timing/placement iteration, a
     new global route must be performed (not incremental).  This can create
     new (unanticipated) problem nets in congested areas.  In my case these
     nets converged to a stable set and the incremental timing optimizations
     were able to fix all of them without any new global route surprises
     after 3-4 iterations.  The WarpRoute global routing algorithms aren't the
     same as the PKS global routing algorithms so their estimates still
     often differ significantly.  The more congested the design, the worse
     this gets.

  3) PKS documentation isn't a strong point.  No useful design methodologies
     are documented, just the basic command stuff and vanilla flows.  I had
     to discover (with lots of help from my AE) most of the real world PKS
     techniques.

  4) Clock tree generation and scan insertion are not well integrated into
     the PKS flow and require a patch process where we leave PKS.  This
     means moving large data files back and forth and potentially handing
     off or synchronizing with another designer.  Once outside PKS, we run
     our in-house scan insertion tool and then create buffer trees for
     the clocks and scan control signals (using CTgen inside SE.)  Then,
     you have to reload the whole physical design back into PKS for final
     timing optimization.
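
The iterate-until-stable process described in item 2 above can be sketched
as a simple loop.  This is a toy model in Python, not PKS Tcl: global_route
and incremental_optimize are hypothetical stand-ins for the integrated Warp
global router and the incremental timing/placement optimization, and the
net names are invented.

```python
from typing import Callable, List, Set


def converge(global_route: Callable[[], Set[str]],
             incremental_optimize: Callable[[Set[str]], None],
             max_iterations: int = 4) -> int:
    """Alternate a full global route with incremental optimization.

    Returns the pass number on which the global route reported no problem
    nets, or -1 if problem nets remain after max_iterations passes.
    """
    for iteration in range(1, max_iterations + 1):
        # Each pass is a complete (non-incremental) global route, so it can
        # surface problem nets that did not exist on the previous pass.
        problem_nets = global_route()
        if not problem_nets:
            return iteration
        incremental_optimize(problem_nets)
    return -1


class ToyDesign:
    """Hypothetical design that replays a scripted list of problem-net sets."""

    def __init__(self, problem_nets_per_pass: List[Set[str]]):
        self._passes = list(problem_nets_per_pass)
        self.fixed: Set[str] = set()

    def global_route(self) -> Set[str]:
        # Report the next scripted set; report nothing once the script ends.
        return self._passes.pop(0) if self._passes else set()

    def incremental_optimize(self, nets: Set[str]) -> None:
        # Record which nets the optimizer was asked to fix.
        self.fixed |= nets


# A design whose problem nets shrink to a stable (empty) set.
good = ToyDesign([{"n1", "n2", "n3"}, {"n4"}, set()])
good_result = converge(good.global_route, good.incremental_optimize)

# A design where every new global route exposes a new problem net.
bad = ToyDesign([{"a"}, {"b"}, {"c"}, {"d"}, {"e"}])
bad_result = converge(bad.global_route, bad.incremental_optimize)

print("converging design:", good_result)
print("non-converging design:", bad_result)
```

The second toy design shows the failure mode suspected above: if every
fresh global route introduces a new problem net, the loop hits its
iteration limit without ever reaching a clean pass.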

In conclusion, PKS is working well for us for timing convergence.  We've
purchased two copies and will be using them for production work.

    - Jay McDougal
      Agilent                                    Corvallis, OR

