( ESNUG 344 Item 5 ) --------------------------------------------- [2/23/00]

Subject: ( ESNUG 335 #1 )  The PhysOpt-With-Cadence-Backend Matrox Tapeout

> We were able to knock off about 3 to 4 weeks in our layout process by using
> the new PhysOpt flow.  In addition, our flow became more streamlined. 
> That is, we already had a flow in place using Design Compiler and
> Primetime to specify timing constraints and to run back annotation.  Our
> annotated db could then be directly fed to PhysOpt without having to
> translate everything back into the Avanti database for each iteration
> like we did with our old design flow.
>
>     - Bob Prevett, Design Engineer
>       NVIDIA                                       Santa Clara, CA


From: David Romanauskas <dromanau@matrox.com>

Hi, John,

I liked Bob Prevett's review in ESNUG 335 #1 of PhysOpt.  I see he used it
in a predominantly Avanti backend tool flow.  I'm about to leave Matrox and
join a new start-up, so I thought I'd send you a review of what it was like
to use PhysOpt in a mostly Cadence backend tool flow.

First off, I used PhysOpt from the very early alpha code days, so I got to
watch a lot of the specific commands I know change dramatically over time.
Our goal with PhysOpt was to save design cycle time without sacrificing
performance (area/timing).

Our designs were being fabbed in a 0.18u process using a COT flow for a
multi-million gate chip.  The chip involved multiple clocks with speeds up
to 360 MHz and datapaths up to 256 bits wide.

Our Old DC Design Flow
======================

Here's the generic design flow we had prior to PhysOpt:

   Synthesis (Design Compiler)
          |
   Floorplanning (Cadence LDP)
          |
   Detailed Placement (Cadence Qplace)
          |
   Routing (Cadence WarpRoute)
          |
   RC extraction (Simplex QX)
          |
   Static Timing Analysis (PrimeTime)
          |
   Back-Annotate timing into DC
          |
   Return to synthesis (using custom wireload models)

We would usually iterate 5 to 15 times through this flow before we got the
timing closure in our specs.  Our final synthesis pass would include:

      - Clock tree insertion (Cadence CTGen)
      - Scan Insertion (Sunrise)
      - Routing (Warproute)
      - RC extraction (Simplex QX)
      - Timing analysis (PrimeTime)

Once we had a fully routed netlist with everything included we would begin
an iterative ECO process to insert repeaters (buffers) for the paths that
went between the blocks.  This was to take care of long, heavily loaded
nets that were not necessarily having timing problems.  We had adopted a
method to budget interconnect times between blocks that helped avoid
inter-block timing problems, so the repeaters were mainly added to ensure
that the signals had good transitions.
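
A minimal sketch of the kind of bookkeeping such a repeater pass involves,
assuming a simple length-based rule: split any inter-block net so that no
unbuffered segment exceeds a maximum length.  The function name and the
2000um limit are hypothetical; the actual criteria used in the flow above
(load, transition time) are not given here.

```python
import math

def repeaters_needed(net_length_um, max_unbuffered_um):
    """Repeaters required so that no segment is longer than max_unbuffered_um.

    Assumption: a purely length-based rule. A real flow would also look at
    the load on the net and the transition time at the receivers.
    """
    if net_length_um < 0 or max_unbuffered_um <= 0:
        raise ValueError("lengths must be non-negative, limit must be positive")
    if net_length_um == 0:
        return 0
    segments = math.ceil(net_length_um / max_unbuffered_um)
    return segments - 1

# Hypothetical inter-block nets (lengths in um) checked against a 2000um limit.
for name, length in [("blkA_to_blkB", 1000), ("blkA_to_blkC", 4500)]:
    print(name, repeaters_needed(length, 2000))
```

A net shorter than the limit needs no repeater; a 4500um net under a 2000um
limit is cut into three segments, so it gets two.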


Our New PhysOpt Design Flow
===========================

Our new design flow with PhysOpt was almost the same except with less work
involved.  PhysOpt just dropped into our existing flow replacing the
detailed cell placement.  Little else changed, except for the dramatic drop
in the number of iterations required to converge on timing.

Floorplanning was significantly easier in the PhysOpt flow, too.  Normally
we would floorplan regions down to very small modules.  This time we only
required a top level floorplan and the placement of the hard macros such
as RAM blocks.

Synthesis at this stage became a standard 2-step DC compile.  Examples:

      physopt -effort medium -congestion
      physopt -effort high -incremental

Routing was done with Warproute and parasitics for timing were extracted
with Simplex QX.  For our final synthesis we added clock trees and scan
chains.  At this point we still performed an IPO cycle to fix the paths
between modules (and the new scan chains for hold fixing), but there was no
need to buffer the internal module paths since PhysOpt had taken care of this.

For our final synthesis we added clock trees and scan chains at the module
level and then performed an incremental PhysOpt run to fix any small timing
problems that may have appeared.

At this point the chip was assembled and we performed repeater insertion
between blocks, as in our old flow.


What We Liked
=============

The two flows may appear similar but there were some important differences.
The new PhysOpt flow only required ONE pass through P&R to achieve timing.
In our old flow, we would iterate 5 to 15 times to get timing closure and
it involved moving data through 8 different tools.  With the new PhysOpt
flow we completely avoided this type of running back and forth between
back-end and front-end trying to converge.  This easily saved us 4 weeks.

Design data was easier to exchange.  I can't stress that too much.  We
reduced the number of times we had to pass different files and formats
between tools, since it only occurs when going to clock tree insertion
and scan reordering now.  We liked staying in a DC based (TCL) environment
with just a few extra commands.  It made PhysOpt easy to learn and easy to
use for us.  Because of this, I estimate it would take a designer familiar
with back-end tools no more than an afternoon to get up and running with
PhysOpt.

The other thing we liked was the fact that we no longer had to spend
a lot of time generating custom wire-load models.  We used to generate a
unique model for each and every module within the design, but now we only
took the time to generate one for the whole chip.  We found that the initial
synthesis results with this wire load model provided a good enough starting
point for PhysOpt to complete the job and close timing. Since PhysOpt knows
the exact placement of each instance, it uses a quick global route to
estimate the wire delay.
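
To show why placement alone already gives a usable wire estimate, here is a
minimal sketch using half-perimeter wirelength (HPWL).  This is a much cruder
stand-in than a global route and is not a description of what PhysOpt does
internally; the function names and coordinates are made up for illustration.

```python
def hpwl(pins):
    """Half-perimeter wirelength of a net from its (x, y) pin locations in um."""
    if len(pins) < 2:
        return 0
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def estimate_error(pins, routed_length_um):
    """How much longer the real route is than the placement-based estimate."""
    return routed_length_um - hpwl(pins)

# Hypothetical two-pin net: driver at (0, 0), receiver at (300, 400).
net = [(0, 0), (300, 400)]
print(hpwl(net))                  # bounding-box estimate in um
print(estimate_error(net, 1900))  # extra length if the route detours to 1900um
```

If the final route has to detour around other wiring, the difference between
the routed length and this kind of estimate is the extra load the estimate
never accounted for.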


Gotchas
=======

As expected for early alpha code, there were some glitches.  Synopsys
addressed all of those issues to our satisfaction.

We would have preferred to do clock-tree synthesis and scan-chain
re-ordering within PhysOpt.  Synopsys agreed these were necessary
features.

Doing synthesis and placement takes a lot of CPU time.  Most of our blocks
ran in 2-24 hours, but we had one large nasty block which took over 80
hours to complete because we didn't take a true hierarchical approach with
it.  (This block was over 1/4 of the entire chip in our multi-million gate
design.)

PhysOpt is exceptional, and really helps close in on timing quickly when all
the block information in context of the chip is represented.  However, be
VERY careful you don't throw out the results you get hierarchically.

To understand this clearly, John, you need to understand the approach we
took.  From a placement view, we developed all our blocks hierarchically
in the top level of the chip.  We then converged each block with local
routing within the block itself, then later stripped off all the routing
and then rerouted the entire chip flat.  Ouch.  What we saw with this flat
routing approach was that the global routes over modules sometimes
interfered with the local routes of those modules, throwing off the
estimates that PhysOpt had made on those nets (sometimes by more than 1mm!).
Ouch.

This, of course, created some very painful timing closure difficulties for
us since for some nets the loads were greater than expected.

Our lesson learned was that a fully clean hierarchical methodology is the
only way to go if you want to use PhysOpt successfully.  This involves
providing visibility of the top level routes to PhysOpt during its modular
run by pushing any nets that pass over the module into it to act as routing
obstructions.  When the module is routed, Warproute will respect these
obstructions, and the chip can then later be assembled keeping all routing
intact and avoiding the necessity to flatten and reroute from scratch.
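
The push-down step can be sketched geometrically as follows.  This is only
an illustration under simple assumptions: routes and modules are axis-aligned
rectangles in chip coordinates, and the names Rect, clip_to_module and
pushed_down_obstructions are hypothetical, not commands from any of the tools
mentioned here.

```python
from typing import List, NamedTuple, Optional, Tuple

class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float

def clip_to_module(route: Rect, module: Rect) -> Optional[Rect]:
    """Part of a top-level route lying over the module, in module-local
    coordinates, or None if the route does not cross the module."""
    x1 = max(route.x1, module.x1)
    y1 = max(route.y1, module.y1)
    x2 = min(route.x2, module.x2)
    y2 = min(route.y2, module.y2)
    if x1 >= x2 or y1 >= y2:
        return None
    # Shift so the module's lower-left corner becomes the origin.
    return Rect(x1 - module.x1, y1 - module.y1, x2 - module.x1, y2 - module.y1)

def pushed_down_obstructions(
    top_routes: List[Tuple[str, Rect]], module: Rect
) -> List[Tuple[str, Rect]]:
    """Per-layer obstructions the module router should treat as blocked."""
    obstructions = []
    for layer, rect in top_routes:
        clipped = clip_to_module(rect, module)
        if clipped is not None:
            obstructions.append((layer, clipped))
    return obstructions

# Hypothetical module and two top-level wires; only the first crosses it.
module = Rect(100, 100, 300, 200)
routes = [("M4", Rect(0, 150, 500, 151)), ("M5", Rect(400, 0, 401, 500))]
print(pushed_down_obstructions(routes, module))
```

Each obstruction stays on its original layer, so the module-level route only
loses the tracks that the top level actually uses over that module.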

The upside, once we learned this, was that we met our performance target
after the first pass.  That big 1/4 chip block only needed some minor
timing adjustments after applying the flat routing.  This final result
emphasized to us, John, that true hierarchy is the only way to go.

The other important caveat with PhysOpt is that you must be very careful
about how you assemble the chip.  After using PhysOpt to place each of the main
modules we let Warproute try to route the entire design.  This created many
headaches with the paths between blocks!!!  Warproute is not a Top Level
router, nor has it claimed to be one, so don't misuse it this way with
PhysOpt output or you'll regret it.

Next time, my plan would be to use PhysOpt in a completely hierarchical
fashion.  First synthesize and place, and then route all of my modules.
I'll then assemble it into a full chip using a real top level router.

When we began the project in early 1999 the RTL2placed-gates feature in
PhysOpt wasn't ready.  The group here at Matrox hopes to use this feature
in the future on upcoming projects.

    - David Romanauskas, Design Engineer
      Matrox                                   Montreal, Canada


