( ESNUG 345 Item 1 ) ---------------------------------------------- [3/1/00]

Subject: ( ESNUG 344 #5 )  My 6 Carefree Days With A Stolen PhysOpt License

> We liked staying in a DC based (TCL) environment with just a few extra
> commands.  It made PhysOpt easy to learn and easy to use for us.  Because
> of this, I estimate it would take a designer familiar with back-end tools
> no more than an afternoon to get up and running with PhysOpt.
>
>     - David Romanauskas, Design Engineer
>       Matrox                                   Montreal, Canada


From: Roger Bethard <criguy@sgi.com>

Hi, John,

It took me 6 days from initial ramp-up to final netlist with a stolen
copy of PhysOpt.  But my days are 18 hours, so I'd jokingly say that for the
mere mortals out there, it would take you around 2-3 weeks to do this.  :)

I am a logic designer at SGI for the Cray SV1 product line.  It's my job to
take legacy Verilog, merge and enhance the design, synthesize into a
technology that was never meant to run so fast, cram the results into a
ridiculously small floorplan image, and achieve post-physical design timing
closure.  In my copious spare time I fend off the constant pestering from
mechanical engineering for their worrisome power dissipation estimates.

Just recently I released a netlist to the IBM foundry to begin detailed
place and route.  I did not know about PhysOpt during this release process.
This was an 8M gate design, approximately 500K placeable objects, in IBM's
0.18u copper-interconnect technology.  Approximately half the cell area is
consumed by large register-array and SRAM macros, and approximately 70% of
the die is utilized.   Area-array I/O placement restrictions force the
pre-placement of I/O cells without regard to floorplan partition.  The
netlist is created using our traditional design flow:

  1) Write and simulate (cycle-based) multi-level hierarchical Verilog.  We
     use GENSIM, an internal Verilog cycle simulator that runs on Crays.

  2) Synthesize into gates using DC.  Ungroup and flatten into blocks of
     approximately 50K cells; ungroup without regard to final floorplan.
     Use ludicrously small wireload models, unreasonably high max_fanout
     constraints, and promiscuous input/output delays.  But choose target
     library cells wisely; do not rely on the vendor defaults.  Use ideal
     nets for any signals which will be distributed and repowered at the
     global level.  Makefiles and lots of DC licenses facilitate the rapid
     parallelized creation of the gate-level netlist.

  3) Partition the gate-level netlist.  The IBM physical guys prefer a flat
     netlist because of data management issues, but logic designers like me
     prefer not to risk having 50K cell sub-designs spammed across the face
     of the die.  We compromise at a two-level hierarchy made of one- to
     two-dozen floorplan blocks.  My preferred floorplan tool is dry marker
     in a room filled with white boards.  Blue marker works best.  Don't
     write Verilog for this; just shuffle the 50K cell blocks and draw a
     rectangle around the ones that result in point-to-point (reg-to-reg
     preferred) adjacent inter-block communication.  There is _no_ greater
     insurance for post-physical design timing closure than planning
     up-front for inter-block communication.

  4) Incrementally synthesize the gate-level netlist using DC.  Use group()
     to gather multiple 50K cell designs into the one- to two-dozen
     floorplan partitions, then flatten the partitions.  Constrain the
     partition input/output ports to one pin max.  Maintain the original
     ludicrous wireload models for internal paths, but nail the
     between-block paths with pessimistic (even vulgar) CAP and RC
     constraints.  At the top level, use characterize() to propagate
     top-level assertions down to floorplan block ports, then perform a
     parallelized incremental compile on the floorplan blocks.  Iterate on
     characterize() and compile() 2-3 times.

  5) Front-end process the floorplan netlist using the vendor's toolset.
     Insert ideal clock trees, insert scan chains, etc.

  6) Release the front-end-processed netlist to the IBM physical guys.
     They Place-and-Route using the IBM proprietary toolset.

  7) Spend the next three months converging on timing closure.  Static
     timing analysis allows rapid feedback of placement-based
     optimizations.  A hierarchical netlist enables parallel work.  I
     personally choose not to use PrimeTime because I sign off on timing
     closure with IBM's EinsTimer static timing analysis tool anyway.

  8) Final 3-D extraction of CAP and RC data for timing signoff.  Do more
     EinsTimer work.

  9) Formal verification of gate-level netlist to pre-synthesis netlist
     using Verplex.  Release ASIC to manufacturing.
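
The characterize/compile loop in step 4 can be sketched in a few lines of
Python.  This is only an illustration of the control structure, not the
scripts actually used: the function names, the callables passed in, and the
block names are all hypothetical.  It assumes the top-level characterize
step runs serially and the per-block compiles are independent jobs that can
run side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def iterate_flow(blocks, characterize, compile_block, passes=3, max_jobs=4):
    """Alternate a serial top-level step with parallel per-block jobs.

    characterize(pass_no) and compile_block(block, pass_no) are supplied by
    the caller; in a real flow they would launch the synthesis tool.
    Returns one {block: result} dictionary per pass.
    """
    history = []
    for pass_no in range(1, passes + 1):
        # Top level, serial: push assertions down to the block ports.
        characterize(pass_no)
        # Block level, parallel: one job per floorplan block.
        with ThreadPoolExecutor(max_workers=max_jobs) as pool:
            results = list(pool.map(lambda b: compile_block(b, pass_no), blocks))
        history.append(dict(zip(blocks, results)))
    return history


# Stand-in jobs so the sketch runs without any EDA tool installed.
def demo_characterize(pass_no):
    print(f"pass {pass_no}: characterize top level")


def demo_compile(block, pass_no):
    return f"{block} compiled (pass {pass_no})"


demo_history = iterate_flow(["blk_a", "blk_b", "blk_c"],
                            demo_characterize, demo_compile, passes=2)
for pass_result in demo_history:
    print(pass_result)
```

Threads are enough here if each job simply waits on an external tool
process; the number of simultaneous jobs would in practice be bounded by
the number of available licenses.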

In the absence of a timing-driven placement tool, we iterate using
capacitance constraints to drive incremental placement.  When necessary we
clone logic and restructure fanout trees by hand, or bitstack portions of
logic to seed an incremental placement.  The IBM toolset supports in-place
optimization of drive strengths, but the placement has to be reasonable to
begin with.  We extract CAP and RC into the stand-alone static timing
analysis tool, hand-create the ECOs to fix problems that placement alone
cannot remedy, and iterate.  At no time do we back-annotate and throw the
design back over the wall into synthesis.  That feed-back/feed-forward
mechanism tends to create more work than actual benefit.

After I "Stole" PhysOpt
-----------------------

I found out through the grapevine that one of our other projects here at
Cray had acquired a copy of PhysOpt.  I'm not smart enough to work with
"smart" GUIs, and I don't have time to learn complicated tools or
resynthesize for the sake of re-synthesis, but I volunteered to use my
design for a quick performance evaluation.  (John, this is a polite way of
saying that I was desperate so I stole their PhysOpt license after hours.
They were the ones who paid the $500,000 for the license.  I don't think
they were exactly happy about this, but I had a chip due ASAP.   They
understood, so they tolerated me for a short while.)

For gate netlist optimization, this is the flow I followed:

  * Create the technology PDEF file.  This file is used by PhysOpt to
    describe cell sizes, cell pin locations, etc.  I can't comment on
    how long it actually took to create this file because it was the
    one task I could actually foist off on the members of the other project.

  * Create the floorplan LEF file.  This file describes the floorplan
    cell rows, routing resources, pin locations, obstructions, etc.
    I found it was less work to create a generic parameterized LEF which
    defines cell rows and floorplan area, then use the PhysOpt Tcl
    commands to place the floorplan-specific ports and blockages.
    It took an hour or two to create the generic file and then create
    the floorplan-specific files.

  * Create the floorplan Tcl script.  This script reads the base technology
    PDEF, the original *.db design, and the new floorplan LEF.  It also
    places the ports and places any obstructions caused at the top level.
    All ports have to be assigned an initial location, but the locations
    don't necessarily have to be legal.  The Tcl commands are a nice way
    to bitstack the bussed ports, pre-place the large register-array and
    SRAM macros, and create blockages.  It took about six hours per
    floorplan block to create the files.  (There were now a dozen floorplan
    blocks altogether, and I pipelined the process.)

  * Run PhysOpt.  It took about six hours per floorplan block to optimize.
    I ran several iterations on the first block (due to my operator error
    during scripting), but the remaining blocks ran through with one
    iteration only.

  * Dump the design into a format that the IBM P&R tools understand.  I
    dump an output LEF that contains the placement information for cells
    and pins.  Run a Perl script to extract the cell locations from the
    LEF and convert to Tcl commands for the IBM P&R tools.  The PhysOpt
    obstruction locations and port locations are thrown away.  The real
    obstructions are created and propagated by the IBM P&R tool, and the
    real floorplan port locations are snapped to the cells that drive them.
    Remember the importance of point-to-point inter-block communication...
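
The last bullet's conversion step (cell locations in, placement commands
out) might look roughly like the Python sketch below.  The original was a
Perl script that is not shown here, so everything specific is an
assumption: the input is taken to be already reduced to one cell per line
as "<instance> <x> <y> <orientation>", and "place_cell" is a made-up
command name standing in for whatever the downstream P&R tool accepts.

```python
import re

# Assumed intermediate format, one placed cell per line:
#   <instance> <x> <y> <orientation>
LINE_RE = re.compile(
    r"^\s*(\S+)\s+(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+(\S+)\s*$"
)


def parse_placement(text):
    """Return a list of (instance, x, y, orientation) tuples."""
    cells = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0]  # drop trailing comments
        if not line.strip():
            continue  # skip blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"line {line_no}: cannot parse {line!r}")
        name, x, y, orient = match.groups()
        cells.append((name, float(x), float(y), orient))
    return cells


def to_tcl(cells, command="place_cell"):
    """Emit one Tcl command per cell; 'place_cell' is a hypothetical name.

    Instance names are wrapped in braces so that characters such as
    square brackets are not interpreted by the Tcl parser.
    """
    return [
        f"{command} {{{name}}} {x:.3f} {y:.3f} {orient}"
        for name, x, y, orient in cells
    ]


sample = """
# instance        x      y     orient
u_core/reg[3]     10     20.5  N
u_core/mux_7      12.25  20.5  FS
"""
for tcl_line in to_tcl(parse_placement(sample)):
    print(tcl_line)
```

Ports and obstructions are deliberately left out of the sketch, matching
the flow above where only the cell locations are carried forward.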

My initial and overall impression of PhysOpt is... great start!  The tool is
very easy to use once the basic setup is achieved.  Not once did I have to
fire up a GUI, although I hear there is a separate viewing tool available.
(I loathe GUIs.  Give me a Command Line Interface any day!)  I was expecting
a simple placement of the existing design, but I can see that a great deal
of area is conserved during logic restructuring and area recovery.

The tool does a reasonable job at timing-driven placement, but it does an
even better job if the multi-cellrow-high cells are hand-placed first.  All
but one floorplan block completed within 5% of the target design frequency.
That one floorplan block fails spectacularly on several paths because
PhysOpt doesn't see the cellrow space between the non-rectangular SRAM
macros.  I'm just now getting a peek at wiring congestion within the IBM
tools.  Although I did not run a separate PhysOpt run to specifically
address congestion, nothing I see today looks unreasonable or any worse than
our previously released ASICs.  The 5% timing goal achievement is close
enough from my standpoint.  The assumptions made during RC extraction are
gross, and the assumptions made about global wiring obstructions are
incomplete.  The IBM tools are fully capable of taking the PhysOpt output
and completing the P&R task.
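
One way to read "within 5% of the target design frequency" is in terms of
worst slack, sketched below in Python.  This is an assumed interpretation,
not a description of how the numbers above were measured, and the 250 MHz
target and slack values are invented purely for the arithmetic.

```python
def achieved_frequency_mhz(target_mhz, worst_slack_ns):
    """Frequency implied by the worst path, given slack against the target.

    Negative slack lengthens the achievable clock period; positive slack
    shortens it.
    """
    target_period_ns = 1000.0 / target_mhz
    achieved_period_ns = target_period_ns - worst_slack_ns
    if achieved_period_ns <= 0:
        raise ValueError("slack is larger than the target period")
    return 1000.0 / achieved_period_ns


def within_tolerance(target_mhz, worst_slack_ns, tolerance=0.05):
    """True if the achieved frequency is within `tolerance` of the target."""
    achieved = achieved_frequency_mhz(target_mhz, worst_slack_ns)
    return achieved >= (1.0 - tolerance) * target_mhz


# Invented numbers: a 250 MHz target is a 4.0 ns period.
# A worst slack of -0.20 ns gives a 4.20 ns period, about 238.1 MHz,
# which is above the 237.5 MHz threshold (95% of 250 MHz).
print(within_tolerance(250.0, -0.20))
# A worst slack of -0.25 ns gives a 4.25 ns period, about 235.3 MHz,
# which falls below the threshold.
print(within_tolerance(250.0, -0.25))
```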

I can't quantify the final schedule benefit from PhysOpt because I'm just
now starting the physical design cycle (nor would I want to, lest management
find out and retask my performance plan). 

I can say from experience that I'm now at least one month, maybe two, ahead
of where I would have been without PhysOpt and it only cost me a week's
worth of work.  This result has definitely generated a stir around here at
Cray/SGI.

    - Roger Bethard, Design Engineer
      Cray/SGI                                  Chippewa Falls, WI



Copyright 1991-2024 John Cooley.  All Rights Reserved.