
  Editor's Note: Well, that new "EDA Front To Back" conference had its
  debut two weeks ago...  I heard that it had excellent technical content,
  but sparse user attendance.  (I couldn't go to it myself, so I've had to
  rely on what I've heard from others through the grapevine.)  If you
  went, I'd *love* to hear your firsthand report of what you saw and
  thought at the first "EDA Front To Back" conference.

                                            - John Cooley
                                              the ESNUG guy

( ESNUG 383 Subjects ) ------------------------------------------ [11/28/01]

 Item  1: ( ESNUG 376 #3 ) Four 80 Mhz 0.25/0.35 Cadence PKS/SE-PKS Tapeouts
 Item  2: Verilog Doesn't Like The Case Of My PhysOpt 'write_script' Output
 Item  3: ( ESNUG 382 #1 ) SPC Claims No Immediate Cadence Vesting Agreement
 Item  4: Do You Know Which Code Coverage Tools Support Latch-Based Designs?
 Item  5: ( ESNUG 382 #5 ) Use Formality 2001.08-FM1-SP2 And -netlist Reads
 Item  6: ( ESNUG 382 #8 ) Negative Timing Checks Is A *Cadence* Problem
 Item  7: Anyone To Do Some Quickie Cadence Silicon Ensemble Consulting?
 Item  8: Are You An Avanti Customer Who Has Used Astro In A Chip Tapeout?
 Item  9: ( ESNUG 380 #11 ) Watch Out! That VCS PLI *Will* Drag You Down!
 Item 10: Former IBMer Sets Up A Users 'TCL for EDA' Scripting Web Project
 Item 11: ( ESNUG 381 #7 ) CVS, Perforce, Synchronicity, RCS, ClearCase
 Item 12: How Can I Coax DC To Intelligently Duplicate High Fanout Nets?
 Item 13: What's The Dirt On The Synopsys DW_Debugger?  Useful?  Useless?
 Item 14: How To Treat Power Nets (PNETs) As Routing Obstructions In PhysOpt
 Item 15: ( ESNUG 381 #11 ) Avanti/Chrysalis Asks For Current User Benchmark
 Item 16: ( ESNUG 380 #12 ) VCS Scales With Mhz While NC-Verilog Doesn't
 Item 17: ( ESNUG 382 #2 ) Rarely Noticed Library Characterization Gotchas

 The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com

( ESNUG 383 Item 1 ) -------------------------------------------- [11/28/01]

Subject: ( ESNUG 376 #3 ) Four 80 Mhz 0.25/0.35 Cadence PKS/SE-PKS Tapeouts

> My summary is PKS works.  It has good correlation with silicon and can
> swallow large designs.  Interfacing with Silicon Ensemble is a no-brainer.
> It's bleeding-edge but showing some signs of maturity: we taped out with
> version SPR4.07 but couldn't have done it using SPR4.06.  Proof: the
> silicon is in manufacturing ramp up.
>
>     - Geoff Smith
>       Cisco Systems                              Toowong, Australia

From: "Ching Hsiang Yang" <yjs@sunplus.com.tw>

Hi, John,

It has been fun reading ESNUG through these years.  ESNUG has been very
helpful in getting the REAL story out of each tool.  Here's our Cadence
Ambit-RTL/PKS/SE 4 chip tapeout story.  I am hoping we can push vendors to
provide us better solutions.  Everything was done with the Cadence tool set
except for some sub-modules on 2 chips infected by Synopsys DC.

Background :

We've been using Ambit BuildGates since *before* their merger with Cadence.
And we are one of the rare companies to own both Avanti Apollo and Silicon
Ensemble for P&R.  Before using PKS, we had been using Ambit BuildGates
for logic synthesis for 3 years, so migrating to PKS was quite easy for us.
We have Synopsys DC, too.  It's also easy for us to translate DC scripts
into the Ambit environment either manually or by Ambit's translator.  (The
translator translates DC's "write_script" results into Ambit code.  Not
everything gets translated, but it's pretty close.)

Phase I :

Before committing to PKS, we tried to use it to re-design our previous
tapeouts as a test case.  These designs were done with a conventional
IPO-ECO flow either in SE or in Apollo.  (The four combinations were:
Ambit+Apollo, Ambit+SE, DC+SE, and DC+Apollo.)  It was definitely NOT push
button work.  When we started to evaluate PKS, our local (Taiwan) Cadence AE
was also new to PKS.  We worked together to bring it up after almost 6
months.  It was painful to get it to work at the beginning because we were
running PKS without Cadence HQ support.  We didn't get much attention then
simply because we are not as big as other Cadence customers.  We were so
small, Cadence even voided from their PCR system a PKS bug we had reported!

Then we figured if we're not big, we'll try to be first.  For such a new
flow, you cannot succeed until you get support from the Cadence R&D core
team.

All but one of the re-run chips achieved one-pass timing closure with PKS.
The timing reported from PKS is NOT consistent with our golden flow: Avanti
StarRCXT+Celestry MDC, but it's reasonably close (within 0.5 nsec.)  We did
not bother to get them closer because they used different timing engines.

Phase II :

After suffering through Phase I, we decided to tapeout our first PKS chip
in April.  After that, we had the confidence to use it as the standard flow
on our next 3 chips over the following 6 months.  Described below is our
1st PKS tapeout case (0.25 um).  The flow of our other 3 chips (1 at 0.25 um
and 2 at 0.35 um) was exactly the same except that our 1st run was with
PKS 3.0 -- we are using PKS 4.0 now.

  Chip profile : over 40 hard macros (Analog, RAMs, ROMs, PLLs)
  Components : around 200 K instances
  Internal Clock Rate: several clocks; main clock was 81 MHz
  Process: major fab in Taiwan, 0.25/0.35 um Artisan std cell, 1P5M process

  Simulation : Cadence NC-Verilog 3.2
  DFT : Syntest TurboScan
  Synthesis : some sub-blocks were designed with Synopsys DC but most are
      generated with Ambit BuildGates.
  Final STA : Ambit BuildGates + PrimeTime  (Each has their own good and
      bad sides)
  Formal verification : Verplex LEC
  FloorPlan and Power Plan : Cadence SE Ultra
  Chip Implementation (Physical) : Ambit PKS + SE Ultra (Cadence SP&R)
  RC Extraction : Avanti Star-RCXT
  Delay Calculation : Celestry MDC
  Reoptimization after routing : Ambit PKS  (with many iterations!)
  Physical Verification : Mentor Calibre + Cadence Dracula for DRC/ERC/LVS
  Layout polygon editing after P&R : Novas Laker

In running this PKS flow we encountered these problems:

  (1) Different timing engines produce different timing data.  While we saw
      positive slack in SE-PKS, we found it became negative slack after
      extraction and delay calculation.  This is basically caused by
      different extraction technology files (HyperExtract/Ambit in SE-PKS
      and Star-RCXT+MDC).  The difference is as large as 0.5 nsec sometimes.
      We overcame this by over-constraining PKS.  This works well in the
      whole flow.  How much you should over-constrain the design can be
      judged from 2 or 3 iterations.  And this value can be re-used in other
      projects with the same technology.  We have seen good correlation
      between several projects since then.

  (2) Within the Cadence system (PKS, after routing), the timing was very
      close.  Basically the timing reported by PKS is trustable if
      you don't have highly congested areas in your placement.  You must be
      sure to solve these congested areas if you use PKS.  They can produce
      unmatched timing data after routing requiring massive search & repair
      activities.

  (3) Bad logic from Ambit BuildGates.  Yes, Ambit-RTL produced bad logic in
      our case.  We caught the bugs in Verplex.  The problem did not stop us
      from using Ambit because DC is too slow and area consuming.  It's a
      trade-off.  We can handle the bug instead of delaying the chip and
      getting larger area.

  (4) Power Plan in SE-PKS is lousy compared to Avanti Apollo.  Cadence has
      said they will improve it, but they don't know when.

  (5) Floor Planning can not be done in PKS.  The new version of PKS still
      can not do an I/O plan.  We have to use SE to create our Floor Plan
      and then output DEF to PKS.

  (6) CTGen clock tree synthesis is too slow.  In our case, we had to wait
      6-7 hours to complete a single run on a Sun Ultra60.  (Using a SunBlade
      1000 750 MHz can cut the time to 3-4 hours.)  CT-PKS in PKS4.0 is
      even slower!!  Again, we are waiting for Cadence to improve it but we
      have started to evaluate Celestry ClockWise now.

  (7) We had to manually do in-place size-up for some cells in PKS.  For an
      unknown reason, PKS just refused to size up cells to get better timing.
      We have to do that by hand.
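
The over-constraining step in item (1) comes down to simple arithmetic.
Here is a minimal sketch, assuming the margin is taken as the worst slack
lost between SE-PKS and the Star-RCXT+MDC sign-off over a few iterations.
The function names and the sample slack values are hypothetical; only the
81 MHz clock and the roughly 0.5 nsec gap come from the text above.

```python
def overconstraint_margin(pks_slacks_ns, signoff_slacks_ns):
    """Worst-case slack lost between PKS and sign-off (ns), never negative."""
    losses = [p - s for p, s in zip(pks_slacks_ns, signoff_slacks_ns)]
    return max(0.0, max(losses))


def constrained_period_ns(target_mhz, margin_ns):
    """Clock period to give PKS so that sign-off can still meet target_mhz."""
    return 1000.0 / target_mhz - margin_ns


# Hypothetical slacks (ns) for the same paths over three iterations.
pks = [0.20, 0.10, 0.30]
signoff = [-0.30, -0.25, -0.10]

margin = overconstraint_margin(pks, signoff)
period = constrained_period_ns(81.0, margin)
print(f"margin = {margin:.2f} ns, PKS clock period = {period:.2f} ns")
```

Once a margin like this has settled after 2 or 3 iterations, the same
number can be tried first on the next project in the same technology.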

Overall, we used SE-PKS to tapeout 4 chips since April/2001.  One chip is in
sample delivery stage and two others are silicon verified.  The 4th is just
wafer'ed out and under system board test.  I would say PKS is pretty stable
now.  We usually spend our time trying to better constrain designs for PKS.
One key benefit from using Ambit-RTL is the runtime and STA capability over
Synopsys DC.  We don't need to switch between DC <=> PrimeTime to do
optimization and STA.  Within <pks-shell>, we can start RTL synthesis down
to routed DEF out and ready for Avanti Star-RCXT extraction and delay
calculation (Celestry MDC).

I think the main issue we need Cadence to improve is runtimes.  CTGen and
CT-PKS are slow.  PKS is kind of slow, too.  To run PKS, we were forced to
buy more expensive machines like SUN Blade 1000's and HP C3700's.  (You may
be interested to know the HP machine is much faster than the SUN.)  Linux
Ambit/PKS or even the whole SP&R tool set on Linux is another thing we keep
requesting.  We have tested Synopsys DC.  DC runs even 1.3X faster on Linux
PC's (AMD 1.2G, ASUS MB, 1.5G MEM) compared with a SUN Blade 1000 (750MHz),
so I think it is reasonable to expect Ambit or PKS can run faster on Linux.

Power planning capability is weak in SE-PKS.  Cadence should improve that.

    - Ching Hsiang Yang
      Sunplus Technology                         HsinChu, Taiwan

( ESNUG 383 Item 2 ) -------------------------------------------- [11/28/01]

From: "Ofer Paperni" <ofer.paperni@motorola.com>
Subject: Verilog Doesn't Like The Case Of My PhysOpt 'write_script' Output

Hi John,

My design has signal names with UPPER and lower case.  When I'm doing
'write_script' in PhysOpt, it writes all of the signals in lower case.
Verilog doesn't like this.  How can I fix PhysOpt so the names in the
write_script will be liked in Verilog?

    - Ofer Paperni
      Motorola

( ESNUG 383 Item 3 ) -------------------------------------------- [11/28/01]

Subject: ( ESNUG 382 #1 ) SPC Claims No Immediate Cadence Vesting Agreement

> On the vesting schedule - It all depends on how good a negotiator the SPC
> team was.  I rank them among the best, especially with hindsight of what
> Prakash was able to do with Ambit.  I've heard that much of the SPC
> vesting is immediate with a small kicker for staying similar to Ambit,
> at least for the executroids.
>
>     - [ Msg 10166 on Yahoo CDN ]

From: Keith Mueller <keith@siperspective.com>

Hi John,

We will become "Silicon Perspective, a Cadence Company," and an independent 
operating unit of Cadence located in our existing facilities.  All of the 
management and R&D team will remain intact  --  there is no "instant vesting"
agreement as some of your readers have speculated.

    - Keith Mueller, VP, Worldwide Sales
      Silicon Perspective Corp.                  Santa Clara, CA

( ESNUG 383 Item 4 ) -------------------------------------------- [11/28/01]

From: Norbert Fried <Norbert.Fried@motorola.com>
Subject: Do You Know Which Code Coverage Tools Support Latch-Based Designs?

Hi John,

Do you know which code coverage tools support latch-based designs (Verilog
and/or VHDL)?  Do they support single stage latch and/or two stage latches
in FSM designs?

    - Norbert Fried
      Motorola

( ESNUG 383 Item 5 ) -------------------------------------------- [11/28/01]

Subject: ( ESNUG 382 #5 ) Use Formality 2001.08-FM1-SP2 And -netlist Reads

> These changes were in the 2001.06 Release Notes, but 5 months later I
> still see these "legacy" variables in many users' scripts.  I'm now
> asking Formality users to PLEASE scrub their scripts!  Here is a list
> of the common culprits...
>
>     - Steve Lamb
>       Synopsys, Inc.                             Marlboro, MA

From: Chris Ellingham <cje@synopsys.com>

Hi John,

As the Sr. CAE for Formality, two things in Steve's post bothered me:

  1.) Steve mentioned only Formality 2001.06.  Your readers must make
      sure they're using the most up to date version available
      (currently 2001.08-FM1-SP2).  It makes a BIG difference.

  2.) To achieve the best performance when reading in Verilog netlists,
      be sure to specify the "-netlist" option to the read_verilog
      command.  If your Verilog netlist read or link performance is
      unusually slow, this is most likely the culprit.  That switch tells
      Formality to call a netlist reader that is streamlined for structural
      Verilog and should be used anytime you have a fully mapped netlist. 

      To put some numbers behind "-netlist", I benchmarked two designs.  One
      contained 173 K verifiable gates and the other 1,500 K verifiable
      gates.  The smaller design read and linked 3X faster when using
      "-netlist" switch and the larger design was 20X faster.  Both designs
      saw the memory footprint decrease by 4X.

I hope this helps.

    - Chris Ellingham
      Synopsys, Inc.                             Mountain View, CA

( ESNUG 383 Item 6 ) -------------------------------------------- [11/28/01]

Subject: ( ESNUG 382 #8 ) Negative Timing Checks Is A *Cadence* Problem

> This Verilog-XL "bug" shows up when back-annotating an SDF file with
> negative hold times and using the +neg_tchk option.  Verilog should use
> the negative hold times correctly, but instead will set the negative
> values to zero without issuing an error or warning.
>
>     - Stefan Griebel
>       Cirrus Logic

From: "Rakesh Kinger" <rkinger@broadcom.com>

Hi, John,

This is a Cadence issue, not a Verilog issue.  Synopsys VCS handles each edge
independently, so negative limits won't be rounded to 0 and each edge has its
own independent violation window.  This is how VCS preserves the sanctity of
its negative timing checks and simulates according to user intention.

This capability is enabled via the "+overlap" VCS compile-time flag.  You
also need to use "+neg_tchk" and "+multisource_int_delays" assuming you're
using VCS 6.1 Beta2 like I am.

Let me explain this in detail.

The example Stefan Griebel gave has two timing checks on the same data signal
w.r.t. posedge clock and the violation regions of the competing timing checks
are non-overlapping.

        e.g.   $setuphold(posedge CK, posedge D, 4, -3);
               $setuphold(posedge CK, negedge D, 2, -1);

Stefan says NC-Verilog and Verilog-XL are unable to handle these timing checks
and they erroneously reduce them to zero.

        e.g.   $setuphold(posedge CK, posedge D, 4, 0);
               $setuphold(posedge CK, negedge D, 2, 0);

As a result, if there is a change at D 0.5 timeunits *before* the clock these
simulators will give a setuphold violation.  This is incorrect and not what
we want.

VCS will maintain the (4,-3) violation window for posedge D and the (2,-1)
violation window for negedge D.  If D changes 0.5 units *before* the clock,
since VCS doesn't zero out anything, you won't get any timing violation (i.e.
VCS behaves correctly here.)
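
The window arithmetic in the example above can be checked with a small
model.  This is only a sketch, assuming the violation window runs from
(clock - setup) to (clock + hold) and that its endpoints are excluded; real
simulators treat the endpoints in their own ways.  The function names and
the clock time of 100 are hypothetical.

```python
def violation_window(t_clk, setup, hold):
    """Return (start, end) of the $setuphold violation window around t_clk."""
    return (t_clk - setup, t_clk + hold)


def violates(t_data, t_clk, setup, hold):
    """True if a data change at t_data falls strictly inside the window."""
    start, end = violation_window(t_clk, setup, hold)
    return start < t_data < end


T_CLK = 100.0        # hypothetical clock edge time
t_d = T_CLK - 0.5    # D changes 0.5 timeunits before the clock

# Negative hold limits kept, as the letter says VCS does.
kept = [violates(t_d, T_CLK, 4, -3), violates(t_d, T_CLK, 2, -1)]
# Negative hold limits forced to zero, as described for Verilog-XL.
zeroed = [violates(t_d, T_CLK, 4, 0), violates(t_d, T_CLK, 2, 0)]

print("kept:  ", kept)
print("zeroed:", zeroed)
```

With the limits kept, the windows are (96, 97) and (98, 99), so a change at
99.5 misses both.  With the hold limits zeroed, both windows stretch to the
clock edge at 100 and the same change is flagged.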

And, BTW, Verilog libraries should also support negative timing checks using
the extended syntax of $setuphold checks as shown below :

  $setuphold(posedge phi,posedge d,0,0,notify_reg,,,delay_phi,delay_d);
  $setuphold(posedge phi,negedge d,0,0,notify_reg,,,delay_phi,delay_d);

where delay_phi and delay_d are declared as wires.

    - Rakesh Kinger
      Broadcom Corporation                       San Jose, CA

         ----    ----    ----    ----    ----    ----   ----

From: Keith Howick <howick@siliconmetrics.com>

John,

I have run into this in Verilog-XL before while working on a characterization
flow for one of our customers.  This was about two years ago and at that time
our characterization tool only captured setup and hold in an independent
fashion.  When captured this way it's pretty common to get non-overlapping
violation regions.  The problem caused our customer so much grief that they
decided to live with lost performance and asked us to build a script to
post-process our characterization results to guarantee an overlap.

The problem disappears when sequential cells are properly characterized.

During our research into this problem we rediscovered an age-old truth: setup
and hold aren't independent measurements.  Properly characterized setup and
hold is the minimum pulse-width for a synchronous pin (e.g., data).  Setup
and hold were created to accommodate the dependency of synchronous MPW on its
location relative to the clock.

My evidence is purely empirical, but when setup and hold are captured
correctly in a totally dependent manner (which we do today) the violation
regions always overlap.

    - Keith Howick
      Silicon Metrics Corp.

( ESNUG 383 Item 7 ) -------------------------------------------- [11/28/01]

From: "Muzaffer Kal" <muzaffer@dspia.com>
Subject: Anyone To Do Some Quickie Cadence Silicon Ensemble Consulting?

Hi John,

I am hoping you might have a pointer for me.  I haven't used SE before and I
am trying to do P&R to generate a macro (i.e. no pads) using Cadence SE 5.3
on a tiny block (5K gates).  I'm having numerous problems doing routing after
putting in the power stripes.  Do you know anyone in the SF Bay Area who
can give me half a day of consulting service just to sit with me and get
this to work?  I'd really appreciate any references.

    - Muzaffer Kal
      DSPIA, Inc.                                SF Bay Area, CA

( ESNUG 383 Item 8 ) -------------------------------------------- [11/28/01]

From: "John Cooley" <jcooley@TheWorld.com>
Subject: Are You An Avanti Customer Who Has Used Astro In A Chip Tapeout?

Howdy, All,

I was just interviewed by "Upside" magazine today and they seemed very
interested in Avanti's new Astro tools.  To my knowledge, no customer has
ever done a tapeout using Astro.  If you've done one or if you have direct
user experiences with Astro, could you please get in touch with me as
soon as possible?  I just want to make sure that Avanti (like any other
EDA vendor) gets fair treatment in "Upside".

    - John Cooley
      the ESNUG guy

( ESNUG 383 Item 9 ) -------------------------------------------- [11/28/01]

Subject: ( ESNUG 380 #11 ) Watch Out! That VCS PLI *Will* Drag You Down!

> Gregg Lahti's letter got me concerned that many VCS customers might not
> be running VCS as fast as they could.  VCS does not have nearly as
> many switches as DC, but it is very important to understand the effect
> of switches on VCS' performance.  Please post this reply to show your
> readers a quick overview about maximizing VCS performance.
>
>     - Mark Warren
>       Synopsys, Inc.                             Cupertino, CA

From: Anders Nordstrom <andersn@sympatico.ca>

Hi John,

In ESNUG 380 #11, Mark Warren from Synopsys wrote about what switches to
use to get faster RTL simulations in VCS.  I spent weeks trying different
combinations of switches without getting much more than 10 to 20 percent
improvement.  I wasn't even using the PLI and still there was something
slowing down my simulation.

It turned out that the PLI was the problem after all.  Not because I used
it but because it was there.  At Nortel, our cad support distributes VCS
with several PLI routines already linked in so that we can try different
waveform viewers and code coverage tools.

By running VCS with only the Signalscan PLI compiled in VCS (but not used),
I got a speed-up of 8 to 10 percent on a 3 Mgate RTL design and close to
20 percent on a 500 kgate RTL design.

By not compiling in any PLI routines I got a speed-up of 42 to 48 percent
on both my small and large design.

Of course, the VCS PLI is useful for waveform viewers and code coverage
tools, but you should never even link in the VCS PLI routines if you
are not going to use them.

    - Anders Nordstrom
      Nortel Networks, Ltd.                      Ottawa Canada

( ESNUG 383 Item 10 ) ------------------------------------------- [11/28/01]

From: "Alexander Gnusin" <emae@sympatico.ca>
Subject: Former IBMer Sets Up A Users 'TCL for EDA' Scripting Web Project

Hi John,

I've started a new web project called "TCL for EDA".  The main idea of this
project is to show how TCL/TK can be used effectively for EDA scripting,
methodology setup and new tools implementation.  I am a former IBMer and I
started to write scripts 5 years ago for IBM tools such as Booledozer and
Einstimer.  Then, I spent a good amount of time scripting for Synopsys tools.
Recently, I realized that it would be a good idea to start documenting my
scripts and make them available to others.  My web site is:

                         http://www.TCLforEDA.net

This site contains TCL/TK tools and scripts, as well as some of my
presentations and papers.  I would be glad if you have a chance to visit it.

    - Alexander Gnusin

( ESNUG 383 Item 11 ) ------------------------------------------- [11/28/01]

Subject: ( ESNUG 381 #7 ) CVS, Perforce, Synchronicity, RCS, ClearCase

> I am currently using ClearCase for revision control and I have used
> DesignSync from Synchronicity in the past.  I had several technical
> issues with DesignSync such as a corrupted database and being unable to
> access the latest version of files.  This was over a year ago.  I am
> sure Synchronicity fixed these issues but it caused me to switch to
> ClearCase at the time.
>
>     - Anders Nordstrom
>       Nortel Networks Ltd.                       Ottawa, Canada

From: Kris Monsen <monsen@mobilygen.com>

Hi, John,

As a person who generally hates GUIs, because they generally waste more time
than they save, I had to respond to a couple of points.  As long as a
revision-control product has the necessary features I'm fine with whatever
solution people are comfortable with, but here's my take:

 1. CVS is free.
 2. CVS supports hierarchy (don't even try using RCS).
 3. CVS is stable:
    - it has been and is being used by thousands of developers worldwide
    - I've been using it at various companies since 1994 and don't recall
      ever having a problem caused by CVS.  (I've had people try to get 
      *around* CVS using some back-door hack, but not problems using CVS
      commands themselves).
 4. CVS does work across many platforms, including just about every
    conceivable form of Unix as well as Windows.
 5. CVS does have a GUI that runs on Windows, though I won't claim it's
    the best because I haven't used it much.  I prefer command lines.
 6. CVS supports updating/checking out files based on dates. (see below)
 7. CVS supports "merging" of changes by multiple people on the same
    revision.  Why are people so spooked by this?  It is a totally
    deterministic, well-known algorithm.  If there are conflicts where
    two people have changed the same lines, then it shows you both changes
    in the merged files and requires you to manually resolve the conflict.
    In contrast to another writer, we often have people making updates
    to the same files (fixing or adding a feature in a test bench, e.g.).
    It's a great time saver to be able to make the change without waiting
    for someone else's lock.
 8. I wrote up a little tutorial that gets most people going on CVS in
    about a half hour.  Occasionally someone needs a little further
    explanation, but it hasn't been difficult to manage at all.

Tom Tessier wrote:

> For those of you familiar with Tags ask yourself this question: "What
> does it take to move a tag on a group of files with RCS/CVS/ClearCase
> and others?" 

Tom should try using:

          cvs tag -F <TAGNAME> <files_or_directory>

The -F is for force.  Depending on exactly what Tom wanted to do, there are
a couple of other options you can specify.  The CVS man page is not that
difficult.

However, I should say that it's usually a bad idea to move tags.  You may
*think* that you want to move a tag to a new/different version of a file,
but it's very likely that you, or even worse, someone else will need the
original group of files later.  By moving the tag, in effect you're erasing
some of your history.  (No, you're not losing the revision history of
individual files, you're losing the "group history" of what you had tagged;
someone may have used that tag already for checking out a release.)

It is far better simply to make a *new* tag for the new versions of files
you have.  For example, instead of using a single tag:

	GOLDEN_RTL

and moving it every time a file changes, you should use:

	GOLDEN_RTL_1_0
	GOLDEN_RTL_1_1
	...
	GOLDEN_RTL_2_0
etc.
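
The "new tag instead of a moved tag" habit is easy to script.  Below is a
minimal sketch, assuming tags follow the GOLDEN_RTL_<major>_<minor> pattern
shown above.  The helper names, the way the list of existing tags is
obtained, and the "rtl" directory are all hypothetical; the sketch only
builds the command line and does not run CVS.

```python
import re


def next_tag(existing_tags, base="GOLDEN_RTL", bump_major=False):
    """Return the next BASE_<major>_<minor> tag after the highest existing one."""
    pattern = re.compile(rf"^{re.escape(base)}_(\d+)_(\d+)$")
    versions = []
    for tag in existing_tags:
        m = pattern.match(tag)
        if m:
            versions.append((int(m.group(1)), int(m.group(2))))
    if not versions:
        return f"{base}_1_0"
    major, minor = max(versions)
    if bump_major:
        return f"{base}_{major + 1}_0"
    return f"{base}_{major}_{minor + 1}"


def cvs_tag_command(tag, paths):
    """Build (but do not run) a 'cvs tag' command with no -F, so nothing moves."""
    return ["cvs", "tag", tag, *paths]


tags = ["GOLDEN_RTL_1_0", "GOLDEN_RTL_1_1"]
new_tag = next_tag(tags)
print(" ".join(cvs_tag_command(new_tag, ["rtl"])))
```

Because the command never uses -F, an older tag such as GOLDEN_RTL_1_0
still names exactly the group of files it named when it was created.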

Tom also wrote:

> Another problem that occurs is "It worked last week!"  How do you get
> back to that point?  In SOS it has the ability to back up to a date and
> time.  This is very powerful in a multiple person environment.

So does CVS.

	cvs update -D "2 days ago" ...
	cvs update -D "01/20/2001 15:30" ...

Very nice.

Sorry I got so long-winded.  My character flaw.

    - Kris Monsen
      Mobilygen Corp.                            Santa Clara, CA

         ----    ----    ----    ----    ----    ----   ----

From: Shiv Sikand <sikand@matrixsemi.com>

Hi, John,

I have been obsessed with the issue of Software Configuration Management
(SCM) software for hardware design for about 5 years now.  I've generally
worked in the microprocessor/full custom arena so managing physical
databases has always been an interesting challenge.

A very important aspect of SCM is to have an integrated system for both 
hardware and software so that all aspects of the design can be 
represented through a common state point.  In 1998 I started work on a 
Cadence Integration with Perforce while working at SGI.  This integration 
is now available as Open Source under the BSD license thanks to SGI, 
Cadence and Perforce.

I presented a paper about the benefits of Perforce for hardware design
and the Open Source Cadence-Perforce Integration at the Cadence Users 
Group meeting and the Perforce User Conference last year.  The paper is at:

   http://www.perforce.com/perforce/conf2000/shiv/shivsikand.pdf

As described in the paper, Perforce contains key features not found in 
any other tool.  The most significant one of these is InterFile Branching 
which allows a very powerful parallel and incremental 'branch and replace'
development approach to be applied to physical databases.

The code and some other publications on the subject can be found on the 
Perforce Public Depot.

   http://public.perforce.com/public/perforce/cdsp4/index.html

I'm always on the lookout for volunteers to help out in this area!

CDSP4 is currently in use at SGI, Velio Communications, Afara WebSystems 
and Matrix Semiconductor.  Perforce is also deployed quite heavily at 
nVidia and MIPS.  I know that there are other sites using Perforce for
hardware design but do not have a list offhand.  I understand that there 
are currently over 50,000 Perforce Users worldwide.  Their product, 
support and pricing structure is excellent.  And of course, once you have 
a Perforce license, CDSP4 is free.  :-)

    - Shiv Sikand
      Matrix Semiconductor

( ESNUG 383 Item 12 ) ------------------------------------------- [11/28/01]

From: Rudolf Usselmann <rudi@asics.ws>
Subject: How Can I Coax DC To Intelligently Duplicate High Fanout Nets?

Hi John!

I'm having trouble getting Design Compiler to do what I'm thinking -- not
what I'm writing.  ;)  Well almost!  Basically I have the following code:

  always @(posedge clk)
     sig_del1 <= #1 sig_in;

  always @(posedge clk)
     sig_del2 <= #1 sig_del1;

  always @(sig_del2 or data_a or data_b)
     if(sig_del2)   dout = data_a;
        else        dout = data_b;

In this example, the data bus is 64 bits wide, and sig_del2 is driving 64
2:1 MUXes (or equivalent).  This means sig_del2 is my critical path due to
high fanout.  I tried to limit the fanout by setting set_max_fanout, in
which case DC inserts a buffer tree.  However, what I really would like
to see DC do is something like this:

   always @(posedge clk)
      sig_del1 <= #1 sig_in;

   always @(posedge clk)
      sig_del2_0 <= #1 sig_del1;

   always @(posedge clk)
      sig_del2_1 <= #1 sig_del1;

   always @(posedge clk)
      sig_del2_2 <= #1 sig_del1;

   always @(posedge clk)
      sig_del2_3 <= #1 sig_del1;

   always @(sig_del2_0 or data_a or data_b)
      if(sig_del2_0)   dout[15:00] = data_a[15:00];
         else          dout[15:00] = data_b[15:00];

   always @(sig_del2_1 or data_a or data_b)
      if(sig_del2_1)   dout[31:16] = data_a[31:16];
         else          dout[31:16] = data_b[31:16];

   always @(sig_del2_2 or data_a or data_b)
      if(sig_del2_2)   dout[47:32] = data_a[47:32];
         else          dout[47:32] = data_b[47:32];

   always @(sig_del2_3 or data_a or data_b)
      if(sig_del2_3)   dout[63:48] = data_a[63:48];
         else          dout[63:48] = data_b[63:48];

In this example I have reduced the fanout from 64 to 16, without inserting
buffers.  I do have a DC-Ultra license which is required for re-timing.  But
DC still can't read my mind...   :*(

So how can I do the above optimization automatically?  Maybe it would be
even better to reduce the fanout to 8.  DC should be able to figure this
out.  Any pointers appreciated !
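
Until the tool does this by itself, the duplicated registers can at least
be generated rather than typed by hand.  This is a minimal sketch, assuming
a small script that prints the Verilog for a given bus width and fanout
limit.  The function name and its parameters are hypothetical, and the
sketch says nothing about whether DC will keep the register copies separate
during optimization.

```python
def duplicate_select_reg(width=64, max_fanout=16):
    """Emit Verilog that gives each bus slice its own copy of the select flop."""
    lines = []
    groups = (width + max_fanout - 1) // max_fanout   # ceiling division
    for i in range(groups):
        lo = i * max_fanout
        hi = min(lo + max_fanout, width) - 1
        reg = f"sig_del2_{i}"
        lines += [
            "always @(posedge clk)",
            f"   {reg} <= #1 sig_del1;",
            "",
            f"always @({reg} or data_a or data_b)",
            f"   if({reg})   dout[{hi}:{lo}] = data_a[{hi}:{lo}];",
            f"      else     dout[{hi}:{lo}] = data_b[{hi}:{lo}];",
            "",
        ]
    return "\n".join(lines)


# Fanout of 8 instead of 16, as suggested above.
print(duplicate_select_reg(64, 8))
```

Calling it with max_fanout=16 reproduces the four-register version written
out above; max_fanout=8 gives eight copies, each driving 8 MUXes.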

    - Rudi Usselmann

( ESNUG 383 Item 13 ) ------------------------------------------- [11/28/01]

From: "Jeff Carlson" <Jeff.Carlson@compaq.com>
Subject: What's The Dirt On The Synopsys DW_Debugger?  Useful?  Useless?

Hi, John,

I'm curious if many people have used Synopsys' dw_debugger and what user
experiences they may have had with it?

    - Jeff Carlson
      Compaq

( ESNUG 383 Item 14 ) ------------------------------------------- [11/28/01]

From: "Cyrus Malek" <cyrusm@synopsys.com>
Subject: How To Treat Power Nets (PNETs) As Routing Obstructions In PhysOpt

Hi John,

Every now and then, I see a number of user questions regarding obstructions
in PhysOpt.  With PhysOpt, we classify obstruction objects as:

  (a) Power nets
  (b) Fixed placement
  (c) General 
  (d) External

In this letter I'd like to focus on "Power Net Objects".

But, before I begin, I must advise users to please not overlook the detailed
information in the Physical Compiler User Guide.  In our 2001.08 release, the
"Preparing Physical Data" chapter provides valuable information about how
PhysOpt works with obstructions.  I also recommend the following SolvNet
articles:
                  pss_kjs.html: Placement Obstructions In PhysOpt
    Physical_Synthesis-57.html: Check Size of Hard Macro LEF (Runtimes)
    Physical_Synthesis-81.html: How To Turn PNETS In Blockages 
   Physical_Synthesis-124.html: How Do I Ignore Routing Layers ?
   Physical_Synthesis-146.html: How To Remove Obstructions (2000.11)
   Physical_Synthesis-149.html: Define Custom Obstruction Over Hard Macros
   Physical_Synthesis-162.html: Get Past Over-Capacity Issues In PhysOpt
   Physical_Synthesis-178.html: Sanity Checking Your Design for PhysOpt
   Physical_Synthesis-202.html: Tips on Efficient PhysOpt Memory Usage
   Physical_Synthesis-256.html: GUI obstr and keep out buttons

That being said, I'll get back to "Power Net Objects".

In general, all cells must eventually be located in a legal location, or
'site', before the block placement is considered 'routable'.  PhysOpt knows
where the available (unobstructed) sites are by their definition in the input
floorplan.  If there are no sites in the input floorplan, then there are no
legal locations to place any cells.  

Placement obstructions are used to tell the placer what sites are blocked
(not valid locations to place cells).  Routing obstructions are factored 
into timing and congestion analysis.

For the purposes of this discussion, let us assume the input floorplan has
a clean site array defined. 

Power net objects can take on two basic forms:

    1. Default power net object = routing obstruction
    2. Placement obstruction

Let us take each of these in turn:

Power Net Routing Obstructions
------------------------------

By default, power nets (PNETs) are treated as routing obstructions and are
factored into routing congestion and net delay calculations.  If power nets
are defined in the input PDEF floorplan, these nets will resemble the
following PDEF code:

     (PNET VSS
       (TYPE GND)
       (ROUTE 
         (LAYER_WIDTH 18  100.00)
         ( 18 ( 5000.00 8000.00 ) ( 5500.00 * ) )
       )
     )

Power Net Placement Obstructions
--------------------------------

If your floorplan has pre-routed PNETs that you would like the placement
engine to heed (in addition to the congestion and delay engines), you should
use the following variables:

  set physopt_pnet_complete_blockage_layer_names "metal layer names ... "
  set physopt_pnet_partial_blockage_layer_names  "metal layer names ... "

To find out what metal layers are available, look in the technology section
of your PLIB for "routing_layer" constructs:

       /* BEGIN example Rumpelstiltskin.plib */

       routing_layer ( "METAL1" ) {
         routing_direction : horizontal;
         . . .
       } 
       routing_layer ( "METAL2" ) {
         routing_direction : vertical;
         . . .
       } 
       routing_layer ( "METAL3" ) {
         routing_direction : horizontal;
         . . .
       } 
       /* END example Rumpelstiltskin.plib */

If you do not have access to your PLIB, you can get the layer information
from your physical library loaded in memory.  If you do not know the name of
the physical library, then in a psyn_shell session, you can load your design
plus floorplan and run "legalize_placement -check".  This command will load
all the necessary libraries and link the logical to physical libraries.  A
message (or several) similar to the following will appear:

      Information: Linking logical library slow with physical
         library Rumpelstiltskin.  (PSYN-036)

You *now* know the physical library is called "Rumpelstiltskin".  Next do a
"report_lib -physical Rumpelstiltskin".  The output of this command will
contain a table of available layers in the library:

   layer    layer_type   direction    pitch        width        spacing
   ----------------------------------------------------------------------
   POLY1    masterslice  -            -            -            -
   CONT     cut          -            -            -            -
   METAL1   routing      horizontal   4.600e-01    1.300e-01    1.300e-01
   VIA12    cut          -            -            -            -
   METAL2   routing      vertical     4.600e-01    1.800e-01    1.800e-01
   VIA23    cut          -            -            -            -
   METAL3   routing      horizontal   4.600e-01    1.800e-01    1.800e-01

In the above report, any of the 'layer' entries that contain a 'layer_type'
entry of 'routing' are valid routing layers for PhysOpt.  Similarly, in the
PLIB section above, all valid routing layers are specified within the
'routing_layer()' statements.  So, from the information above we have three
routing layers available: 

                         METAL1, METAL2, METAL3

Valid routing layers can be used in the list passed to either of the two
physopt_pnet_*_blockage_layer_names variables.  As an example, using the
above data:

       set physopt_pnet_complete_blockage_layer_names "METAL1 METAL2"
       set physopt_pnet_partial_blockage_layer_names  "METAL3"
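
The layer lookup above can also be scripted outside the tool.  What follows
is a minimal Python sketch, not a Synopsys utility.  It assumes the
"report_lib -physical" table has been saved as text in the column layout
shown above; the helper names routing_layers() and tcl_set() are made up for
illustration.

```python
# Table text as printed by "report_lib -physical Rumpelstiltskin" above.
REPORT = """\
layer    layer_type   direction    pitch        width        spacing
----------------------------------------------------------------------
POLY1    masterslice  -            -            -            -
CONT     cut          -            -            -            -
METAL1   routing      horizontal   4.600e-01    1.300e-01    1.300e-01
VIA12    cut          -            -            -            -
METAL2   routing      vertical     4.600e-01    1.800e-01    1.800e-01
VIA23    cut          -            -            -            -
METAL3   routing      horizontal   4.600e-01    1.800e-01    1.800e-01
"""


def routing_layers(report_text):
    """Return the names of all layers whose layer_type column is 'routing'."""
    layers = []
    for line in report_text.splitlines():
        fields = line.split()
        # A data row has at least a layer name and a layer_type.
        if len(fields) >= 2 and fields[1] == "routing":
            layers.append(fields[0])
    return layers


def tcl_set(variable, layer_names):
    """Format a 'set' line in the same style as the examples above."""
    return 'set %s "%s"' % (variable, " ".join(layer_names))


layers = routing_layers(REPORT)
print(", ".join(layers))  # METAL1, METAL2, METAL3

# Which layers become complete vs. partial blockages is the user's choice;
# this split simply mirrors the example above.
print(tcl_set("physopt_pnet_complete_blockage_layer_names", layers[:2]))
print(tcl_set("physopt_pnet_partial_blockage_layer_names", layers[2:]))
```

The two printed 'set' lines match the example above and can be pasted into a
setup script.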

PNETs that are specified as complete blockages are 'visible' to the placement
engine during both coarse placement and placement legalization.  PNETs that
are specified as partial blockages are seen during placement legalization,
but *NOT* coarse placement *UNLESS* an additional variable is set:

       set physopt_create_placement_see_partial_blockages true

In addition, if this variable is 'true', then the user has control over what
'size' PNETs are seen as blockages during coarse placement with:

       set physopt_minimum_pnet_height <threshold value>
       set physopt_minimum_pnet_width <threshold value>

Note: Designs containing *partial* PNET blockages will suffer a runtime
penalty, so if runtime is an issue, users may choose to use only *complete*
PNET blockages.  It is recommended to use these blockage variables only if:

   1a. The power nets cover placement sites in the floorplan

                           - AND -

   1b. The power nets could possibly short to obstructions within the
       cells or completely obstruct pins within the cells

                            - OR -

    2. The power nets are rather wide and only a few routing layers
       are available (placing cells under these wide nets will
       dramatically increase local routing congestion).
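
The conditions above group as (1a AND 1b) OR 2.  A minimal Python sketch of
that rule follows; the function and argument names are made up for
illustration and are not tool options.

```python
def should_use_pnet_blockages(covers_sites, may_short_or_block_pins,
                              wide_nets_few_layers):
    """Apply the rule of thumb above: (1a AND 1b) OR 2."""
    return (covers_sites and may_short_or_block_pins) or wide_nets_few_layers


# 1a alone is not enough: the nets cover sites but cannot short to the cells
# or block their pins, and the nets are not unusually wide.
print(should_use_pnet_blockages(True, False, False))   # False

# 1a and 1b together qualify.
print(should_use_pnet_blockages(True, True, False))    # True

# Condition 2 qualifies on its own.
print(should_use_pnet_blockages(False, False, True))   # True
```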

Also to note: if partial blockages exist AND the *_see_partial_blockages
variable is NOT set to 'true', then large cell displacements could occur
during placement legalization.  These large displacements can negatively
affect design timing.  (I am sure no one wants to see their design that
met timing at the end of Optimization experience a dramatic increase in
negative slack due to displacements during final legalization!)

Finally, remember that variables are *not* persistent on the saved design
DB and they must be re-defined each time the tool is re-started!

To view PNETs in the psyn_gui Physical Viewer, check the box next to 'Nets'
and make sure the sub-menu has 'Power Routes' also checked.  In the
Preferences menu, you can selectively turn on/off as well as change the fill
style of the various metal layers.  Note: Setting the physopt_pnet*blockage*
variables has no effect on the appearance of PNETs in the Physical Viewer.

In closing, I would like to briefly touch on the effects of having a power
grid in your design.  When a power grid exists in the floorplan, it is more
information that PhysOpt must store, keep track of, and perform checks
against -- therefore it requires additional memory and runtime.  Moreover,
highly-complex power grids can require significant extra memory, even for
small designs, so it is advisable to check that the grid that is specified
in your floorplan makes sense.

For example, if you have many PNET structures on Metals 4, 5, and 6, but you
do not allow block-level detail routing on these levels, then instead of 
including them in your floorplan, just create full-block layer obstructions
on these levels.  On the other hand, including the power grid will provide
PhysOpt with a more realistic view of the physical aspects of the design,
improving its ability to estimate timing and congestion.

    - Cyrus Malek
      Synopsys, Inc.                             Austin, Texas

( ESNUG 383 Item 15 ) ------------------------------------------- [11/28/01]

Subject: ( ESNUG 381 #11 ) Avanti/Chrysalis Asks For Current User Benchmark

>      Chip 2 (Gate2Gate)  4 M gates, flat design:
>
>                                Time            Memory
>                              --------        ---------
>      Avanti Chrysalis 3.0    3596 min        14870 Mbyte
>      SNPS Formality 2000.11   112 min         2399 Mbyte
>      Verplex Tuxedo 2.0.8.a    89 min         2817 Mbyte

From: Gerard Memmi <gerard@avanticorp.com>

Hi John, 

We're interested in feedback about Design VERIFYer's competitive performance,
but were surprised to see that the comparisons were done on a version that's
more than a year old and 3 revs distant -- and with numbers that are way off
that rev's usual performance.  Maybe this customer would agree to a benchmark
using the current rev? 

    - Gerard Memmi
      Avanti/Chrysalis

( ESNUG 383 Item 16 ) ------------------------------------------- [11/28/01]

Subject: ( ESNUG 380 #12 ) VCS Scales With Mhz While NC-Verilog Doesn't

> The following table lists the values reported back from the Unix "time"
> command on a couple of small synthesis jobs.
>
>            job      cpu Sun     cpu Linux PC      ratio
>              1       3059 sec     1715 sec         1.8x
>              2        652          347             1.9x
>
> Seems to track the MHz scale fairly nicely.
>
>     - Scott Evans
>       Sonics Inc.                                Mountain View, CA

From: [ To Infinity And Beyond ]

Hi, John,

Please keep me anonymous on this one.  We run multiple sim environments
(different ASICs) on Linux / Solaris here, and use both Cadence NC-Verilog
and Chronologics VCS.

VCS:
----

I've also noticed that VCS gives a speedup which scales somewhat with the
MHz of the machine.  My Linux VCS sims on a 1 GHz PIII system (with >1G of
memory) run 50-70% faster than on 400 MHz Solaris machines with similar
memory.  That's great, because my Linux machines are a lot cheaper and
upgrading to faster boxes and larger memory is also easy.  But I have found
if a VCS job starts swapping in and out of memory, the Solaris memory
subsystem seems to do a far better job than Linux.  I try to make sure I
have enough memory to run a job on Linux (especially for gate level sims.)
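
As a worked check of the numbers in this item, here is a short Python sketch.
It recomputes the ratios from the quoted table and compares the speedup
reported above with the raw clock ratio.  Reading "50-70% faster" as a
1.5x-1.7x runtime ratio is an assumption, and the helper name speedup() is
made up for illustration.

```python
def speedup(slow_seconds, fast_seconds):
    """Runtime ratio: how many times faster the second machine finished."""
    return slow_seconds / fast_seconds


# CPU seconds from the quoted table: job -> (Sun, Linux PC).
quoted_jobs = {1: (3059, 1715), 2: (652, 347)}
for job, (sun, linux) in quoted_jobs.items():
    print("job %d: %.1fx" % (job, speedup(sun, linux)))
# job 1: 1.8x
# job 2: 1.9x

# 1 GHz PIII vs. 400 MHz Solaris machine.
clock_ratio = 1000 / 400  # 2.5

# Assumed reading of "50-70% faster": runtime ratios of 1.5x and 1.7x.
for observed in (1.5, 1.7):
    share = 100 * observed / clock_ratio
    print("%.1fx speedup is %.0f%% of the %.1fx clock ratio"
          % (observed, share, clock_ratio))
```

Under that reading, the reported speedup is roughly 60-68% of what the clock
ratio alone would predict, which fits "scales somewhat".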

VCS has a sparse memory model that reduces memory on some of my sims.  It's
enabled with an embedded /*sparse*/ comment in your Verilog code.  It's
undocumented, of course :-), so one needs to dig around.  My AE told me to
use it as:
             reg /*sparse*/  [31:0] pattern[0:100000000]; 
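
To see why a sparse model can save memory for a declaration like this one,
here is a conceptual Python sketch.  It is not how VCS implements the
feature (that is not described here); the class SparseMemory and the figure
of 4 bytes per 32-bit word are assumptions for illustration.

```python
class SparseMemory:
    """Store only the words that were written; all others read as default."""

    def __init__(self, depth, default=0):
        self.depth = depth
        # 'default' stands in for the unwritten value (x in Verilog).
        self.default = default
        self._words = {}

    def _check(self, addr):
        if not 0 <= addr < self.depth:
            raise IndexError(addr)

    def write(self, addr, value):
        self._check(addr)
        self._words[addr] = value & 0xFFFFFFFF  # keep 32 bits, like [31:0]

    def read(self, addr):
        self._check(addr)
        return self._words.get(addr, self.default)

    def words_stored(self):
        return len(self._words)


DEPTH = 100_000_001          # pattern[0:100000000]
dense_bytes = DEPTH * 4      # assumed 4 bytes per word if fully allocated
print("dense: about %.0f MB" % (dense_bytes / 1e6))   # dense: about 400 MB

mem = SparseMemory(DEPTH)
mem.write(0, 0xDEADBEEF)
mem.write(100_000_000, 0x12345678)
print("sparse: %d words stored" % mem.words_stored())  # sparse: 2 words stored
```

A testbench that touches only a handful of addresses pays for those words
alone instead of the full array.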

I've also noticed that processors with larger caches are better.  I suppose
VCS has a high percentage of load-store instructions.  I find that in
general VCS has better memory usage than NC-Verilog, so I can pack some more
sims into a multiprocessor machine, with relatively less memory.

NC-Verilog
----------

Has anyone else seen that NC-Verilog gains much less than VCS when moving
from Solaris to Linux?  (i.e. faster MHz didn't speed up NC-Verilog much.)
I would like to hear whether others have had similar experiences.

    - [ To Infinity And Beyond ]

( ESNUG 383 Item 17 ) ------------------------------------------- [11/28/01]

Subject: ( ESNUG 382 #2 ) Rarely Noticed Library Characterization Gotchas

> None of the characterization software I've seen has handled the subtle
> tradeoff between setup, hold, and clk-to-Q delay in a way I would consider
> correct.  Partly, this is a function of .lib format, which I feel is
> fundamentally broken in this regard.  If you do a 3-D plot of setup, hold,
> and clk-to-Q for any given FF you will see that they trade off against each
> other in roughly hyperbolic fashion.  Most systems characterize setup at
> infinite hold, and characterize hold at infinite setup, and then guardband
> them using some fudge factor.  But it's possible for that process to arrive
> at setup and hold numbers such that, if you just barely meet the setup time
> and just barely meet the hold time, the FF doesn't work!  Plus, as you
> approach the minimum setup or hold, the clk-to-Q delay increases.  The .lib
> format has no mechanism for expressing any of these tradeoffs, even if you
> gather all the needed data.  That forces the guardbanding to be
> unnecessarily large, which in turn reduces the accuracy of timing analysis
> and synthesis.
>
>     - Howard Landman
>       Vitesse Semiconductor                      Longmont, CO

From: Keith Howick <howick@siliconmetrics.com>

John,

I agree with Howard concerning proper modeling of the relationships between
setup, hold and delay.  It's odd that this remark comes now as I'm presenting a
paper on this subject at DesignCon 2002.  Understanding these relationships is
very important since most dynamic glitches (mentioned originally by Mr. Kalita
from Intel) are a result of improperly constraining the sequential device.

I also agree with Howard that, regardless of our understanding of the physics,
all characterization tools are limited in their efforts by the models they
must build.  None of the model formats popularly supported today for static
timing analysis can fully represent the setup-hold-delay relationships.  At
the very least, a model would require a table of hold vs. setup and the
setup slew, and a table of delay vs. both setup and hold.

We solved the problem to the best degree today's models will allow with two
measurement features: delay degradation and dependent setup and hold.  We
discovered that many of the timing difficulties our customers encounter
disappear when these two measurement styles are applied.

Measuring setup and hold using delay degradation gives the library developer
the ability to trade off the predictability of the model against the cell
performance the model represents.  Howard's correct that clock-to-Q prop
delay increases as setup or hold approach the cell's breakdown condition.
Since this isn't reflected in STA models, it behooves the library developer
to avoid it.  But by how much?  Without some measurable cell behavior, the
developer is just guessing, adding a fudge factor.  Using the degradation of
delay as a reference, the user can control how much modeled performance is
given up to preserve model predictability.
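
The idea can be illustrated with a toy model in Python.  This is a sketch
under stated assumptions, not Silicon Metrics' method: the clk-to-Q curve,
its constants (D0, S0, A) and the function names are invented for
illustration.  The reported setup time is the smallest setup at which
clk-to-Q has degraded by no more than a chosen fraction of nominal.

```python
D0 = 200.0  # assumed nominal clk-to-Q delay, ps
S0 = 100.0  # assumed setup time at which the toy cell breaks down, ps
A = 2.0     # assumed curve constant, ps


def clk_to_q(setup):
    """Toy model: delay grows without bound as setup approaches S0."""
    if setup <= S0:
        return float("inf")
    return D0 * (1.0 + A / (setup - S0))


def setup_at_degradation(limit, lo=S0, hi=1000.0, tol=1e-3):
    """Bisect for the smallest setup whose delay is within D0*(1+limit)."""
    target = D0 * (1.0 + limit)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if clk_to_q(mid) <= target:
            hi = mid   # still within the limit: try a smaller setup
        else:
            lo = mid   # too degraded: need more setup
    return hi


for limit in (0.10, 0.05):
    print("%2.0f%% degradation -> setup of about %.1f ps"
          % (100 * limit, setup_at_degradation(limit)))
```

In this toy model a 10% limit gives a setup near 120 ps and a 5% limit gives
one near 140 ps: tightening the limit buys predictability at the cost of
modeled performance.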

Measuring setup and hold in a dependent manner, as they should be, further
avoids dynamic glitches.  While assisting a customer with a modeling problem
we rediscovered an age-old truth: setup and hold are not independent
measurements.  The combination of setup and hold results in the minimum
pulse-width of a synchronous pin (e.g. data).  The two measurements are
needed to accommodate the dependence of synchronous MPW on its relative
location to the clock.  Unfortunately, today's models only allow one of the
two dimensions of these constraints to be represented; either the dynamic
pulse width is represented and the relative location lost, or the relative
location is preserved but the dynamic pulse width is lost.  Fortunately,
correctly characterizing setup and hold in a dependent fashion avoids many
of the timing difficulties this modeling weakness permits.
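
A toy Python model shows why independent measurement can fail.  This is a
sketch under stated assumptions, not a real cell or characterization flow:
the hyperbolic pass/fail boundary, its constants (S0, H0, K), the 10%
guardband and the function names are all invented for illustration.

```python
S0, H0 = 100.0, 50.0  # assumed asymptotes of the boundary, ps
K = 400.0             # assumed curvature of the boundary, ps^2
BIG = 1000.0          # stands in for "infinite" setup or hold, ps
GUARD = 1.10          # assumed 10% guardband


def captures(setup, hold):
    """Toy flip-flop: data is captured only on or above the hyperbola."""
    return setup > S0 and hold > H0 and (setup - S0) * (hold - H0) >= K


def min_setup(hold):
    """Smallest passing setup for a given hold."""
    return S0 + K / (hold - H0)


def min_hold(setup):
    """Smallest passing hold for a given setup."""
    return H0 + K / (setup - S0)


# Independent: each constraint is measured with the other one very large.
indep_setup = GUARD * min_setup(BIG)
indep_hold = GUARD * min_hold(BIG)
print("independent pair works:", captures(indep_setup, indep_hold))  # False

# Dependent: hold is measured at the setup value that will be reported.
dep_hold = GUARD * min_hold(indep_setup)
print("dependent pair works:  ", captures(indep_setup, dep_hold))    # True
```

In this model the independently measured pair lies below the boundary, so a
data pulse that just meets both numbers is not captured, while the
dependently measured hold is much larger and the pair passes.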

Thankfully, using both these methods avoids dynamic glitches due to pass
gates.  Doing so also reduces the number of vectors applied to a SPICE
netlist for verification.  Since most sequential designs don't have limits
for maximum transition time or minimum frequency, designers need only test
their cells at the values of setup and hold reported by characterization,
reducing the vector set to the 2^(2*n) vector set reported by Howard.

    - Keith Howick
      Silicon Metrics Corp.

============================================================================
 Trying to figure out a Synopsys bug?  Want to hear how 11,000+ other users
    dealt with it?  Then join the E-Mail Synopsys Users Group (ESNUG)!

       !!!     "It's not a BUG,               jcooley@world.std.com
      /o o\  /  it's a FEATURE!"                 (508) 429-4357
     (  >  )
      \ - /     - John Cooley, EDA & ASIC Design Consultant in Synopsys,
      _] [_         Verilog, VHDL and numerous Design Methodologies.

      Holliston Poor Farm, P.O. Box 6222, Holliston, MA  01746-6222
    Legal Disclaimer: "As always, anything said here is only opinion."
 The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com

Copyright 1991-2024 John Cooley. All Rights Reserved.