


  Editor's Note:  Well, I'm on my annual migration to next week's San Jose
  SNUG gathering.  Looks like this year's meeting is going to be good.
  Last year we had 447 users pre-registered to attend; this year it's 534
  (a 19 percent increase in attendance -- not bad for a down economy.)  This
  Sunday night (March 16th) from 6:00 to 9:00 PM they'll be having a welcome
  mixer for those who have registered.  It should be fun.  I'll be the tall
  fat guy either making a fool of myself dancing the funky chicken on the
  dance floor or grazing on the free munchies.  I should be easy to spot in
  my orange and blue Florida Gators football shirt.  :)

                                               - John Cooley
                                                 the ESNUG guy

  P.S. Oh, yea, Joanne wants me to remind you that SNUG welcomes walk-in
  registrations, too.  See ya there! 


( ESNUG 408 Subjects ) ------------------------------------------ [03/13/03]

 Item  1: Nassda HSIM Runs Show That Avanti Astro Is Too Noise Pessimistic
 Item  2: Jay Asks "Is Power Compiler Worth Using For Low Power Designs?"
 Item  3: Three Anonymous Customers Review The Tharas Hammer Accelerator
 Item  4: User Concern About Cadence NC-Verilog vs. Synopsys VCS Mismatches
 Item  5: ( ESNUG 406 #6 ) DC Calls Up Module Compiler If The Code Is Right
 Item  6: ( ESNUG 407 #13 ) Paul Says Power Rings Are An EDA Software Crutch
 Item  7: ( ESNUG 407 #3 ) Users On How To Get Kick Ass Linux Farm VCS Runs
 Item  8: ( ESNUG 407 #11 ) Linking Issues Resolved In Behavioral Compiler
 Item  9: Newbie Avanti Apollo User Can't Get Automatic DC Scripts To Work
 Item 10: Why Can't I Get A Cost Estimate Of Migrating From FPGAs To ASICs?
 Item 11: ( ESNUG 407 #6 ) There Should Be No OS Differences With PT Results

 The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com


( ESNUG 408 Item 1 ) -------------------------------------------- [03/13/03]

From: Roger Boates <domain=st blot gone user=roger.boates>
Subject: Nassda HSIM Runs Show That Avanti Astro Is Too Noise Pessimistic

Hi, John,

I know you like Astro war stories, so I thought I'd share with you how we
discovered that it was too pessimistic on noise via our Nassda HSIM runs.

Our group was designing a DSM DSP.  We were using Avanti Astro to control
and report crosstalk.  Since Astro only monitors the maximum coupled voltage
and not the pulse width or area (energy), we decided to use a circuit
simulator to check the results and look for other possible problems.

Coupled noise from adjacent wires, aggressors, is a function of the percent
of capacitive coupling to the victim wire, the rise/fall time of the
aggressor drivers and the resistance of the victim's driver.  We found that
about 25 percent of the victims had total percent coupling over 50 percent
in a block of 30K gates.
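
As a rough illustration of the "total percent coupling" figure above, the
following Python sketch computes it per victim net as the sum of coupling
capacitances divided by the net's total capacitance.  The net names, the
capacitance values, and this exact definition are assumptions made for
illustration; they are not taken from the extraction data or from Astro.

```python
# Sketch: percent coupling of a victim net = coupled cap / total cap.
# All net names and capacitance values below are hypothetical.

def percent_coupling(ground_cap_ff, coupling_caps_ff):
    """Return total coupling as a percentage of the net's total capacitance."""
    coupled = sum(coupling_caps_ff)
    total = ground_cap_ff + coupled
    return 100.0 * coupled / total if total > 0 else 0.0

def flag_victims(nets, threshold_pct=50.0):
    """Return the names of nets whose percent coupling exceeds the threshold."""
    return sorted(
        name for name, (cg, ccs) in nets.items()
        if percent_coupling(cg, ccs) > threshold_pct
    )

example_nets = {
    "net_a": (10.0, [4.0, 8.0]),   # 12 fF coupled out of 22 fF total
    "net_b": (20.0, [2.0, 3.0]),   # 5 fF coupled out of 25 fF total
}
print(flag_victims(example_nets))
```

Only net_a crosses the 50 percent line in this made-up data set.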

Since Mentor's Eldo, an HSPICE-like simulator, simulated ten cycles of a
related block one-tenth the size in 190 hours, we decided to use Nassda's
HSIM, which has a larger capacity and can run faster.  Ten cycles of our
30K-gate block ran for 10 hours on a Linux server.  It had a 1.5 GHz CPU
and 2 GB of memory, although we were only able to access 1 GB.

We used extracted parameters from Avanti's Star-RCXT program to add the
capacitances to our gate-level netlist.  We set all of the resistance
values to 0 to improve the HSIM run time.  This compromised our accuracy,
but was sufficient to indicate trouble spots.

Before we ran HSIM, we had to have .measure statements in the run deck
to find the maximum voltage on each internal net during each clock cycle.
We did this by writing a SKILL program to identify all of the nets in
our Cadence-based schematic and to write a separate file using these net
names in .measure statements.  This amounted to 300K .measure statements
for every ten cycles of simulation.  We then used a .include statement
to add the file of .measure statements to the HSIM deck.
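
The SKILL program itself is not shown here.  As a hedged stand-in, this
Python sketch shows the general shape of such a generator: one HSPICE-style
.measure line per net per clock cycle, written to a file that a .include
can pull in.  The measure-name scheme, the clock period, and the function
names are hypothetical, and the exact .measure syntax HSIM accepts should
be checked against its documentation.

```python
# Sketch: write one ".measure" line per net per clock cycle.
# The measure-name scheme and the timing values are hypothetical.

def measure_lines(nets, n_cycles, period_ns):
    """Yield HSPICE-style .measure statements (max voltage per net per cycle)."""
    for cycle in range(n_cycles):
        t_from = cycle * period_ns
        t_to = t_from + period_ns
        for idx, net in enumerate(nets):
            yield (
                f".measure tran vmax_n{idx}_c{cycle} max v({net}) "
                f"from={t_from:g}ns to={t_to:g}ns"
            )

def write_measure_file(path, nets, n_cycles, period_ns):
    """Write the statements to 'path' and return how many lines were written."""
    count = 0
    with open(path, "w") as fh:
        for line in measure_lines(nets, n_cycles, period_ns):
            fh.write(line + "\n")
            count += 1
    return count
```

Assuming roughly one net per gate, 30K nets times ten cycles gives the
300K statements mentioned above.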

Many of the HSIM jobs ran out of memory during the post-processing of
the .measure statements after the simulation had finished, so we had to
re-run those jobs with an HSIM post-processing option, -fsdb, to get the
results.  This added 30 to 45 minutes extra run time because the job had
to be run on a slower Solaris server.

After each simulation we grep'ed and sorted the .measure output file to
get maximum voltages for noise spikes which were between 0.1 and 1.0 volts.
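
The grep-and-sort step can be sketched in Python as below.  It assumes each
result line has the form "name = value", which may not match the real
.measure output file, and the measure names are hypothetical.

```python
import re

# Sketch: pick out measured peaks inside a voltage band, largest first.
# Assumes each result line looks like "name = value"; the real format may differ.
LINE_RE = re.compile(r"^\s*(\S+)\s*=\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*$")

def noise_spikes(lines, v_min=0.1, v_max=1.0):
    """Return (name, volts) pairs with v_min <= volts <= v_max, sorted descending."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip headers, blank lines, anything unparseable
        volts = float(m.group(2))
        if v_min <= volts <= v_max:
            hits.append((m.group(1), volts))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

sample = [
    "vmax_n0_c0 = 0.107",
    "vmax_n1_c0 = 0.02",      # below the band, filtered out
    "vmax_n2_c0 = 1.2",       # above the band, filtered out
    "vmax_n3_c0 = 3.17e-1",
]
for name, volts in noise_spikes(sample):
    print(f"{name} {volts:.3f}")
```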

At approximately one cycle per hour and one license,  we were limited to
a maximum of 168 cycles per week.  We were still able to detect and
evaluate many voltage bumps.

After comparing the HSIM results with the Astro results, we concluded
that the Astro voltage spike results were quite conservative based on
the input parameters that we were using.  For example, Astro reported a
voltage spike of 0.317 volts on a net, whereas the HSIM simulation
reported 0.107 volts.  This particular disparity was because there were
three aggressors and Astro had modeled them as occurring simultaneously
in the same timing window.  The HSIM simulation had shown that the
aggressors were distributed over a 500 ps interval.
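
To see why the same-window assumption is pessimistic, here is a toy Python
sketch that superposes three noise bumps, first aligned and then spread
over 500 ps.  The triangular pulse shape, the 0.1 V amplitude, the 100 ps
width, and linear superposition are all assumptions for illustration; this
does not reproduce the Astro or HSIM numbers above.

```python
# Sketch: peak of three superposed triangular noise bumps, aligned vs. staggered.
# Pulse shape, amplitude (0.1 V) and width (100 ps) are hypothetical.

def triangle(t_ps, center_ps, peak_v, width_ps):
    """Triangular bump of the given peak, centered at center_ps."""
    half = width_ps / 2.0
    dist = abs(t_ps - center_ps)
    return peak_v * (1.0 - dist / half) if dist < half else 0.0

def peak_noise(centers_ps, peak_v=0.1, width_ps=100.0, t_end_ps=1000):
    """Return the maximum of the summed bumps, sampled every 1 ps."""
    return max(
        sum(triangle(t, c, peak_v, width_ps) for c in centers_ps)
        for t in range(t_end_ps + 1)
    )

aligned = peak_noise([200, 200, 200])      # all aggressors switch together
staggered = peak_noise([200, 450, 700])    # spread over a 500 ps interval
print(f"aligned {aligned:.3f} V, staggered {staggered:.3f} V")
```

With these made-up pulses the aligned case peaks at three times the
staggered case, because the staggered bumps never overlap.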

The HSIM simulations that we ran contributed to the decision that our
noise model, although conservative, was too pessimistic.  We modified
our noise model which improved our Astro results.

    - Roger Boates
      STMicroelectronics                         San Diego, CA


( ESNUG 408 Item 2 ) -------------------------------------------- [03/13/03]

From: Jay Pragasam <company=brecis got don employee=jlk>
Subject: Jay Asks "Is Power Compiler Worth Using For Low Power Designs?"

Hello John,

Until now we have managed to deliver chips with average attention to the
"power" part of the design.  Now that we have a robust methodology to
spin out chips that meet timing and area goals, we are planning to put more
effort into the "power" part.  How to make powerful chips with less power??

Could someone who has used Power Compiler or some other similar "power
saver" tool extensively describe their experience in achieving what they
intended to achieve?  I'm sure there are a lot of mobile chip builders
who read this, and I believe they are really power conscious.
Just wondering if they could share their nightmares, please???  Since my
current plan is to go the Power Compiler way, my questions would be based
on the Power Compiler methodology.  Of course, the results are design
specific and so if one could provide design data along with the results
that would be really great.

  1. First and foremost, did Power Compiler mess up the timing
     objectives when it set out to save power?
  2. How do latch-based and latch-free clock gating compare?
  3. How much area did you have to sacrifice and what was the power saving?
  4. Does setting set_max_dynamic_power and set_max_leakage_power to 0
     always give the best results?  Is the operation similar to set_max_area
     to achieve minimum area?
  5. Has anyone ventured into operand isolation?  How much power does it
     save without messing up timing?
  6. Does DFT Compiler have any problems handling the clock gated designs
     when I take care of testability during clock gating?  Is there a lot
     of difference between inserting the control point before or after the
     clock gating logic?
  7. What about Formal Verification headaches?  Does LEC/Formality recognize
     the clock gating logic?  I'm sure there is some "under the hood"
     processing by Formality to handle these logic insertions, but what 
     about Verplex LEC?
  8. How much of a correlation exists between the power reports of Power
     Compiler and PrimePower?
  9. How well did the power analysis done by PrimePower correlate with the
     real silicon power?

I'm sure there is at least one more key question to ask, but don't know that
yet.  Please fill in...

    - Jay Pragasam
      Brecis Communications                      San Jose, CA


( ESNUG 408 Item 3 ) -------------------------------------------- [03/13/03]

From: John Cooley <isp=theworld taut bon account=jcooley>
Subject: Three Anonymous Customers Review The Tharas Hammer Accelerator

Hi, all,

Here's what I got when I surveyed some Tharas Hammer users last week.

    - John Cooley
      the ESNUG guy                              Holliston, MA

         ----    ----    ----    ----    ----    ----   ----

From: [ Bachelor Number One ]

Hi John -

Here's my feedback on the Tharas Hammer.  Please note that my information is
somewhat dated as the last version of the Hammer compiler that I used was
from about November 2001.  There have most likely been updates since then.

I'll have to request the anonymous handling.

We first engaged with Tharas about August of 2000.  At that time the Hammer
Accelerator still had a few hardware bugs.  Also at that time, the Hammer
would not fit our complete design and it had some parsing problems with some
of our source code and there were some problems with using our ASIC vendor's
libraries (the "source code" of the design had some instantiated gates.)

Also at that time we were using VCS (maybe version 5 or so) as our main
simulator and it was taking 8 to 12 hours of VCS sim time or more to get
results from any given simulation.  With a one day turn time for sims, we
had a great desire to use acceleration.

By about late Dec 2000, the Tharas folks had solved all of the problems that
had prevented the use of the Hammer and we could actually start using the
Hammer for simulations in our regression flow.  Although the Hammer was
really designed for RTL acceleration, the Tharas software personnel did a
major rework of their Hammer compiler code to allow the use of the
instantiated gates.  In this time frame, the compile for a Tharas simulation
on our device was taking over 12 hours, but we were seeing a 20 to 25 times
acceleration compared with VCS 4-state no-timing sims.
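
A quick way to read those numbers: with a 12-hour compile, one accelerated
run alone does not beat a plain VCS run, so the payoff comes from reusing
one compile across a regression.  The Python sketch below computes the
break-even run count.  It assumes a single compile is shared by all runs
and ignores VCS's own compile time; the function name is hypothetical.

```python
import math

def break_even_runs(compile_h, sim_h, speedup):
    """Smallest run count where compile + accelerated runs beat plain sims."""
    saved_per_run = sim_h - sim_h / speedup   # hours saved by each accelerated run
    return math.floor(compile_h / saved_per_run) + 1

# 12-hour compile, 8-hour VCS sims, 20x acceleration (figures from the text above)
print(break_even_runs(compile_h=12, sim_h=8, speedup=20))
```

Under these assumptions the accelerator pulls ahead from the second run
onward.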

At that point, the management of our ASIC group decided to purchase the
Hammer accelerator.  Throughout the next year or so, the Tharas software
group continued to make some major improvements: compile time down to about
1 hour, compile process size from 3.8 gig to about 2.2 gig, incremental
compile of the testbench part of the simulation.  (A Tharas simulation has
2 parts, the accelerated part that is run on the Hammer hardware, typically
the DUT code, but can include some test bench constructs, and a VCS
co-simulation that runs in parallel on the host workstation).

Also during that time development on VCS continued and I think it was VCS
6.1.1 where the run time of VCS sims dropped by about 40% to 50% for our
design.  Also we were making improvements to our simulation strategy to cut
the VCS sim time down (using PLI's to nearly instantly program registers
as an example.)

Tharas strengths:

  1. Regression flow nearly the same as VCS flow.
  2. Simulation results agree very well with VCS for our design.
  3. The Tharas compiler can be tweaked to address certain
     design constructs.

Tharas weaknesses: (you might pass these issues by Tharas as these issues
                    may have been addressed by this time)

  1. During our use it was hard to use the Hammer for waveform
     debug for 2 reasons:
       A. Our design just fit into the Hammer and enabling waveform
            dumping would not allow the design to completely fit.
       B. There was a pretty severe performance hit when dumping
            waveforms (VCS also has this hit but by a smaller factor).

  2. Acceleration varies depending on the design and on the test
     bench architecture.  It may be difficult to estimate the
     acceleration without actually trying the device.

I have only used an IKOS simulator for running coverage tests in the early
90's, so I can't really comment much on competitors.  I did observe some of
the evaluation process of the AXIS acceleration scheme and it seemed much
more expensive and harder to use.

The good part was how hard the Tharas people worked to make the Hammer work
for us.  They really tried to address all issues whether it was a deficiency
of the device or whether I was asking for an enhancement.

    - [ Bachelor Number One ]

         ----    ----    ----    ----    ----    ----   ----

From: [ Bachelor Number Two ]

Hi John,

I would like you to keep my name and my company's name to be anonymous.

Here are my impressions.

Hammer strengths:

  1) For our 8 million gate ASIC at RTL level pushed into the box, we get
     10-14x performance improvement over the 900 MHz Solaris machine.

  2) The same ASIC at gate level, gives us 40x+ performance improvement
     over the 900 MHz Solaris machine.

  3) Hammer is easy to use

  4) Though Tharas is a startup, we have received superior customer service
     like we expect from established companies.

Hammer weakness:

  1) Now that I can run VCS 6.2 on Linux boxes and I find 2-3x performance
     improvement on 2.4 GHz Dell Linux Servers over the 900 MHz Solaris
     machine, the price of Hammer, as long as it is connected to Sun
     Solaris, does not justify its performance over Linux machines.  Their
     price will be more justified, if I am able to connect Hammer to a
     Linux machine and run my simulation.  Tharas is working on this now.

  2) Sometimes, the Hammer gets hung in the middle of the simulation and we
     need to get Tharas involved to resolve the issues.  Given the superior
     service they provide, this becomes a non-issue in most cases.
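
The price argument in weakness 1 comes down to dividing two speedups that
share the same 900 MHz Solaris baseline.  The Python sketch below bounds
the Hammer-over-Linux ratio from the ranges quoted above.  It assumes the
two measurements are on comparable workloads, which the text does not
state; the function name is hypothetical.

```python
def relative_speedup(accel_over_base, linux_over_base):
    """Bounds on accelerator speedup over Linux from (low, high) ranges vs. a shared baseline."""
    a_lo, a_hi = accel_over_base
    l_lo, l_hi = linux_over_base
    return a_lo / l_hi, a_hi / l_lo

# RTL: Hammer 10-14x over Solaris, Linux 2-3x over Solaris
lo, hi = relative_speedup((10, 14), (2, 3))
print(f"RTL: {lo:.1f}x to {hi:.1f}x over Linux")
```

So the 10-14x RTL figure shrinks to roughly 3x to 7x once the comparison
is against the Linux servers.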

My Wishlist for Hammer:

  1) For an ASIC of our size, we don't know yet whether we will be able to
     run gate sim with SDF or not.  It will be a breakthrough in the
     industry if we can run gate sim with SDF using Hammer.

  2) Taking the testbench constructs in C/C++/Verilog into the box so
     that the Hammer can run with minimal dependence on the host machine.

Concern:

  Being a startup, how long will they keep on providing the excellent
  customer service to us?

Overall impression:

Their strengths outweigh their weaknesses.  We are glad to have Hammer in
our ASIC simulation environment.

    - [ Bachelor Number Two ]

         ----    ----    ----    ----    ----    ----   ----

From: [ Bachelor Number Three ]


Hi, John,

I will try to answer the questions you've asked about Tharas.  Over the past
5 months, I have not worked with Hammer machines because of other
assignments.  Please allow for some degree of inaccuracy.

Here are my humble comments.  Must be anon.

1. What are its strengths?

   - Larger than 10x simulation throughput over VCS (in our chip.)

   - Hammer opens up the possibilities for system level simulation
     with multiple instances in RTL form at a reasonable cost.

2. What are its weaknesses?

   - Reliance on proprietary ASIC parallel processors.  The advantage of
     Hammer's sim throughput is diminishing as the computing power of Sun
     or Linux workstations keeps advancing.

     Our concern is whether Tharas can keep up the pace.

   - Hammer seems to have more compiler switches than I would prefer.
     As I remember, we had to turn on one switch or else Hammer
     could not simulate correctly.

   - For our design it seems to require more memory (than VCS) at run time.

3. How does it compare to Quickturn, Mentor, Aptix, and the others?

      compared to Quickturn: Hammer is easier, cheaper to maintain.
                             Hammer's re-compilation is much faster
                             than Quickturn's remapping.

      compared to Aptix: Hammer is less expensive. I do not have Aptix
                         performance data.

      Mentor: no data.

4. What bit of hard earned wisdom did you learn after buying Hammer that
   you had wished someone had told you about before you bought Hammer?

   - Before the purchase, we only used a small set of test patterns to
     check Hammer's performance and we gave Tharas a performance goal.
     Tharas assigned 1 full time engineer, we assigned a half time engineer.

     After they met the initial performance goal and we agreed to purchase
     Hammer, we discovered that Tharas may have "optimized out", through a
     compiler switch, parts of the RTL which are not exercised by these patterns.

     So the performance data before the purchase decision may be misleading.

5. What good part of Hammer surprised you?

   - Last summer, we ported a pre-released RTL to Hammer and used a less
     complete verification suite designers were using.  We flushed out
     several RTL bugs which had slipped through VCS simulations.  If we
     had had a regression suite with full coverage, then we might have
     caught those bugs in VCS.

Thanks for the opportunity for expressing my comments.

    - [ Bachelor Number Three ]


( ESNUG 408 Item 4 ) -------------------------------------------- [03/13/03]

From: Rajkumar Kadam <movie=netcontinuum slot prawn director=rajkumar>
Subject: User Concern About Cadence NC-Verilog vs. Synopsys VCS Mismatches

Hi John,

I am evaluating a license from Cadence, and I found a discrepancy in the
way VCS and NC-Verilog behave regarding the DISABLE statement in an "always"
block with Non-Blocking Assignments (NBA's).  I read the IEEE-1364-1995
Verilog LRM, and found that what NC-Verilog does is what the standard
actually specifies.  The reply from Synopsys:

   "I found an entry in our database from 1995 that shows that this very
    issue was put to a vote by the IEEE 1364 committee.  The majority of
    the IEEE committee voted for it and Synopsys voted against it.  It
    has, therefore, been our historical perception that NBA's remain
    unspecified."

As a user I expect a standard behaviour from all EDA tools -- at least for
a language that was standardized years ago.  Following is the code and
the output from different simulators:

  module disable_code();

  reg start;
  reg clk, rst_l;
  integer i;
  reg[7:0] list;
  reg vld_flag;

  always @ (posedge clk or negedge rst_l)
  begin
     if (!rst_l)
     begin
        vld_flag <= 1'b0;
        list <= 8'hff;
     end
     else
     begin
        if (start)
        begin : NEW_SRCH
           for (i = 0; i <= 7; i = i + 1)
           begin
              if (list[i[2:0]] == 1'b1)
              begin
                 list[i[2:0]] <= 1'b0;
                 vld_flag      <= 1'b1;
                 disable NEW_SRCH;
              end
              else
                 vld_flag <= 1'b0;
           end
        end
     end
  end

  initial
  begin
    rst_l = 0;
    start = 0;
    #50;
    rst_l = 1'b1;
    @(posedge clk );
    start = 1'b1;
    @(posedge clk );
    start = 1'b0;
  end

  initial
  begin
    clk = 0;
    forever #10 clk = ~clk;
  end

  initial
  begin
    #500;
    $display("Value of valid Flag is %d", vld_flag);
    #10;
    $finish;
  end
  endmodule

The Cadence NC-Verilog simulator gives the LRM-correct value of vld_flag = 0
but Synopsys VCS erroneously gives vld_flag = 1.
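
The two results are consistent with two readings of what "disable" does to
nonblocking assignments that were scheduled inside NEW_SRCH but not yet
applied.  The Python toy model below walks one clock edge of the always
block with start = 1 and list = 8'hff under each reading.  It is only an
illustration, not an event-accurate Verilog simulation, and the flag name
cancel_nba_on_disable is made up.

```python
# Toy model of one clock edge of the always block with start = 1.
# cancel_nba_on_disable selects between the two behaviours described above.
# This is an illustration, not an event-accurate Verilog simulation.

def clock_edge(list_bits, vld_flag, cancel_nba_on_disable):
    """Return (list_bits, vld_flag) after the nonblocking updates are applied."""
    pending = {}                        # nonblocking updates scheduled this edge
    for i in range(8):
        if (list_bits >> i) & 1:
            pending["list"] = list_bits & ~(1 << i)
            pending["vld_flag"] = 1
            if cancel_nba_on_disable:   # disable discards the scheduled updates
                pending.clear()
            break                       # disable NEW_SRCH also ends the loop
        pending["vld_flag"] = 0
    return pending.get("list", list_bits), pending.get("vld_flag", vld_flag)

print(clock_edge(0xFF, 0, cancel_nba_on_disable=True))    # matches vld_flag = 0
print(clock_edge(0xFF, 0, cancel_nba_on_disable=False))   # matches vld_flag = 1
```

Note that under the cancelling reading the list bit is never cleared
either, so code that relies on "disable" after scheduling NBAs is worth
rewriting regardless of which simulator is used.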

I'm concerned.  What other discrepancies between these two Verilog
simulators am I missing here?  I would love a user generated list of known
differences between Synopsys VCS and Cadence NC-Verilog.  The Synopsys and
Cadence salesdroids won't give me a complete list of these mismatches, but
we need such a complete listing so we don't get burned later.

Isn't the whole idea of standards that there AREN'T discrepancies??!

    - Rajkumar Kadam
      NetContinuum, Inc.                         Santa Clara, CA


( ESNUG 408 Item 5 ) -------------------------------------------- [03/13/03]

Subject: ( ESNUG 406 #6 ) DC Calls Up Module Compiler If The Code Is Right

> I know that on setting
>
>                      dw_prefer_mc_inside = true
> 
> DesignWare invokes Module Compiler modules for some operations and no
> Module Compiler license is needed.  I am not sure if it is possible to
> invoke Module Compiler from Design Compiler for all operations (like
> divide) even if we have a Module Compiler license.  Can anyone throw
> some light on this?
>
>     - Ravi Sankar Konidena


From: Vijay Raghavan <church=synopsys wrought alm preacher=vijayr>

Hi John,

I work in Customer Education in the Marlboro, MA office of Synopsys.  As
Ravi says, single operators like +, -, * within RTL VHDL/Verilog code can be
inferred and mapped using MC implementations when the dw_prefer_mc_inside
variable is set to true.

Since MC implements division using a function call, i.e., "divide", which
computes a quotient Q and a remainder R from X and Y, and the customer has
MC, he can build a divider by reading and compiling code written in MCL
(Module Compiler Language) within DC as follows:

NOTE: This works when DC is invoked in Tcl mode with "mcdc.tcl" sourced

  dc_shell-t> source [get_unix_variable MCDIR]/lib/tcl/mcdc.tcl
  dc_shell-t> read_mcl example.mcl
  dc_shell-t> source constraints.tcl
  dc_shell-t> mcenv dp_lang_out verilog
  dc_shell-t> compile_mcl -quiet
  dc_shell-t> report_mc -log -report -lib > example.rpt
  dc_shell-t> report_timing

The MC synthesized divider netlist (if preferred over the DW) could be
instantiated in the rest of the design like any other sub design and
then incrementally compiled for its environment.


  // example.mcl

  module DIV (Q, X, Y, Round, Arch, R);

  integer Round = 0, Arch = 3;

  input  [16] X;
  input  [8]  Y;
  output [16] Q;
  output [8]  R;

  divide (Q, X, Y, Round, Arch, R);

  endmodule


Note that the "divide" function supports 3 architectures as controlled by
Arch and it performs unsigned division, so sign should be handled
separately.
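
One common way to handle sign separately is to divide the magnitudes and
fix up the signs afterward.  The Python reference model below sketches
that wrapper.  It does not model the Round or Arch arguments or the bit
widths, and the truncate-toward-zero convention is an assumption; the
function names are hypothetical and are not part of MCL.

```python
def udivide(x, y):
    """Unsigned reference model: return (q, r) such that x == q*y + r."""
    if x < 0 or y <= 0:
        raise ValueError("udivide expects x >= 0 and y > 0")
    return x // y, x % y

def sdivide(x, y):
    """Signed divide built on udivide, truncating toward zero."""
    q, r = udivide(abs(x), abs(y))
    if (x < 0) != (y < 0):
        q = -q          # quotient is negative when the operand signs differ
    if x < 0:
        r = -r          # remainder takes the sign of the dividend
    return q, r

print(sdivide(-7, 2))
```

In hardware the same idea becomes a conditional negate on each input and
each output of the unsigned divider.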

References:

  divide()    : Module Compiler Reference manual
  DW_div_fp() : Designware Foundation Library (Floating Point Divider)

Hope this helps,

    - Vijay Raghavan
      Synopsys                                   Marlboro, MA


( ESNUG 408 Item 6 ) -------------------------------------------- [03/13/03]

Subject: ( ESNUG 407 #13 ) Paul Says Power Rings Are An EDA Software Crutch

> Notching power rings: with the old tools (CDN DP) we were using, putting
> down a rectangular power ring around a core was easy; getting the ring
> to have notches around cells in the corner like PLLs took a great deal
> of hand work.  Eventually, we wrote a script to drive SE sroute to do
> this, but it took awhile to get it written right, and it must be
> rewritten for each new chip.  Now I see CDN FE Ultra can do this
> automatically if the user selects to "exclude selected blocks" when
> routing the ring.  Or at least Cadence says it can.  Can it?
>
>     - Mark Wroblewski
>       ex-Cirrus and looking                      Lafayette, CO


From: Paul Rodman <france=reshape naught awn paris=rodman>

Hi John,

At my company we decided to develop our own tools for power routing.  The
reason: all the commercial EDA tools we used PISSED US OFF and had a lowest
common denominator approach that didn't let us have a repeatable, automatic
or even semi-automatic solution.

There had to be a better way... but what?


> Multiple layers on rings: to reduce the area required for supply rings,
> we used multiple layers.  We also intermingled the nodes in these ring
> stacks, so for example the outer of two rings would be stacked as VDD,
> GND, VDD on 3 of 5 routing layers, and the inner of the two rings would
> be stacked as GND, VDD, GND on some other 3 of 5 routing layers.  Strips
> across the middle on two layers vertically and one layer horizontally
> would tie everything together and deliver the supplies to the row metal.
> We were able to work this by hand, but the old tools (CDN DP, SE Sroute
> in "automatic" usage scenarios) couldn't cope.  Does FE Ultra do any of
> this effectively?


Our teams had the usual pitched battles about how to do the rings and
notching.  It's a surprisingly complex problem, after all.  Finally,
a compromise solution emerged: have a set of nested rings using only the top
two layers, i.e. NO layer stacking.

The reason we avoid stacking is that in six or more metal layer designs
(6LM+) we can easily route stdcells under the rings.  There is not much
congestion on the edges of blocks placed at the edge of the chip...  m4 and
lower is fine for routing and the rings are free and consume no cell area.

If we have a lot of current flowing from the padcells into the ring, we
might need a bigger ring (now slotted or replicated) but this is OK, since
it costs no cell area.

HOWEVER, lately, after staring at various IR maps, we realized that we lost
more IR drop than we'd like in the stubs that go from the power/gnd pads
into the rings.  We don't see why we should have to lose anything, if there
is still more area for more metal around (which there is).  Obviously this
is a very high current connection, it's at the top of the food chain after
all.

So, now I ask:

WHAT ARE THE RINGS FOR?

Are they some kind of Anti Satanic Sealer Ring of Safety?  Are they the
salt-thrown-over-the-shoulder of ASIC power distribution?

ASSERTION #1: The ring metal isn't doing anything except acting as a simple
way to connect the outputs of power pads into the mesh.  The actual
transverse-direction current carrying aspects of the ring are minimal.

ASSERTION #2: Rings are a CRUTCH for LAZY, OLD, EDA software.  A paradigm
for the days before pervasive full-chip fine grain meshes and good IR
tools existed.

In our 6LM+ methodology we have a full, "fine grain" mesh covering the core
of the chip.  I'll define fine grain as any grid with a +/- pair at least
every 30 um or so in a 130 nm process.


What we think is a better goal is to use any and all available area and
layers to make something I tentatively call a "dagger" or "pitchfork" of
metal.  It would expand out as wide as possible (as soon as possible)
after exiting the padcell's power/gnd pin.  Then this metal would merge
into the core mesh for enough distance that the mesh picks up the job
of transverse dispersal.  Lower IR drop is the result, since more metal
is applied where it is needed.
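
A back-of-envelope Python sketch of why wider metal out of the pad helps:
for a rectangular stub, the drop is current times sheet resistance times
the number of squares (length over width).  The current, sheet resistance,
and dimensions below are hypothetical, and a flared "dagger" is not a
simple rectangle, so treat this only as a first-order picture.

```python
def stub_ir_drop_mv(current_a, sheet_res_ohm_sq, length_um, width_um):
    """IR drop in millivolts across a rectangular metal stub: I * Rs * (L / W)."""
    squares = length_um / width_um
    return 1000.0 * current_a * sheet_res_ohm_sq * squares

# Hypothetical values: 100 mA through a 50 um long stub, 0.04 ohm/square metal
narrow = stub_ir_drop_mv(0.1, 0.04, 50.0, 20.0)
wide = stub_ir_drop_mv(0.1, 0.04, 50.0, 40.0)
print(f"{narrow:.1f} mV vs {wide:.1f} mV")
```

Doubling the width halves the drop in this simple model.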

Of course, such layout is not easy to generate.  You need to worry about
all sorts of design rule issues and you don't want to block routability to
the next door I/O cell's signal pins, etc.  It may require undoing and
tweaking the mesh in that area as well.  We do want our stdcells under that
area too, so it's getting complex.  Also remember the pad cell pitches and
the mesh strap pitches are not related, so some interesting "beat frequency"
relative placement cases come up.

I think if you look at IR maps of how meshes work you will see why I feel
strongly.  The segments of ring between padcells aren't really doing much
for you... the mesh itself is so strongly connected in the same direction
as the ring you put in.  (Assuming your mesh isn't totally undersized.)


> Ring macros and other special cases: SE Sroute does a decent job of
> connecting row metal to ring macros (e.g., RAMs, register files) in most
> cases but coughs sometimes where high congestion exists.  (For example,
> where a via was dropped to get from the macro's internal supply to the
> ring around the macro.)  Unfortunately, this happened often enough that
> we couldn't ignore it, so more hand fixing.  How is FE with this today, as
> I understand it uses a new version of SE's Sroute for most heavy lifting?


Mark is obviously dealing with special IP that requires an external ring.
My condolences.  At least the Artisan and Virage RAMs come with a set of
"ring-pins" that you can set the size of that work a lot better.

But, such pre-built macro rings are another example of burning layout
resources to make software easier.  In fact, for "advanced" users of the
same RAMs, vias are dropped directly into the RAM core metal shapes to power
them and the internal "ring pins" can be dispensed with.  That is, all those
nasty m4 obstructions in the abstract are really power and ground pins, too.
Why not present them to the tools and get more connection "meat" into the macro?


Alas, here the problem isn't just software, it's also the problem of
providing iron-clad rules-for-use from the RAM vendor to the user.  With a
ring, they can present a simple spec for how to give the RAM proper care
and feeding.  However, if you just let a user punch vias down internally,
and you get rid of the ring, you need a spec for "how much is good enough",
or you need good EDA tools to confirm for you that you are OK.  And the
"must-join" issues could be hard to check.  Not an easy thing to do, and
some users will surely get it wrong and gum things up...

Anyway, I'm not a fan of complexity for the hell of it, but I think it is
worth noting that there are some optimizations out there to be had.


After saying all the above, we actually leave the rings on the commercial
compiled RAMs and punch vias down from our meshes onto them.  It's fast and
reliable and gives a really good connection so the compiled RAM macro rings
can be pretty damn small, and so the savings of ringless is small here.  We
can handle both rotated and normal orientations, too.  We do this connection
before stdcell placement, by the way, so that the virtual router understands
the implications of any track-blocking vias that get dropped...

To summarize, at my company we think:

      1) Almost no cell area should be lost to any power
         routing. (analog IP power the major exception)

      2) We should be able to get a few mV of drop out of the "dagger" idea.
         Every millivolt we can get by being smart for "free" is worth it.

Meshes are in COMMON USE NOW.  So it's time to rethink the point-tools we
use to connect them up!


> Power supply design and analysis: Our old way of design and analysis for
> the power supply metal was an MS Excel spreadsheet.  What I really was
> looking for was a tool that studied the placed netlist and helped me beef
> up or trim down the power supply grid.  FE claims to do this.  What's the
> truth?  And what kind of clock trees does it assume?  Zero skew?  Useful
> skew?  Or does it use a netlist with clock trees inserted?


I haven't had a chance to try First Encounter for this, (want to!) but I do
know a bit about how Avanti's Mars Rail and now Astro Rail seem to work,
and I suspect they are all pretty similar.

They are doing time-averaged power only, so the skew issues aren't relevant.
They can provide pretty reasonable estimates and give you basic warm fuzzies
about your power metal.

You provide clock frequencies, and per-cycle estimated "switch factors" (aka
"switching probabilities") for the nets in your design.  You do this with
regular expressions on the netnames, etc.
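
As a sketch of that input style, the Python below assigns a switch factor
to each net by the first matching name pattern and then sums time-averaged
dynamic power as alpha * C * V^2 * f.  The patterns, factors, net names,
and the reading of "switch factor" as charging transitions per cycle are
all assumptions; the real tools have their own input formats.

```python
import re

# Sketch: assign per-cycle switch factors by net-name pattern, then sum
# time-averaged dynamic power as alpha * C * V^2 * f.
# Patterns, factors and net names are hypothetical.
RULES = [
    (re.compile(r"^clk"), 1.0),     # treat clock nets as charging every cycle
    (re.compile(r"^dp_"), 0.3),     # busy datapath nets
    (re.compile(r""), 0.1),         # default: matches any net name
]

def switch_factor(net):
    """Return the factor of the first rule whose pattern matches the net name."""
    for pattern, factor in RULES:
        if pattern.match(net):
            return factor
    return 0.0

def average_power_w(net_caps_f, vdd, freq_hz):
    """Sum alpha * C * Vdd^2 * f over all nets (capacitances in farads)."""
    return sum(
        switch_factor(net) * cap * vdd * vdd * freq_hz
        for net, cap in net_caps_f.items()
    )
```

Rule order matters here: the catch-all default has to come last.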

If you have very VERY extensive simulation results you can use those, too.
(Most people don't..chips too damn big, vectors too lousy.)  Some folks
doing DSP kinds of things might be able to use numerical analysis to get a
better idea of switch factors, which is another tack we've seen.

Given the lack of accuracy in the switch factors, you would need to be
careful in doing per-stdcell voltage value (aka "in context") timing based
on the IR map -- you need to model the errors in the IR map itself to
be safe.

What really helps you sleep at night is the EM output from these tools,
since it is getting easier and easier to create EM problems without any IR
drop problems on a chip.

Gross EM problems due to buggy layout, e.g. missing vias etc. are found with
the EM tools, but we've also found cases where simple padcell stubs were EM
violators, or layout inside the padcell itself had a problem.  (e.g. too
much power being drawn in one area due to all-layer-blockages that had
punched out the meshes too much, putting a large load on a single or a few
power padcells.) 

It's good to find these things *before* you freeze the padring and launch
the package and board design.  :)


And relevant to Mark's "skew assumed by power tools" question, there is an
announcement for a new tool called "CoolTime" (from Sequence), that *claims*
to do what I, personally, have been wishing for in my Xmas stocking: a tool
doing "switching windows" (a la PrimeTime-SI) but presenting the data per
unit area.  However, I am deeply concerned that CoolTime is going a bit off
the deep end in its claims.  The problem with modeling actual dynamic power
switching is that the transition times are really small in the cases that
are nasty AND the set of all possible chips that are going to be
manufactured can have a zillion relative net delay differences due to
transistor and (independently) metal variation.  Therefore, the windows have
to be LARGE, or if not, it has to add the switching noise of one output into
multiple, small windows.

I wonder if the whole thing becomes too worst case to be as accurate as they
claim?

I would like to see an ebeam trace showing the power drop matching the
tool's results for a complex chip running some repeating test pattern.  Dare
I hope for such a thing?  No?  Well, failing that I want to know exactly how
the results are calculated... might trust it then, but not before.

    - Paul Rodman, CTO
      ReShape, Inc.                              Mountain View, CA


( ESNUG 408 Item 7 ) -------------------------------------------- [03/13/03]

Subject: ( ESNUG 407 #3 ) Users On How To Get Kick Ass Linux Farm VCS Runs

> I am working to set up a Linux farm as an addition to our Sun compute
> farm.  However, during testing of our simulation runs, I have not been
> able to get any type of substantial speed-up that everyone seems to be
> talking about...
>
>     - Philip Strykowski
>       Mindspeed Technologies                     Massachusetts


From: Steven Leung <army=ali.com.tw soldier=steven_leung>

Hi, John,

I did a Linux eval in my previous life about 1.5 yr ago.  At that point, the
cheap PC server models were P3 up to 1.3GHz.  Two things learned from that
evaluation:

  - The speedup (compared with Sun 400/440 MHz machines) is application
    dependent.  DC/PT can achieve close to the ideal speed up (based
    on CPU clock rate.)  But VCS simulation can only achieve about 30% speed
    up.  Yet, in VCS+Specman runs, it can achieve a speed up of over 2X.

  - The cheap server type typically has 2 CPUs sharing the same memory bus.
    That appears to be the bottleneck when running memory-intensive apps.
    If you run 2 sim jobs on the 2-CPU server, not only is there no
    speed-up, but the run time will actually be significantly longer!  In
    comparison, Sun's memory architecture is superior.  There is no
    discernable slowdown when the Sun servers are fully utilized.

As a result, we bought PC servers with only 1 CPU installed.  The Xeon-CPU
servers may be better, but they are also much more expensive, defeating the
purpose of leveraging cheap PC hardware.  I was also told that the P4 CPU
has more pipeline stages than P3 so that even though the clock rate is
higher, the actual speed up for sim type applications is not much better.
I have seen some data that appears to support that, but no hard evidence.

    - Steven Leung
      Ali Corp                                   Shanghai, China

         ----    ----    ----    ----    ----    ----   ----

From: Eric Deal <frame=conexant sought john picture=eric.deal>

Hi, John,

I work in the Austin design center at Conexant.  We've been migrating from
Sun to Linux over the past 3 years and now run nearly everything on Linux
because of the speed increase (and the fact that we can buy 4+ Linux
machines for the cost of each (slower) Solaris machine).

In my experience, Linux should run much faster than Solaris.  My benchmarks
are about 6-9 months old:
                                            Dell               Compaq
                                            server             desktop
   Test Name      Sun450   Sun900  P3/1000  P4/1700  AMD/1533  P4/1700

   calibre1       123      50      40       32       29        34
   calibre2       3781     1687    2644     2145     2087        
   dc1            8534     4249    4284     3744     3102      4874
   dc2            4000     2160    1921     1490     1316      1793
   physopt        8384     4605    4779     3727     3114        
   hspice         931      620     897      590      525       722
   vcs1           2902     1496    1658     1320     1392      1533
   vcs2           285      159     200      129      107       226
   povray         665      317     313      200      144       258

Relative Performance

   calibre1       1.00     2.46    3.08     3.84     4.24      3.62
   calibre2       1.00     2.24    1.43     1.76     1.81
   dc1            1.00     2.01    1.99     2.28     2.75      1.75
   dc2            1.00     1.85    2.08     2.68     3.04      2.23
   physopt        1.00     1.82    1.75     2.25     2.69
   hspice         1.00     1.50    1.04     1.58     1.77      1.29
   vcs1           1.00     1.94    1.75     2.20     2.08      1.89
   vcs2           1.00     1.79    1.43     2.21     2.66      1.26
   povray         1.00     2.10    2.12     3.33     4.62      2.58
                                                
   Mean           1.00     1.97    1.85     2.46     2.85      2.09
   Geom. Mean     1.00     1.95    1.78     2.37     2.71        
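
For anyone who wants to recompute the "Relative Performance" rows, here is a
small Python sketch.  It assumes relative performance is the Sun450 run time
divided by the other machine's run time, which matches the table; only three
columns are copied in to keep it short, and the function names are just
illustrative.

```python
import math

# Run times copied from the table above (units as reported there).
TIMES = {
    "Sun450":   [123, 3781, 8534, 4000, 8384, 931, 2902, 285, 665],
    "Sun900":   [50, 1687, 4249, 2160, 4605, 620, 1496, 159, 317],
    "AMD/1533": [29, 2087, 3102, 1316, 3114, 525, 1392, 107, 144],
}

def relative_performance(machine, baseline="Sun450"):
    """Per-test speedup: baseline run time divided by this machine's time."""
    return [b / t for b, t in zip(TIMES[baseline], TIMES[machine])]

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))

for machine in TIMES:
    rel = relative_performance(machine)
    print(f"{machine:10s} mean={arithmetic_mean(rel):.2f} "
          f"geom={geometric_mean(rel):.2f}")
```

The geometric mean is less sensitive to one outlier test (povray, say) than
the plain mean, which is why the two rows differ.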

Here are some things to check that might be causing the disparity:

  * Ethernet connections -- check that both Linux and Solaris machines
    are on the same speed network.  When running stuff from the network
    a 10BaseT link will slow a machine down significantly versus 100BaseT.

  * VCS version.  Early Linux versions of VCS (5.x) were not well optimized
    for Linux, but were for Solaris.  My older benchmarks (not shown) 
    were run with this and the Linux machines needed about 1.7x-2.0x
    more MHz to compete with Sun.  As long as you're running a later version
    of VCS (6.0+), this shouldn't be an issue.

  * When benchmarking, I like to limit the number of external factors,
    so typically I'll copy all the design/testbench files to a local 
    hard disk and run the simulation from there.  This cuts out most
    of the network discrepancies between the runs.
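
A minimal Python sketch of the "run it locally and time it" approach from the
last bullet.  The command shown is only a placeholder so the sketch runs
anywhere; the function name and the repeat count are assumptions, and you
would substitute your own simulation executable and a local-disk directory.

```python
import subprocess
import sys
import time

def time_command(cmd, workdir=".", repeats=3):
    """Run cmd `repeats` times in workdir; return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, cwd=workdir, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings

# Placeholder command; replace with your simulation binary, started from a
# directory on a local hard disk rather than an NFS mount.
cmd = [sys.executable, "-c", "sum(range(100000))"]
runs = time_command(cmd, workdir=".", repeats=3)
print("best %.3fs  worst %.3fs" % (min(runs), max(runs)))
```

Repeating the run and looking at best versus worst gives a quick feel for how
much noise the machine itself adds.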

Finally, the type of PC hardware used can make a difference.  Philip's note
mentioned that he wasn't paging the simulation.  I've noticed that the cheap
desktop systems do perform slightly worse than a "server" configuration,
usually because of the memory subsystem.  In the table above, the last
column is a Compaq desktop while column 4 is a Dell server.  In general, the
numbers are similar, but the Dell machine typically posted better numbers.

    - Eric Deal
      Conexant                                   Austin, TX

         ----    ----    ----    ----    ----    ----   ----

From: Aaron Smith <jail=motorola clot von convict=aaron.smith>

Hi John,

I am running a few Linux machines (Dell & HP workstations w/ RH7.2) here at
Motorola, and we are seeing significant performance improvements with Linux.
It depends somewhat on what you're doing, though.  The Intel machines aren't
nearly as good with floating point ops as the Sun machines, (see
http://www.spec.org) so your performance gains may be less than you might
expect if you are running full timing on a gate-level sim.

Memory bandwidth is also an issue for some applications with about any Intel
machine before the Pentium 4s with the 400MHz or 533MHz front-side bus.  If
your machines are using PC-133 SDRAM instead of Rambus or DDR, that might be
a reason for poor performance.

Also, we had some issues with the 3com network drivers under Linux that
require some options to be passed to the driver to force full duplex
100 Mbps operation.  Our network switches are forced to 100 Mbps, FD, which
causes auto-negotiation to fail and the Linux box to end up at 100Mbps, HD.
Also be sure that your NFS blocksize (for mounts from Solaris machines) is
set to 32768.

   # sample from /etc/fstab for our ClearCase mounts (one line per mount,
   # no spaces inside the option list):
   ccvob:/share/vob    /net/ccvob/share/vob    nfs  nfsvers=3,rsize=32768,wsize=32768,hard,intr,nodev,nosuid,tcp
   ccview:/share/view  /net/ccview/share/view  nfs  nfsvers=3,rsize=32768,wsize=32768,hard,intr,nodev,nosuid,tcp
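
To check what block size a Linux box actually ended up with, one option is to
scan the mount table.  This is a minimal Python sketch assuming the usual
/proc/mounts layout (device, mount point, type, options); the sample text and
the function name are made up for illustration.

```python
def small_nfs_mounts(mounts_text, wanted=32768):
    """Return (mount_point, rsize, wsize) for NFS mounts below `wanted`.

    A mount that does not list rsize/wsize at all is reported with 0.
    """
    flagged = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue
        opts = dict(o.split("=", 1) for o in fields[3].split(",") if "=" in o)
        rsize = int(opts.get("rsize", 0))
        wsize = int(opts.get("wsize", 0))
        if rsize < wanted or wsize < wanted:
            flagged.append((fields[1], rsize, wsize))
    return flagged

# Made-up sample; on a real Linux box you could pass
# open("/proc/mounts").read() instead.
SAMPLE = """\
ccvob:/share/vob /net/ccvob/share/vob nfs rw,rsize=32768,wsize=32768,hard 0 0
tools:/export /net/tools nfs rw,rsize=8192,wsize=8192,hard 0 0
/dev/hda1 / ext3 rw 0 0
"""
flagged = small_nfs_mounts(SAMPLE)
print(flagged)
```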

Hope this helps.  We are seeing that P3-800s with Rambus are running almost
2X our Ultra60/450MHz machines and comparable to a SunFire with 750 MHz
UltraSparc-IIIs, and that a P4/1700 with 2G Rambus is running about 30-40%
faster than the Sun Fire.  Most of what we do is pure functional simulation
with no timing information.

    - Aaron Smith
      Motorola                                   Chandler, AZ

         ----    ----    ----    ----    ----    ----   ----

From: Greg Arena <library=comcast.net book=garena6>

Hi, John,

Are your hard disks IDE or SCSI?  If they're IDE, do you have DMA and 32-bit
mode enabled?  Are the drives Ultra/ATA-100 capable?  The problem might not
be with memory & swap, but with updating the VCD file (if you have it
turned on).  You can find out by running the /sbin/hdparm program as root
(enter "hdparm /dev/hd?" - this will report on all hard drives).

I noticed that my VCS simulations ran considerably slower with DMA and
32-bit mode turned off.  This was the default for my setup and I would have
to change it manually each time I logged in until I figured out where to put
it in the system initialization scripts. But I would know right away if I
had forgotten to set DMA mode by how much slower the sims were running.
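
If you want a script to nag you about this, here is a minimal Python sketch
that parses hdparm-style report text.  The field names "using_dma" and
"IO_support" and the sample text are assumptions based on typical hdparm
output and may differ on your version; the function names are illustrative.

```python
import re

def parse_hdparm(report):
    """Extract the numeric 'name = value' settings from hdparm report text."""
    settings = {}
    for line in report.splitlines():
        match = re.match(r"\s*(\w+)\s*=\s*(\d+)", line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings

def slow_disk_warnings(settings):
    """List the settings discussed above that look disabled."""
    warnings = []
    if settings.get("using_dma") == 0:
        warnings.append("DMA is off")
    if settings.get("IO_support") == 0:
        warnings.append("32-bit I/O is off")
    return warnings

# Illustrative text only; real hdparm output may differ by version.
SAMPLE = """\
/dev/hda:
 multcount    = 16 (on)
 IO_support   =  0 (default 16-bit)
 using_dma    =  0 (off)
"""
warnings = slow_disk_warnings(parse_hdparm(SAMPLE))
print(warnings)
```

hdparm's -d1 and -c1 options are the usual way to turn these two settings on;
check the man page for your version before putting them in an init script.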

If you don't have Ultra/ATA-100 capable drives (or a chipset/PCI controller
that supports it), I suggest you consider upgrading - that makes a big
difference.

    - Greg Arena                                 Williamstown, NJ

         ----    ----    ----    ----    ----    ----   ----

From: Ajoy Aswadhati <class=force10networks got don student=ajoy>

Hi John,

We ran into the same problem that Philip has encountered, when trying to add
Linux m/c's to our Sun farm.  We found initially that our 1 GHz Linux
machines were comparable to 3-year-old Sun machines when running
NC-Verilog sims.  When we upgraded to a 2.2 GHz Xeon based server we were
surprised to see *NO* speed improvement between the 1 GHz machines and 2.2
GHz Xeons.  This got us concerned and we had a conference call with Cadence
folks to investigate what's going on.  They mentioned that raw CPU speed
alone does not help Verilog sims.  Fast access to memory is equally
important.  They gave us helpful pointers to kinds of machines they use in
their benchmarks. 

We found that the core logic (chips bridging CPU/memory) used in our servers
were garden variety commodity chipsets from Intel (not server class).  At
that time (last year) we found a vendor supplying a Serverworks chipset (now
part of Broadcom) which had the max MHz front side bus w/ 2 way interleaved
memory.  We got *2X* performance right off the bat as compared to the 1 GHz
system.  I don't know who makes better chipsets at this point.

You can get better systems now.  Look for fast front side bus and preferably
four-way interleaved access to memory.  Even though Philip uses VCS he
should get similar improvements with the right system.

This is what I gathered from going through older ESNUG archives.

We bought a SuperMicro P4DLR+ from this vendor.

                        http://www.supermicro.com

I would like to know if other users can send pointers to the latest/greatest
Linux machines optimized for Verilog simulations.

I forgot to mention another important thing.  Get the fastest DDR memory for
the system.  Make sure your system supports the fastest memory out there.

    - Ajoy Aswadhati
      Force10 Networks                           Milpitas, CA

         ----    ----    ----    ----    ----    ----   ----

From: Robert Clark <2nd=paramanet pott bomb 1st=rac>

Hi John,

I saw similar results when benchmarking a Pentium III 933 MHz processor with
a ServerWorks HE chip set (Linux RedHat 7.2), against a SUN 450MHz Ultra
Sparc II system (Solaris 5.8).  The SUN was consistently faster with
simulations that required less than 8 MB of memory.  Beyond 8 MB, the
Pentium III pulled ahead.  This is probably due to the 4 MB of second level
cache on the SUN.  Now things get quite interesting.  I benchmarked an AMD
Athlon XP 1800+, and it ran nearly twice as fast as the Pentium III system.
I also benchmarked the Athlon XP 2000+ and 2100+ CPUs and found that there
was no performance increase with the higher CPU clock rate.  What did I
learn from this exercise?

  1. The memory performance has the greatest influence on the VCS simulation
     performance.  Which means you have to choose the right chip set for VCS
     simulation runs.  This is no surprise given that the expected data
     cache hit rate for a VCS simulation is very low.

  2. AMD has a Synopsys VCS site license with hundreds of AMD systems used
     for CPU verification, and rumor has it, that AMD helped Synopsys tweak
     the VCS compiler to turn out the most efficient assembly code for the
     AMD processors.

Based on the performance of the Athlon system for Verilog RTL verification,
we purchased 20 machines for our Linux server farm.  I also benchmarked a
Pentium IV with RDRAM, and found it to be equivalent to the Athlon XP 1800+.
We chose the Athlon system over the Pentium IV, because RDRAM and the
Pentium IV processor were quite a bit more expensive with 512MB of RAM.
All of our Athlon systems that run gate level simulations have 1.5GB of RAM.
This would have been a very expensive Pentium IV system with RDRAM.  We also
use the Athlon servers for hspice runs, because they are 2X+ faster than our
SUN 750MHz UltraSparc III system.

We can probably help Philip out, if he can release the hardware details of
his Linux systems.

Our systems are "commodity priced clone" based systems (less than $1,200),
with the following system components:

   1. EPOX 8KHA+ motherboard with VIA KT266A chip set
   2. AMD Athlon XP 1800+, 1900+, 2000+, and 2100+ CPUs
   3. 1.5GB DDR PC2100 RAM 3X512MB 2-2-2-2 memory timing
   4. 2U rack mount case
   5. RedHat Linux 7.2

Verilog RTL net list and VCS compilation flags:

   1. Fairly large SOC design with 4 million gates.  The RTL for regression
      has 176,000 lines of synthesizable RTL code.
   2. +nospecify and +radlite are used to optimize the Verilog for
      regression runs.

A major part of the SOC consists of 2 instances of a very large block, and
8 instances of a large block, that VCS compiles into an image that requires
only 25MB of memory to run.  VCS is doing a good job minimizing memory
usage.  The same netlist synthesizes to 4 million gates.  This requires
1.3 GB of memory to simulate.  Yes, we use static timing analysis and
equivalence checking to verify the gate level netlist with the reference
RTL, but we still run gate level simulations, to get a warm and fuzzy
feeling before tape out.

If I were building a system for a VCS server farm today, I would go with
the nVidia nForce2 400 MHz Dual-DDR chip set with one of the 333 MHz FSB
Athlons.  VCS simulation performance will improve a bit more after AMD
releases the 400MHz FSB CPU this May.

Has anyone benchmarked an Itanium system with VCS?

    - Robert Clark
      Parama Networks, Inc.                      San Jose, CA

         ----    ----    ----    ----    ----    ----   ----

From: Eyal Landesberg <ocean=zoran.co.il ship=eyall>

Hi, John,

The reason for performance difference of simulators on different platforms
is the cache size.  Philip doesn't mention the cache size of his Sun
workstations, but I assume that the Solaris 2.7 workstations have 4 MB
cache, while the Linux workstations have only 512 KB cache.

At Zoran, we compared performance of NC-Verilog on Solaris (5.8) 900 MHz
with 8 MB cache, vs. Linux Intel XEON 2.4GHz 512KB cache, and we found
that the Solaris performance is better by 10% to 2X, depending on the
test.  We got a Linux workstation with 1 MB cache (Intel Genuine 1.60 GHz)
for evaluation, and its performance was equal to the Solaris 900MHz.

    - Eyal Landesberg
      Zoran                                      Israel

         ----    ----    ----    ----    ----    ----   ----

From: Dave Cronauer <vatican=synopsys lot guam pope=davecr>

Hi, John,

There is a paper at next week's SNUG in San Jose, "Blazing Saddles: Getting
the performance Out of VCS", by Gregg Lahti et al.  Though it doesn't
directly address the UNIX/Linux issue, it still may be beneficial.  Walk-in
registrations are welcome at SNUG.  If Philip can't attend, he may be able
to get a copy of the paper.

    - Dave Cronauer
      Synopsys                                   Hillsboro, OR

         ----    ----    ----    ----    ----    ----   ----

From: Russell Petersen <forrest=subasic.sciatl not prom tree=russp>

Hi, John,

First thing, Philip must upgrade his Redhat to at least 7.2 on all machines
and I would use a 2.4 kernel (much faster kernel overall).  I don't believe
Synopsys even supports the Redhat 6.2.  Second, choose your Linux boxes 
carefully.   We use 1.2 and 1.4 Ghz PIII Tualatin machines for our runs 
here at SA from Penguin Computing and we regularly see 2X speedups.  In 
fact, the Linux boxes are about as fast as the 900 Mhz UltraSparc III 
machines we own.   The Tualatin is important because it has a much bigger 
cache than normal PIII's which really seems to help VCS sims.   Finally, 
we tested a P4 2 Ghz machine a while back and it was actually slower than 
the PIII's.  This is because Intel went and optimized the pipeline for 
video operations (you know, consumer home-video types of apps) which 
doubled the pipeline length.  This has a negative effect on Verilog 
simulations.

    - Russ Petersen
      Scientific Atlanta, Inc.                   Lawrenceville, GA

         ----    ----    ----    ----    ----    ----   ----

From: Philip Strykowski <pack=mindspeed jot alm wolf==philip.strykowski>

Hi, John,

Just to follow up, we are using an AMD Athlon MP 1900+ with 2GB RAM and
100 Meg Ethernet connections.  I'm not sure about the memory type, but the
boards are dual socket with only 1 CPU installed per machine.  We have a demo
Dual P4 2 GHz machine with hyperthreading installed.  It runs at 2x our
Solaris machine speed.  We use this for interactive and wave runs.

After fixing everything, we've also found DC synthesis 3x faster on
Linux over Solaris, and from what I have heard from designers, VCS build
times are 2-3x faster and run times are 1.5x faster than Solaris.

    - Philip Strykowski
      Mindspeed Technologies                     Massachusetts


( ESNUG 408 Item 8 ) -------------------------------------------- [03/13/03]

Subject: ( ESNUG 407 #11 ) Linking Issues Resolved In Behavioral Compiler

> I am having some trouble with my design since I cannot successfully
> link it because of unresolved references created by Behavioral Compiler.
> I run the Behavioral Compiler (compile_systemc, bc_time_design, and
> schedule commands) on my behavioral submodules.  Everything goes in
> right.  But after elaborating the Top, the link command fails.  The
> Behavioral Compiler creates designs associated to submodules hierarchy
> ( group1_0, group 3_1, loop_18 ... and so on) and some of these
> automatically created designs have the same name while belonging to
> different submodules.  That results in design name conflicts and causes
> the linker to attempt to link the reference to the wrong design.  (Error
> LINK-1)  How to avoid this error?  I tried to use command "rename_design"
> without success. 
>
>     - Lotfi Guedria
>       Cetic                                      Belgium


From: Aaik Van Der Poel <nightclub=synopsys got gone stripper=aaik>

Hi, John,

The symptoms described by Lotfi point in the direction of pre-2002.05 usage
of Behavioral Compiler, where there were some naming conflicts with the
Verilog flow using Behavioral submodules.  These conflicts happen when BC
generates the RTL for the top-level modules.

All these modules then contain similar submodule names like mselectxxx and
groupxxx.

In Verilog this causes an error.

Starting with BC 2002.05 we uniquify those submodules with names like

                     <top_module>_mselectxxx etc.

Workarounds for pre-2002.05 versions are:

   1. use db files directly
   2. use VHDL output
   3. compile the individual module to gate level and do the top level
      linking on gate level
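
The name clash, and the effect of prefixing each generated submodule with its
top-level module name, can be illustrated with a short Python sketch.  The
module names and helper functions below are hypothetical; this only mimics
the naming scheme described above and is not BC's actual implementation.

```python
from collections import Counter

def find_conflicts(designs_by_top):
    """Return generated names that appear under more than one top module."""
    counts = Counter(name
                     for subs in designs_by_top.values()
                     for name in set(subs))
    return sorted(name for name, n in counts.items() if n > 1)

def uniquify(designs_by_top):
    """Prefix every generated submodule name with its top-level module."""
    return {top: [f"{top}_{name}" for name in subs]
            for top, subs in designs_by_top.items()}

# Hypothetical module and submodule names, for illustration only.
generated = {
    "filter_a": ["group1_0", "loop_18", "mselect3"],
    "filter_b": ["group1_0", "mselect3"],
}
before = find_conflicts(generated)
after = find_conflicts(uniquify(generated))
print(before, after)
```

With the prefix in place, two behavioral blocks can each own a "group1_0"
without the linker seeing the same design name twice.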

Hope this helps.

    - Aaik Van Der Poel
      Synopsys                                   Mountain View, CA


( ESNUG 408 Item 9 ) -------------------------------------------- [03/13/03]

From: ShengYu Shen <school=nudt.edu.cn fish=syshen>
Subject: Newbie Avanti Apollo User Can't Get Automatic DC Scripts To Work

Hi John,

I am a newbie to ESNUG.  I have a question about interoperation between
DC and Avanti Apollo.  Apollo uses the command cmCmdECODump to generate an
ECO file, then uses the auGenSynopBackAnn command to convert this ECO to
a DC script.  When DC tries to read in this script, it complains that it
finds some undefined designs.  I checked the script and found that the
names of some nets in the TOPLEVEL design have been cut into two substrings,
and the second substring is used as the name of a design.  Following is an
example ECO and script:

ECO:

   +N mc_led_5_ecoNet_ft225
   +I mc_led_5_ecoInst_ft225 BUFCLKHD80X
   +P Z mc_led_5_ecoInst_ft225 mc_led[5]
   +P A mc_led_5_ecoInst_ft225 mc_led_5_ecoNet_ft225

DC script:

    /* create NET /mc_led_5_ecoNet_ft225 */
    current_design "led_5_ecoNet_ft225"
    create_net "mc_led_5_ecoNet_ft225" 
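
One way to list every bogus design name in a generated script before feeding
it to DC is a quick scan like the Python sketch below.  It only diagnoses the
symptom; the set of known designs is a made-up placeholder, and the function
name is illustrative.

```python
import re

def undefined_designs(dc_script, known_designs):
    """Return current_design targets in the script that are not known designs."""
    targets = re.findall(r'current_design\s+"([^"]+)"', dc_script)
    return [name for name in targets if name not in known_designs]

# The script text is copied from the example above; KNOWN is a placeholder
# for whatever design names your netlist really contains.
SCRIPT = '''
/* create NET /mc_led_5_ecoNet_ft225 */
current_design "led_5_ecoNet_ft225"
create_net "mc_led_5_ecoNet_ft225"
'''
KNOWN = {"TOPLEVEL"}
bad = undefined_designs(SCRIPT, KNOWN)
print(bad)
```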

Who can tell me how to resolve this, please?

    - ShengYu Shen
      National University Of Defence Technology  Hunan, China


( ESNUG 408 Item 10 ) ------------------------------------------- [03/13/03]

From: Dan Lawton <england=wincam fought pomme london=dan>
Subject: Why Can't I Get A Cost Estimate Of Migrating From FPGAs To ASICs?

Hi, John,

When I talk to the companies that do this for a living they make it so
complicated, it's like trying to get a used car dealer to tell you the
bottom line on the final financing.

I'm designing a system using a Xilinx xc2s300E FPGA, it costs me $40, and
I'd just like to know what a copy of it in ASIC might cost me (if it goes
into production.)  I mean, someone must know this off the top of their
head.  If I can get the cost down to $20, I can sell them in Taiwan.

Just a ballpark price would give me something to tell my Taiwanese boss.
I wouldn't complain if it turned out to be inaccurate.
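
For a back-of-the-envelope number, the usual framing is a break-even count:
how many units it takes for the per-unit saving to pay back the one-time
conversion charge.  The Python sketch below assumes that simple model; the
$40 and $20 come from this letter, while the one-time cost is a pure
placeholder, not a quote from any vendor.

```python
import math

def break_even_units(one_time_cost, fpga_unit_cost, asic_unit_cost):
    """Units needed before per-unit savings repay the one-time ASIC cost."""
    saving = fpga_unit_cost - asic_unit_cost
    if saving <= 0:
        raise ValueError("ASIC unit cost must be below the FPGA unit cost")
    return math.ceil(one_time_cost / saving)

# Placeholder one-time cost in dollars; substitute a real vendor quote.
PLACEHOLDER_ONE_TIME_COST = 100_000
units = break_even_units(PLACEHOLDER_ONE_TIME_COST,
                         fpga_unit_cost=40, asic_unit_cost=20)
print(units)
```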

    - Dan Lawton
      StarDot Technologies                       Buena Park, CA


( ESNUG 408 Item 11 ) ------------------------------------------- [03/13/03]

Subject: ( ESNUG 407 #6 ) There Should Be No OS Differences With PT Results

> We have seen similar issues in the past, specifically with differences in
> results not only across platform but also between 32 and 64 bit versions
> in PrimeTime 2000.11.  ...  We have just migrated to PrimeTime 2002.09 and
> our in-house QA shows it's now consistent across Sun & HP, at 32 & 64 bit.
>
>    - [ The Winchester History Mouse ]


From: Gordon Yip <egypt=synopsys what lawn cairo=gordon.yip>

Hi John,

In brief, there should be no differences in PrimeTime results between 
different OS and between 32-bit and 64-bit, with perhaps the exception 
of rounding error-induced differences of 32-bit.  If any are found, these
need to be reported to Synopsys immediately.
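
A cross-platform check of this kind can be as simple as comparing slack
numbers from two runs within a small rounding tolerance.  The Python sketch
below assumes you have already extracted endpoint-to-slack pairs from each
report; the endpoint names, values, tolerance and function name are all made
up for illustration and are not a PrimeTime API.

```python
def slack_mismatches(run_a, run_b, tolerance=1e-3):
    """Return endpoints whose slack differs by more than tolerance,
    or that appear in only one of the two runs."""
    mismatches = {}
    for endpoint in sorted(set(run_a) | set(run_b)):
        a, b = run_a.get(endpoint), run_b.get(endpoint)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches[endpoint] = (a, b)
    return mismatches

# Made-up endpoint names and slack values (ns), for illustration only.
run_32bit = {"u_core/reg_a/D": 0.1234, "u_core/reg_b/D": -0.0501}
run_64bit = {"u_core/reg_a/D": 0.1235, "u_core/reg_b/D": -0.0750}
diffs = slack_mismatches(run_32bit, run_64bit)
print(diffs)
```

Anything that survives the tolerance filter is worth a closer look and, per
the above, a report to Synopsys.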

With respect to the original posting made last October by Craig Taniguchi 
in ESNUG 400 #7 (differences in PrimeTime results between OS), the 
difference in PrimeTime results was actually due to different SDF files 
between the vendor and Craig, not due to differences between HPUX and 
Linux/Solaris.  Craig's vendor gave him a different SDF file than the 
one they were annotating, hence the difference.  

With respect to the anon posting in ESNUG 407 #6, the specific difference
reported between HP 32-bit and 64-bit 2000.11-SP1 release was fixed in
2002.03.  The user's testing of 2002.09 HP confirms that as well. 

    - Gordon Yip
      Synopsys                                   Mountain View, CA


============================================================================
 Trying to figure out a Synopsys bug?  Want to hear how 16,583 other users
  dealt with it?  Then join the E-Mail Synopsys Users Group (ESNUG)!
 
     !!!     "It's not a BUG,               jcooley@TheWorld.com
    /o o\  /  it's a FEATURE!"                 (508) 429-4357
   (  >  )
    \ - /     - John Cooley, EDA & ASIC Design Consultant in Synopsys,
    _] [_         Verilog, VHDL and numerous Design Methodologies.

    Holliston Poor Farm, P.O. Box 6222, Holliston, MA  01746-6222
  Legal Disclaimer: "As always, anything said here is only opinion."
 The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley. All Rights Reserved.