
  Editor's Note: I go on vacation for the first time in years.  I'm away
  for 8 days only to return to find 1.) one of my clients in serious
  crisis; 2.) the friend whom I had trusted to send out my Industry Gadfly
  column on the ESNUG mailing list messed up & did it twice; 3.) I have
  an 8 day backlog of work; and 4.) now I have to find a new home for
  www.DeepChip.com because Genedax has suddenly gone out of business.

  And tundra Boston seems even colder after 8 days in tropical Costa Rica.

  I hate vacations.
                                                 - John Cooley
                                                   the ESNUG guy

( ESNUG 343 Subjects ) ------------------------------------------- [2/00]

 Item  1: ( ESNUG 342 #2 )  Bad "set_dont_touch_network", Clocks, & DC 99.10
 Item  2: Replies To The "Reading EDA Tea Leaves" Physical Synthesis Column
 Item  3: ( ESNUG 342 #4 )  QuickBench More Cost-Effective Than Vera/Specman
 Item  4: ( ESNUG 342 #7 )  Cadence Affirma NC Sim Is Single Kernel, Too
 Item  5: DC 99.05/.10 Isn't Carrying "Don't Care" X's Correctly In Casex
 Item  6: ( ESNUG 341 #7 )  Hey!  Use Synplicity Certify To FPGA Prototype!
 Item  7: ( ESNUG 342 #9 )  We Quickly Switched From Fastscan To TetraMax
 Item  8: ( ESNUG 341 #11 )  Strange Timing Bug *NOT* Found In VSS 99.10
 Item  9: Anyone Have Experiences Using The Linux Versions Of Vera & VCS ?
 Item 10: ( ESNUG 342 #13 )  How To Use/Make A .lib Optimal For Synthesis
 Item 11: Vera Really Needs A Better Debugger And Multi-Dimensional Arrays
 Item 12: ( ESNUG 334 #9 )  My Less Messy "Translating DC To TCL" Story
 Item 13: 8 Engineers Discussing 7 Types Of Adder Hardware Implementations

 The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com


( ESNUG 343 Item 1 ) --------------------------------------------- [2/00]

Subject: ( ESNUG 342 #2 )  Bad "set_dont_touch_network", Clocks, & DC 99.10

> We found a bug in DC 99.10 where:
>
>    1) You clock on the negedge of the clock 
>    2) You have a constant as the data in to a flop
>    3) And your clock has set_dont_touch_network on it
>
> It would add buffers to the clock, even though you told DC to leave it
> alone.  The answer that came back from R&D at Synopsys was:
>
>      Get rid of the set_dont_touch_network on your clock.
>
> In DC 99.10 they started assuming an ideal clock; set_dont_touch_network
> was redundant.  They got rid of the set_dont_touch_network and the extra
> clock buffering went away.  We're still waiting to hear why we have
> dangling gates, but simplify_constants -boundary_optimization gets rid of
> them.
>
>     - Kayla Klingman
>       Tektronix, Inc.                                  Oregon


From: Gzim Derti <gderti@intrinsix.com>

John,

I want to thank Kayla for her observation on this issue.... I thought I was
going crazy!!!  I've been compiling some blocks for a customer for a few
months now, and just recently I've noticed that since 99.10, every once in
a while DC would NOT correctly listen to my set_dont_touch_network on a
clock in a structural design....  The structural had 2 clocks, named CLK and
CLK32FC. The CLK32FC clock was correct while the CLK clock (redundant, I
know...) was BUFFERED!  BUT, when I compiled this structural manually, the
issue went away....  huh?

I've decided that using 99.10 is NOT worth the struggles that I'm having
with it...  In the past 3 months I've had 5 LOGID's opened with Support,
two of which reached STAR status...

Anyway, I just wanted to confirm that I've seen the same thing, John.

    - Gzim Derti
      Intrinsix Corp.                          Rochester, NY

         ----    ----    ----    ----    ----    ----   ----

From: [ A Little Bird ]

Hi John, anon please

We found this issue some months ago.  I filed a STAR for it and got a
workaround: compile_map_for_delay = true should avoid buffers in the clock
tree.  This happens only if the data inputs of the FFs are connected to
logic 1 or logic 0.  Otherwise set_dont_touch_network works.

    - [ A Little Bird ]


( ESNUG 343 Item 2 ) --------------------------------------------- [2/00]

Subject: Replies To The "Reading EDA Tea Leaves" Physical Synthesis Column

> If you want to know how the five way physical synthesis horse race is
> going, you've got to track what the engineers who actually use such tools
> are saying and, more importantly, the bugs they find. ... Engineers who
> actually use a tool will have some praise and lots of gripes.  It's
> human nature.  No software is bug free.  When you're seeing lots of public
> bug talk about a tool, you truly know that it's being widely used.  When
> you see only public happiness and sunshine about a tool; be aware that
> you're the one being used.


From: Joe Hutt <joe@Magma-DA.COM>

Good article John.  I think design data is the only way to measure a tool
set.  I've spent my whole life trying to figure out the design problems
we're facing now and the amount of chatter is greater now than ever before.
The real test is can you do the design. The only thing is that sometimes
taxi cabs are useful even at Synopsys.

    - Joe Hutt	
      V.P. Engineering 
      Magma Design Automation

         ----    ----    ----    ----    ----    ----   ----

From: [ One Of The EDA Boys ]

John, please keep me anonymous.

Don't want to start a flame war, but unfortunately, this latest message
from you seems to be a little bit too "Synopsys" biased, so much that (for
me) it raises doubts about your integrity.  No, I'm not calling you a liar,
it's just that you seem to be getting very close to playing (or getting
deceived by) the same marketing tricks that you have exposed in the past.
   
Do you really believe that those tapeouts occurred without significant
handholding from Synopsys?  I don't think so, which means that Synopsys is
as much in taxi-cab mode as anyone else.
   
As you can see from my e-mail address (which I hope you've erased), I work
at an EDA company that isn't connected to the spin machines of any of the
mentioned companies.  I hope to be somewhat unbiased on the place and route
solutions.  Given what I've seen, I've reached the following conclusions:
   
   1. There is a hell of a lot of hype going around.

   2. Customer endorsements are very carefully worded such that they
      endorse without actually endorsing.  Fujitsu bought into multiple
      solutions, but was quoted by one vendor as a major win.  TI bought
      Magma, but TI still uses other solutions.  Mitsubishi supports all
      the EDA vendors, yet Cadence cites their support as a major win.
      etc. etc.

   3. Who is winning technically is not clear.

   4. Very few benchmarks seem to have completed.

I am absolutely positive that when some benchmarks have completed, we will 
definitely hear about the results from the winning vendor.  Since these
benchmark boasts have not materialized yet, we can only conclude that no one
is in a boasting position yet.  The moment you find one, John, could you
please make sure to publish it in ESNUG?
   
Sorry to rant, but I needed to give you my $0.02

    - [ One Of The EDA Boys ]

         ----    ----    ----    ----    ----    ----   ----

> Cadence PKS was believed to be struggling with 3 conflicting timing
> engines between PKS, Qplace, and Pearl.  Then, last week in ESNUG 342,
> Jay McDougal of Agilent reported only a 0-3 percent timing error between
> PKS, Qplace, Pearl, and even PrimeTime.  


From: Hong Li <hongli@cadence.com>

John,

You made a mistake.  It should be two timing engines, PKS and Qplace because
Qplace uses Pearl as its timing engine.  There are two engines ONLY if you
run placement outside PKS.  Otherwise, there is only one timing engine
because Qplace is also integrated with PKS which uses PKS timing engine in
this case.

Hopefully this clarifies your statement.

    - Hong Li
      Cadence

         ----    ----    ----    ----    ----    ----   ----

From: Donna Rigali <donnar@cadence.com>

John,

Why is there no mention of the different timing engines with the Synopsys
flow?  For instance, Design Compiler uses DesignTime, yet Primetime is the
sign-off engine, and I'm not sure what the timing engine is in PhysOpt or
Chip Architect.  In addition, in the case of John Stahl from Avici, Avanti
is the back-end tool so there is a different timing engine between the front
and back-end there.  There is also no mention of the 2 different placement
engines between PhysOpt and any back-end tools, which I would think would
have a huge impact on timing closure and predictability.

Yes, there are 2 different timing engines between PKS and Cadence's back-end
tools, but there is a common placement engine and, as the letter from
Agilent showed, the pre- and post- layout correlation was excellent, which
is the end objective.

    - Donna Rigali
      Cadence

         ----    ----    ----    ----    ----    ----   ----

From: Ian Buckley <ianb@8x8.com>

John,

This is a topic very close to my heart right now and I'm in the middle of
evaluating the bulk of these tools and flows.  However you've just confused
me with the following conflicting statements:

  "Having said this, here's the early February status of the RTL-to-GDS-II
   race.  With the exception of Synopsys and Monterey, most of the EDA
   vendors (Cadence, Avanti, and Magma) seem stuck in taxi cab mode with
   their physical synthesis tools."

  "Avanti and a very noisey Magma are playing up vague customer endorsements
   in the press, but nothing verifiable of course.  Monterey is just a town
   in California.  These are still taxi cab companies."

So far I've refused to even engage Monterey in evaluation because it seemed
like vaporware to me, so I'm perplexed that you singled it out as being
established (like Physical compiler) in your first comment.  Typo?

    - Ian Buckley
      8x8 Inc.                                Santa Clara, CA

         ----    ----    ----    ----    ----    ----   ----

From: Mike Kobe <kobe@meltdown.sps.mot.com>

In one paragraph you say that the EDA vendors are stuck in taxi cab mode
except for Monterey and Synopsys.  In the next paragraph you say Monterey is
just a town in California that is still in taxi cab mode.  We are engaging
with Monterey and Synopsys on physical synthesis.  Do you have any useful
info on Monterey?

    - Mike Kobe
      Motorola

         ----    ----    ----    ----    ----    ----   ----

From: Jay Vleeschhouwer <Jay_Vleeschhouwer@ml.com>

John,

In your "bug talk" article, you state first that "with the exception of 
Synopsys and Monterey, most of the EDA vendors (Cadence, Avanti, and Magma)
seem stuck in taxicab mode".  Later in the article, however, you refer to
Monterey as just a town in California, suggesting that you don't see much 
evidence of real customer usage (of Dolphin).

Also, can you elaborate on why you think Synopsys is 6 months ahead of PKS?
I spoke with one large semi company yesterday who suggested the reverse.

    - Jay Vleeschhouwer, Analyst
      Merrill Lynch                              New York, NY


 [ Editor's Note: Sorry for the semantic error in my column concerning
   Monterey.  What I was trying to describe was a pack where Cadence,
   Avanti, and Magma were all in taxicab mode with Synopsys & Monterey
   outside of that pack -- Synopsys being 6 months ahead of the pack and
   Monterey not even appearing in the race.  Sorry.  (BTW, I just heard a
   rumor this week that Monterey was running into funding problems, too.)

   To Jay: you might want to reread [ One Of The EDA Boys ]'s letter above
   where he says: "Customer endorsements are very carefully worded such
   that they endorse without actually endorsing.  Fujitsu bought into
   multiple solutions, but was quoted by one vendor as a major win.  TI
   bought Magma, but TI still uses other solutions.  Mitsubishi supports
   all the EDA vendors, yet Cadence cites their support as a major win."

   You should be aware that on-going, behind-the-scenes business deals
   (most of which don't even involve physical synthesis) make it trivial
   to get such "customer quotes" for a press release or article, Jay.

   Knowing this, I wrote my "track bug talk / ignore everything else"
   column to teach readers how I read the industry.  Deal only in hard
   data and with people who ACTUALLY use the tool.  That being said, I
   *know* nVidia and Matrox have used PhysOpt to tape-out chips.  I've
   *personally* talked to their engineers and I've read their no-bullshit,
   warts-and-all reviews of PhysOpt -- hence, I *know* PhysOpt is real.
   And that data was from 4 months ago.  ( See ESNUG 335 #1 )

   When I was writing that column, I phoned Jeff Roane, the Cadence PKS
   marketing bigwig, and asked if he had any PKS tape-outs yet.  He said
   "No."  I said he should get some.  He replied "Tell me something I don't
   know!"  And I got no real hint from him as to when one would be coming.

   So that means Jeff probably won't have a PKS tape-out for at least
   another month or two; unless he rigs a fake one (which I don't think
   he'd do; too dangerous if exposed).  Hence my reasoning was that
   Synopsys was 4 months (currently known) + 2 months (probable) = 6 months
   ahead of Cadence in physical synthesis.  Feel free to agree or disagree
   with my assessment, Jay.  I'm just telling you what *I* see in the hard
   data.  I could be wrong; but in my gut, I don't think I am.  - John ]

                                             
( ESNUG 343 Item 3 ) --------------------------------------------- [2/00]

Subject: ( ESNUG 342 #4 )  QuickBench More Cost-Effective Than Vera/Specman

> You've invited contributions on VERA and Specman, and I'd like to see our
> product added in there, too.  Chronology QuickBench has a hardware
> verification language called RAVE which stacks up very well against VERA
> or Verisity's e language, along with a powerful tool for creating bus
> functional models.
>
>    - Wade Hostler,
>      Sr. Apps Engineer
>      Chronology Corporation                      Oceanside, CA


From: Andrew Frazer <Andy.Frazer@idt.com>

John,

My company, IDT, evaluated VERA, Specman and Quickbench about one year ago.
Although VERA and Specman are very powerful system verification solutions, 
Quickbench is a more cost-effective solution.  Although QB may not do
everything that VERA and Specman can do, it does do most of what the design
groups want and is, therefore, easier to learn.

If you're looking for a system verification solution, you should consider
all three: Quickbench, VERA and Specman.

    - Andy Frazer
      Integrated Device Technology             Santa Clara, CA


( ESNUG 343 Item 4 ) --------------------------------------------- [2/00]

Subject: ( ESNUG 342 #7 )  Cadence Affirma NC Sim Is Single Kernel, Too

> MTI makes a darn good simulator for cosim.  It's the only single-kernel
> simulator on the market.  Having both languages running and debuggable
> in the same simulator is the only way to go.
>
>     - Gregg Lahti
>       Intel Corp                                    Chandler, AZ


From: John Willoughby <jww@cadence.com>

Hi John,

I read Gregg's recent post and just had to write in reply.  Cadence's
Affirma NC Sim is not only a mixed-language, single-kernel simulator, but I
could easily make the argument that Affirma NC is actually the ONLY real
mixed-language single-kernel simulator available.  MTI was a VHDL 
simulator that they added Verilog support to, as opposed to Affirma NC Sim
which was designed from the ground up as a mixed-language simulator.  This
is one of the reasons that we can offer very fast Verilog and very fast
VHDL simulation with no penalty for mixed-language simulation.  Naturally,
all of our debug tools provide full support for both languages as well.

I'll spare you the full pitch but I did want to set the record straight.  I 
know that Affirma NC Sim is in use at Intel including sites in Chandler, AZ.

    - John Willoughby, Marketdroid
      Cadence Design Systems


( ESNUG 343 Item 5 ) --------------------------------------------- [2/00]

From: Michael Jarchi <jarchi@vitesse.com>
Subject: DC 99.05/.10 Isn't Carrying "Don't Care" X's Correctly In Casex

Hi John,

Here's a recent issue I found with Synopsys DC 99.05/.10 that other users
may be interested in hearing about.  I have a wire defined as:

  assign mask3 = {byte1[7],6'hx,byte1[7],byte2[7],byte2[7],5'hx,byte2[7],
                  byte1[6],6'hx,byte1[6],byte2[6],byte2[6],5'hx,byte2[6],
                  byte1[5],6'hx,byte1[5],byte2[5],byte2[5],5'hx,byte2[5],
                  byte1[4],6'hx,byte1[4],byte2[4],byte2[4],5'hx,byte2[4],
                  byte1[3],6'hx,byte1[3],byte2[3],byte2[3],5'hx,byte2[3],
                  byte1[2],6'hx,byte1[2],byte2[2],byte2[2],5'hx,byte2[2],
                  byte1[1],6'hx,byte1[1],byte2[1],byte2[1],5'hx,byte2[1],
                  byte1[0],6'hx,byte1[0],byte2[0],byte2[0],5'hx,byte2[0]};
    
And used in the RTL as:

    casex({ena1, ena2, ena3, ena4, DATA[191:0]})
          {1'b1, 1'bx, 1'b0, 1'b1, 1'bx,mask1,63'hx}:  flag  <= #1 1;
          {1'b0, 1'b1, 1'b0, 1'b1, 1'bx,mask2,63'hx}:  flag  <= #1 1;
          {1'b0, 1'b0, 1'b0, 1'b1, 1'bx,mask3,63'hx}:  flag  <= #1 1;
          default:                                     flag  <= #1 0;
    endcase

Well, it turns out DC is not carrying the "don't care" x's over into the
casex statements correctly when analyzing, mapping, etc. the RTL.
Verilog-XL sims are functionally correct, but the gate-level sims are wrong.
When I edit the RTL and place the wire assignment value directly into the
casex statements, all is fine with the Synopsys gate-level result.   I let
Synopsys know about this problem and they have been examining the problem
for the last several weeks.  If it is an illegal coding style, Synopsys DC
never complains about it.

    - Mike Jarchi
      Vitesse Semiconductor Corp.                 Camarillo, CA
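
A minimal sketch of the casex matching rule at issue above, written in
Python rather than Verilog so it can be run anywhere.  It assumes the
standard Verilog rule that x, z, and ? bits on either side of a casex
comparison act as wildcards.  The function name casex_match, the 4-bit
patterns, and the idea that synthesis resolves the wire's x bits to 0 are
illustrative assumptions, not a description of what DC does internally.

```python
WILDCARDS = set("xXzZ?")

def casex_match(expr_bits, item_bits):
    """Return True if a casex item matches the case expression.

    Both arguments are bit strings such as "1x0z".  A bit position is
    ignored when either side holds x, z, or ? there.
    """
    if len(expr_bits) != len(item_bits):
        raise ValueError("operands must have the same width")
    return all(
        e == i or e in WILDCARDS or i in WILDCARDS
        for e, i in zip(expr_bits, item_bits)
    )

# Hypothetical 4-bit mask built like mask3: known bits around "xx".
mask_in_simulation = "1xx0"
# Assumption: a tool is free to pick constants for x on a wire, e.g. 0.
mask_after_synthesis = mask_in_simulation.replace("x", "0")

data = "1110"
print(casex_match(data, mask_in_simulation))    # True
print(casex_match(data, mask_after_synthesis))  # False
```

With the x bits kept, the item matches any data in those positions.  Once
they are replaced by constants, the same item matches only one value, which
is the kind of RTL-versus-gate mismatch described above.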


( ESNUG 343 Item 6 ) --------------------------------------------- [2/00]

Subject: ( ESNUG 341 #7 )  Hey!  Use Synplicity Certify To FPGA Prototype!

> As part of the verification of our ASIC design, we plan to build an FPGA
> based prototype using Altera 20K Apex devices.  Unfortunately we're having
> problems compiling our VHDL directly w/ Quartus (Altera's Apex software).
>
>     - Justin Smith
>       Atmosphere Networks               Osborne Park, Western Australia


From: Chuck Seeley <cdseeley@micron.com>

Hi, John,

My name is Chuck Seeley and I work for Micron Technology.  We currently are
developing a prototype board of an ASIC design out of Xilinx FPGAs.  We are
currently in layout of our board and hope to power it on around March 1st.
I'm writing you to share my experience with a product from Synplicity called
Certify which is specifically for those who are using FPGAs to prototype
their ASIC design.  While I'm not going to mention every feature of Certify,
I found it extremely valuable in partitioning our ASIC design into 10 Xilinx
XCV1000 FPGAs.  As you know, the two design constraints a designer faces when
trying to put a design into an FPGA are pin count and gate count.  In our
case we implemented a Time Division Multiplexing (TDM) scheme to reduce our
pin count.  However, Certify does provide the designer with a couple of such
schemes as well as allowing the designer to implement his own scheme as we
did.  One of the nicest Certify features is that it provides a GUI
which allows a designer to view a schematic of his design and provides the
ability to "drag and drop" modules into graphical representations of FPGAs.
Modules can be replicated several times if necessary in multiple FPGAs and,
more importantly, it lets the designer select a module and do a "what
if" analysis of placing the module into any FPGA.  As modules are placed
into the FPGAs the GUI provides two bars to indicate the pin count and gate
count usage to help keep track of these two important constraints.  Another
nice feature is that when a module is selected there is a pin connection
matrix which allows the designer to readily see the number of pin
connections to other modules, which aids in determining in which FPGA the
module might best be placed to reduce pin count.

Let me close by saying that I've been an ASIC designer for over 10 years and
I emulated designs with Quickturn and IKOS emulation systems.  I've also
spent time looking at the possibility of prototyping ASICs with FPGAs before
and this product is one of the better ones I've ever used.

    - Chuck Seeley
      Micron Technology

         ----    ----    ----    ----    ----    ----   ----

From: Mona Chu <monac@el.nec.com>

Hi, John,

On "Altera Apex Is Flakey; How About Using FPGA Compiler II Instead?", we
are also building SOC-ASIC prototypes.  We use Synplicity's Certify to
compile our designs in Verilog.  Certify allows the user to partition his
design in various ways on FPGAs.  This is important for a large design that
may not be able to fit into one FPGA.

I also encountered errors during compiling, but they were mainly syntax
errors, ambiguous design coding, and designs that cannot be implemented
in FPGAs such as "bus holders" (which require weak/strong logic).  With
modifications to the design, I don't have much problem using Synplicity's
Certify to finish my compile.  

    - Mona Chu
      NEC

         ----    ----    ----    ----    ----    ----   ----

From: Ivan-Pierre Batinic <ivan_batinic@3mts.com>

Dear John,

Synplicity offers a multi-FPGA target mapping package integrated into their
synthesis tool called "Certify", which maps a given Verilog/VHDL design
across 1 or more FPGAs.  You can do an FPGA implementation of large ASIC
designs of block functional modeling with design validation.  They also
offer a superset of tools integrating the Certify package with their
floor-planning tool called "Amplify".  Until recently, my experience with
Synplicity was solely w/ their original base synthesis package "Synplify".
To date, it still delivers the fastest synthesis, with the best accuracy
prior to partitioning for FPGAs.

We're newbie users of Certify.  It's much more powerful than it may seem
at first glance.  For example, it knows FPGA interconnection schemes
quite well -- which begs the question "What about intervening logic and/or
memory on the PCB?".  We expected answers ranging from "limited to wires
only", "combinatorial logic only" to "FPGA embedded memory only", or
"handled manually as a post-map procedure" or even laughter.  Instead, our
experience so far demonstrates that any intervening design realized in the
physical target proto-platform can be described in the platform's behavioral
model and merged.  We can freely mix and match FPGAs, memories, and MSI
logic on our PCBs with Certify.

It's also good for design debugging.  For example, assume your FPGA PCB
comes up short with its digital interconnect between two FPGAs for a given
synchronous transaction.  (Let's say it was incurred by a significant design
change midstream in the validation phase, a functional block retargeted to
another FPGA for performance, or a reduction of FPGA I/O.)  Certify
automatically inserts a time-multiplexed [n:m] transceiver on each FPGA, at
the available interconnection I/Os.  Though it's a departure from the
original design, the intent of the design is maintained, by allowing the
PCB implementation to transparently operate normally (with respect to the
original target design's synchronous clock, the function is performed
without a hitch).

As for Synplicity's floor planner "Amplify", I have only seen demos.

    - Ivan-Pierre Batinic
      Third Millennium Test Solutions                San Jose, CA
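
The pin-count saving from time-division multiplexing mentioned in the
letters above can be sketched as follows.  This is a toy Python model, not
Certify's actual scheme: the function names tdm_slots, tdm_send, and
tdm_receive, and the assumption that signals go out in fixed slot order on
a faster clock, are made up for illustration.

```python
import math
from typing import List

def tdm_slots(n_signals: int, n_pins: int) -> int:
    """Fast-clock slots needed to move n_signals over n_pins."""
    return math.ceil(n_signals / n_pins)

def tdm_send(signals: List[int], n_pins: int) -> List[List[int]]:
    """Split one design-clock sample into per-slot pin values."""
    slots = tdm_slots(len(signals), n_pins)
    # Pad the last slot with zeros so every slot drives all pins.
    padded = signals + [0] * (slots * n_pins - len(signals))
    return [padded[s * n_pins:(s + 1) * n_pins] for s in range(slots)]

def tdm_receive(frames: List[List[int]], n_signals: int) -> List[int]:
    """Reassemble the original signal vector on the receiving FPGA."""
    flat = [bit for frame in frames for bit in frame]
    return flat[:n_signals]

signals = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # 10 inter-FPGA nets
frames = tdm_send(signals, n_pins=4)        # only 4 physical pins
print(len(frames))                          # 3
print(tdm_receive(frames, len(signals)) == signals)  # True
```

In this model the price of the saved pins is that the inter-FPGA link has
to run tdm_slots() times per design clock.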

         ----    ----    ----    ----    ----    ----   ----

From: [ One Day At A Time ]

John, keep me anonymous.

If you're going to spend money on synthesis tools for your emulation project:

  1. ($) Strongly recommend you try Synplify (www.synplicity.com) over FPGA
     Compiler II.  I don't know anyone who has actually run a real design
     through both and chosen Synopsys.  It works well w/ Quartus.

  2. ($$$$) Have a look at Certify... from the same company.  Integrates a
     powerful partitioner (replication, TDM (Virtual wire) etc.)

If you don't want to (or can't) spend anything:

  1. You are correct that Altera synthesis doesn't support enough of the
     VHDL language to be useful.  If you decide you can restrict your coding
     style to suit Apex (because you don't want to spend the $$) you would be
     better off using Verilog. -- not recommended

  2. Use DC w/ Synopsys w/ FPGA vendor libraries -- not recommended

Here is an excerpt of an email thread I had with someone wanting to know
about FPGA synthesis for emulation:

 "If we get the Synopsys lib from the vendor for free, are you saying that
  we can run DC directly -- and not use FPGA compiler II or FPGA express?"

  Yes, but put it this way...  The results are such that they are not
  generally considered to be a competitor...  But it's worth a try.  Altera
  has an app note on how to do it (i.e. the correct variables for the EDIF
  writer etc.) but I think you need to ask for the library, you can't just
  download it.  If you can afford it, get Synplify now and amortize it over
  your next few projects.  (Consider the situation where on your next
  project your gate count exceeds the capacity of your current FPGA w/ DC
  but with Synplify it fits .... now it's cheaper to buy Synplify than new
  boards.)

 "If we buy the tool from Synplicity, who provides the library -- is it
  Synplicity or the vendor?  What's the approximate cost for Synplicity?"

  There is no library per se; Synplicity writes their own mapper for each
  target architecture, so you don't need one.  Not sure on the price; if
  you get a node-locked, Altera-only license you are probably under 8k.

Thanks for keeping up the useful dialog on ESNUG, John.

    - [ One Day At A Time ]


( ESNUG 343 Item 7 ) --------------------------------------------- [2/00]

Subject: ( ESNUG 342 #9 )  We Quickly Switched From Fastscan To TetraMax

> We've recently run TetraMax on a complex 3 million gate design that is
> already in production.  The original set of vectors (30 million) had been
> generated using Synopsys TestGenXP (formerly Sunrise).  We used to run
> TestGenXP ATPG distributed over a dozen hosts, resulting in about 3 days
> CPU time (more than one month on single CPU!!!).  We did the same with
> TetraMax and the results have been quite astonishing: 4 days on a single
> CPU only, producing *half* the vectors w/ a negligible coverage
> reduction (about 1%).
>
>     - Roberto Mattiuzzo
>       STMicroelectronics                             Agrate, Italy
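
The CPU-time comparison quoted above works out as follows.  This small
Python calculation takes "a dozen hosts" as 12 and reads "3 days" as the
time spent on each host, which is an assumption about how the numbers were
meant.

```python
def cpu_days(hosts, days_per_host):
    """Total CPU-days for a run spread evenly over several hosts."""
    return hosts * days_per_host

testgen_cpu_days = cpu_days(hosts=12, days_per_host=3)
tetramax_cpu_days = cpu_days(hosts=1, days_per_host=4)

print(testgen_cpu_days)                       # 36, i.e. over one month
print(testgen_cpu_days / tetramax_cpu_days)   # 9.0 times less CPU time
print(30_000_000 // 2)                        # 15000000 vectors, half of 30M
```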


From: tturner@broadcom.com (Tony Turner)

Hi, John,

We have just completed 3 chips with over 2 million gates each.  We tried
Mentor's Fastscan and TetraMAX in the beginning and quickly switched to
TetraMAX.

The first reason for switching was that the libraries used by Fastscan are
not the same as the Verilog libs.  So we spent a lot of time chasing small
library issues.  TetraMAX however uses the same Verilog libs as our chip
sims -- this alone was enough reason to go with TetraMAX.

Then after using the tool for a while, the graphical debug of blocked
chains and uncontrollable nodes was so much better than the cryptic error
messages other tools give.  As for speed, the longest it takes to get a full
set of patterns is 12hrs.  On small blocks (100K gates) if you are debugging
something, you can have a Verilog simulatable set of patterns in 20 mins.
Got some cells with unreliable capture?  Mask them in the tool -- don't use
tester time to do it.  Have tons of ram blocks or non-scan custom cells?
Just use black_box -- no need to build any models.  This is the way EDA
tools are supposed to be done -- too bad it took till the 21st century to
get there.

    - Tony Turner
      Broadcom Corp.

         ----    ----    ----    ----    ----    ----   ----

From: Danilo Grassi <Danilo.Grassi@accent.it>

Hi, John,

Here are my impressions of Synopsys TetraMAX ATPG.  Previously I have
been a Sunrise/Synopsys testgen user.  And I have noticed the enormous
reduction of design compilation and test generation runtime when using
tmax.  But this is not the only thing that has really impressed me.
Other tools have similar capabilities.  I worked on a design with a
lot of problems in resolution of internal tristate buses when generating
ATPG patterns.  I tried with three different tools, and TetraMAX was the
only one that was able to generate test patterns in a very simple way,
with a very short runtime, without bus resolution conflicts and obtaining
the best test coverage result.

    - Danilo Grassi
      Accent S.r.l.                   Vimercate (MI), Italy



( ESNUG 343 Item 8 ) --------------------------------------------- [2/00]

Subject: ( ESNUG 341 #11 )  Strange Timing Bug *NOT* Found In VSS 99.10

> I want to share with you and the ESNUG crowd an amazing VSS bug.  I had
> this fragment of code:
>
>   p0 : process (clk, GSR)
>   begin
>     if (GSR = '1') then
>       addr_d <= (others => '0');
>     elsif (clk'event and clk = '1') then
>       addr_p <= ui_addr;
>     end if;
>   end process;
>  
> that refused to simulate the way it should with VSS 99.10.  addr_p was not
> the registered version of ui_addr, but an identical copy of it.  After a
> few hours I discovered that the problem was with the "addr_p" identifier:
> changing the name of the signal to "addr_d" solved the problem.  Isn't
> this amazing?
>
>     - Dr. Arrigo Benedetti
>       Caltech                                   Pasadena, CA


From: Evan Lavelle <eml@riverside-machines.com>

Hi, John,

I was interested to see this code on ESNUG.  Agreed, the simulation
results are nothing like the expected results, but to be fair to Synopsys
(not that I normally am) this is a tricky problem.  This is a known
pathological case for synthesizers, since it's not easy to represent the
code in real hardware.  The problem is in deciding what to do when there's
a rising clock edge, but GSR is already at '1' (or, even worse, a rising
clock edge and a rising GSR edge at the same time).  In this case, the
GSR has the higher priority, and the clock must be ignored.  The 'real'
hardware should therefore look something like this:

                   -----------------------
                   |     ___     ------   |  ADDR_P
                   -----|   |    |   Q|---.---------
                        |MUX|----|D   |
         UI_ADDR -------|___|  --|>   |
                          |    | ------
              GSR----------    |
              CLK --------------

In other words, if GSR is already high, then the clock edge simply loads
ADDR_P back into the register. This isn't perfect, since some delay has
to be built into CLK to account for a rising edge on both GSR and CLK at
the same time.

Of course, VSS isn't a synthesizer and should be able to show the
expected output from the model.  This makes you wonder if it has some
special synthesis knowledge, and was getting confused because of that.

    - Evan Lavelle
      Riverside Machines Ltd.                         UK
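
A rough behavioural sketch of the mux-plus-register structure drawn above,
in Python for easy experimentation.  The class name HoldRegister and the
integer encoding of the signals are assumptions made for illustration, and
the model ignores the GSR/CLK race that the letter mentions.

```python
class HoldRegister:
    """D flip-flop whose D input is a mux: Q when GSR is high, else UI_ADDR."""

    def __init__(self, initial=0):
        self.q = initial        # plays the role of ADDR_P
        self._prev_clk = 0

    def tick(self, clk, gsr, ui_addr):
        """Apply new input levels; update Q only on a rising CLK edge."""
        rising = self._prev_clk == 0 and clk == 1
        if rising:
            # The MUX: feed Q back while GSR is high, else take UI_ADDR.
            self.q = self.q if gsr == 1 else ui_addr
        self._prev_clk = clk
        return self.q

reg = HoldRegister(initial=5)
print(reg.tick(clk=1, gsr=1, ui_addr=9))   # 5: edge with GSR high holds Q
print(reg.tick(clk=0, gsr=0, ui_addr=9))   # 5: no edge, nothing changes
print(reg.tick(clk=1, gsr=0, ui_addr=9))   # 9: edge with GSR low loads
```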

         ----    ----    ----    ----    ----    ----   ----

From: [ A Synopsys VSS VHDL CAE ]

John,

I believe the problem is with the code and not VSS.  But, I'm assuming
he was trying to model a typical register.  If he was attempting to
model something else, please let me know.

A process that models the rising edge registering of an input value with an
asynchronous reset should assign the reset value and the input value to the
same target.  His original code assigned the reset value to addr_d and the
input value to addr_p under the reset and rising edge clock conditions
respectively.  When he modified the code to assign the input value to the
same target as the reset value (addr_d), he fixed the problem in the VHDL.

In his simulation, addr_p would never have been reset to all 0's when the
GSR signal toggled high.  It would have retained its current value.
Conversely, addr_d would start simulation with its initial value (probably
all U's) and would have toggled to all 0's when GSR went high and then
remained that value throughout the simulation.  (This assumes that there
are no other drivers for addr_d and addr_p).
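
The behaviour described above can be replayed with a toy event model in
Python.  This is only a sketch: the event names, the run() helper and the
sample values are invented for illustration, and it is not the poster's
VHDL.  The reset branch always assigns addr_d; the clocked branch assigns
whichever signal is named in clocked_target.

```python
def run(events, clocked_target):
    """Replay (kind, value) events through a reset/clock process."""
    signals = {"addr_d": "U", "addr_p": "U"}   # 'U' = uninitialized
    gsr = 0
    for kind, value in events:
        if kind == "gsr":
            gsr = value
        if gsr:
            signals["addr_d"] = 0              # asynchronous reset branch
        elif kind == "clk_rise":
            signals[clocked_target] = value    # value is UI_ADDR at the edge
    return signals


events = [("clk_rise", 5), ("gsr", 1), ("clk_rise", 6),
          ("gsr", 0), ("clk_rise", 7)]

after_reset = run(events[:3], "addr_p")  # original code, stopped during reset
original = run(events, "addr_p")         # original code, full run
fixed = run(events, "addr_d")            # reset and data share one target

print(after_reset, original, fixed)
```

In the "original" run addr_p keeps its old value through the reset and
addr_d sits at 0 forever; in the "fixed" run addr_d is both reset and
clocked, which is the usual register template.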

    - [ A Synopsys VSS VHDL CAE ]


( ESNUG 343 Item 9 ) --------------------------------------------- [2/00]

From: Sherri Al-Ashari <sherria@corvia.com>
Subject: Anyone Have Experiences Using The Linux Versions Of Vera & VCS ?

Hi, John,

What are people's experiences with using the Linux version of Vera 4.1.3
along with VCS 5.1?  Do they yield the same results (and performance)
as their Unix versions?  We'll be doing our own evaluation, but it would
still be nice to know ahead of time how successful we'll be.

    - Sherri Al-Ashari
      Corvia Networks                          Sunnyvale, CA


( ESNUG 343 Item 10 ) -------------------------------------------- [2/00]

Subject: ( ESNUG 342 #13 )  How To Use/Make A .lib Optimal For Synthesis

> We acknowledge if we change Synopsys .lib slightly on its cell repertoire
> or cell timing value, the synthesis result is affected much.  When we
> added some new cells, the synthesis result became worse in some cases.  We
> changed the cell timing a little bit faster, the result was much improved.
>
> We know that constraints, WLM, cell area and other factors are related
> to determine the synthesis result.  But we would like to know if someone
> has a guide line of developing the library: how we should develop library
> optimal for Design Compiler.  We tried to find out any application note
> and/or documents on Synopsys SolvNET Web and contacted our Synopsys AE,
> but we can not get useful information so far.
>
>     - [ One Of The 47 Ronin ]


From: Nisenbaum Doron <doron@chipx.co.il>

Hi John,

We always had the problems mentioned in this issue (unpredictable synthesis
results per a library change).  This is the first time we got some kind of
a guide from the Israel Synopsys support team.  It's a 10 page article in
SolvNet titled "DC Library Ultra Guidelines" (October 1999) and it has
the detailed description of the new DC timing model, how to analyze a .lib,
plus basic library developer guidelines for their new timing model.
It's Synthesis-625.html on SolvNet.

A good, meaty guideline on library development is sure missing.

    - Doron Nisenbaum
      Chip Express (Israel)                          Haifa, Israel

         ----    ----    ----    ----    ----    ----   ----

From: [ Norman de Plume ]

Hi John,

I'm involved in maintaining a 0.35u standard cell library, as well as
providing a backend timing flow.

I'm interested in any discussion on defining what makes a library "synthesis
friendly", how to handle complex timing arcs, synthesis constraints for
place and route, LVS flows for designs with multiple power supplies,
physical verification at 0.25u and below, etc., etc.

I'm looking for insight into defining synthesis constraints.  In my past
life, I always relied on the input drive, output loading, and clock
definitions provided by an ASIC vendor.  Instead of set_driving_cell, I've
started to use set_input_transition, with a number that was mostly pulled
from the air.  How do you derive a number for set_input_transition?  Do most
people use a percentage of the clock frequency?  Or is it the average
transition of a pad cell loaded by max_fanout NAND gates?  Similarly with
max_fanout and output loading, what are reasonable constraints?  I know
there will be many points along the banana curve as you sweep max_fanout,
but what determines the "right" number?  And finally, should you

        set_load = max_fanout * the input cap of a NAND gate

and     set_port_fanout_number = max_fanout?

I know these are esoteric questions for most people, but I'd like to start
a discussion for the benefit of all people new to the fine details.
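
To make the arithmetic in the question concrete, here is a small Python
sketch.  The max_fanout of 16 and the 0.01 pF NAND input capacitance are
placeholder numbers, not values from any real library, and the sketch
takes no position on whether this rule of thumb is the "right" one.

```python
def port_constraints(max_fanout: int, nand_input_cap_pf: float) -> dict:
    """Derive the two output-port numbers from the rule of thumb above."""
    return {
        # set_load = max_fanout * the input cap of a NAND gate
        "set_load": max_fanout * nand_input_cap_pf,
        # set_port_fanout_number = max_fanout
        "set_port_fanout_number": max_fanout,
    }


# Hypothetical inputs: 16 loads of 0.01 pF each.
constraints = port_constraints(max_fanout=16, nand_input_cap_pf=0.01)
print(constraints)
```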

Anon, please.

    - [ Norman de Plume ]


( ESNUG 343 Item 11 ) -------------------------------------------- [2/00]

From: [ Not Another Elf ]
Subject: Vera Really Needs A Better Debugger And Multi-Dimensional Arrays

Hi John, please keep me anonymous.

I'm a happy Vera user but I have 3 big complaints:

  1. The language doesn't support multi-dimensional arrays.  This has huge
     implications on how one can code a testbench.

  2. While Synopsys has just released a new Vera debugger, it still needs
     a lot of work.  What made them decide to release the debugger with a
     Windows style interface is beyond me.  The debugger window doesn't seem
     to behave like a normal X window and can be very frustrating.  I really
     hope they release a native UNIX version soon.

  3. The Vera compiler messages have very little information content and
     usually only the first message is valid.

I, too, would like to see more technical Vera discussion on ESNUG.

    - [ Not Another Elf ]


( ESNUG 343 Item 12 ) -------------------------------------------- [2/00]

From: Mark Andrews <Mark.Andrews@eng.efi.com>
Subject: ( ESNUG 334 #9 )  My Less Messy "Translating DC To TCL" Story

John,

As I just ported most of my dc scripts from dc_shell to dc_shell -tcl, I
thought I would share my experiences and ask my unanswered questions.

I found the following references essential...

    1) Recent posts from Gzim Derti and "Tickle Me Elmo" in ESNUG 334 #9
    2) DC-Tcl Questions from SNUG '99 (Synthesis-490.html from Solvit)
       which has a description of what Synopsys did with
       collections - or why they didn't use lists. You will need
       to understand collections if you plan to use Tcl in dc_shell.

I found the following references helpful...

    1) Chapter 3 from John Ousterhout's "Tcl and the Tk Toolkit" which
       describes how and why the Tcl parser does what it does. It
       explains why the parser drops off the end of the file if a }
       is missing.
    2) The Tcl/Tk Reference Guide (from Tcl web sites)
    3) "Introduction to Tcl" from the Synopsys Solvit web site

DC-TRANSCRIPT

Overall dc-transcript did an awful job of converting my dc-shell scripts to
tcl.  Very few of them worked first time, but it at least put me in the
right ballpark. I was pretty frustrated for the first few files, but once I
started to understand the dumb things it repeatedly did (thanks Gzim), it
became relatively simple to fix its output. 

HELPFUL HINT 1
--------------

Say you have this in dc_shell, i.e. you are using a variable to test
something in a filter command:

    test="true"
    filter( find( design, "*" ), "@is_mapped == test" )

dc-transcript will convert it to:

    set test {true}
    filter [find design {*}] {@is_mapped == test}

This always returns an empty collection.  It should have been:

    set test {true}
    filter [find design {*}] "@is_mapped == $test"

Note:
 1) We need a $ in front of test in the filter command to indicate that
    test is a variable
 2) In order for $test to be correctly substituted we need "" instead of {}
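
The two quoting rules in the note can be mimicked with a toy model in
Python.  This is not Tcl and not dc_shell; tcl_word() is a made-up helper
that only imitates "braces pass text through verbatim, double quotes
substitute $name", which is the part that matters for the filter
expression.

```python
from string import Template


def tcl_word(word: str, variables: dict) -> str:
    """Toy model of how Tcl treats one braced or double-quoted word."""
    if word.startswith("{") and word.endswith("}"):
        # Braces: contents are passed through with no substitution.
        return word[1:-1]
    if word.startswith('"') and word.endswith('"'):
        # Double quotes: $name is replaced by the variable's value.
        return Template(word[1:-1]).substitute(variables)
    return word


variables = {"test": "true"}
print(tcl_word('{@is_mapped == test}', variables))    # what dc-transcript wrote
print(tcl_word('{@is_mapped == $test}', variables))   # $ alone is not enough
print(tcl_word('"@is_mapped == $test"', variables))   # $ plus double quotes
```

Only the last form hands the filter an expression that contains the
value of the variable rather than its name.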

HELPFUL HINT 2
--------------

If you have "foreach" in a dc_shell script, dc-transcript will either select
"foreach" or "foreach_in_collection".  In my experience the tcl script
always needed "foreach_in_collection".  Note also that, unlike "foreach" in
dc_shell, "foreach_in_collection {variable, collection} {}" leaves variable
empty after the last loop.

QUESTION 1
----------

Do we need the tcl equivalent of this in dc_shell -tcl?

    foreach(design_name,dc_shell_status){}

QUESTION 2
----------

Do we need these anymore?

    list_name = {}
    int_name = 0
    str_name = ""

Dc-transcript dutifully converts them to tcl, but tcl doesn't know the
difference between an integer and a string. And we don't really want a
list in tcl, we need a collection. I wasn't really sure what the correct
thing to do was, so I just deleted them all, (everything still seems to
work...).

QUESTION 3
----------

How do I create an empty collection?

QUESTION 4
----------

Is a collection a regular tcl variable with a hidden Synopsys type?  Or is
it a tcl list with one entry and a hidden Synopsys type?  By type I mean
design, net, port etc.

    - Mark Andrews
      Electronics For Imaging, Inc.             Foster City, CA


( ESNUG 343 Item 13 ) -------------------------------------------- [2/00]

Subject: 8 Engineers Discussing 7 Types Of Adder Hardware Implementations

> What are the specifications of a "Carry-Look-Ahead-Adder"?  Is this the
> same as a "Carry-Save-Adder"?
>
>     - Reto Amherd
>       HSR Hochschule Rapperswil                Wald, Switzerland


From: S.D. Lew <s_d_lew@my-deja.com>

They're not the same.  It's perhaps easier to explain by looking at several
different types of basic adders...

  Bit Serial Adder:
    Adds 1 bit at a time. The carry out of the least significant bit
    is computed and input to the next more significant bit up to the
    most significant bit.  Unfortunately this means the sum for a bit
    can't be computed before the lower bit's carry is computed.
    Just like it's done by hand.

  Carry Ripple Adder:
    Same as a BSA except the intermediate results aren't saved.  As
    a result, the carry ripples up from the lsb to the msb, but still
    the sum for a bit can't be computed before the lower bit's carry
    is computed.

  Carry Kill/Generate/Propagate Adder (Manchester Adder):
    Similar to CRA.  For each bit, the carry out is computed to be either
    always 0 (kill), always 1 (generate), or whatever the carry in is
    (propagate).  This is statistically faster than the ripple adder
    since kill/generate signals can start in the middle bits instead
    of always at the lsb.  Although the worst case is the same as the CRA,
    when implemented in silicon it's sometimes faster since pass-gate
    logic for the propagate can often be faster than going through
    all the carry logic from the lower bits...

  Carry Lookahead Adder:
    Instead of waiting for the carry bit to propagate up, extra logic
    (lookahead logic) is put in to compute each of the carry bits (in
    parallel) by looking at all the lower bits.  This is almost always
    faster than a CRA or MA since the logic to compute a carry for a
    specific bit is faster than computing the intermediate carry bits
    and waiting for the result.

  Carry Save Adder:
    A carry save adder is a different thing altogether.  Instead of
    trying to solve the addition problem, it solves a different problem.
    All a CSA does is convert the problem of adding three numbers
    together into a problem of adding two numbers together.  If you
    want to add 9 numbers together, you can use 3 CSAs to reduce it to 6
    numbers; and then reduce 6 numbers to 4 numbers. The advantage of
    a CSA is that it's fast (no carries, since it SAVEs them for later).
    Multipliers and DSP accumulators tend to do CSAs so they save all
    the carries from all the adds to the last stage and do one CLA at
    the end...

    To see how it works, let's look at the truth table for adding 3 bits
    together...

                                x +y +z  =s +2c
                                0  0  0   0  0
                                0  0  1   1  0
                                0  1  0   1  0
                                0  1  1   0  1
                                1  0  0   1  0
                                1  0  1   0  1
                                1  1  0   0  1
                                1  1  1   1  1

  (notice you can never get a 'double' carry)

  You can treat all the bits put together {s[i]} as a number and {c[i]}
  as another number so you've just converted the problem of computing
  X+Y+Z to S+2C without waiting for any carries...
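
The truth table and the X+Y+Z = S+2C identity can be checked with a few
lines of Python.  This is a sketch using plain integers as bit vectors;
the function name and the sample operands are chosen only for the
example.

```python
def carry_save_add(x: int, y: int, z: int):
    """Apply the 3-input truth table to every bit position at once."""
    s = x ^ y ^ z                        # sum bit, no carry propagation
    c = (x & y) | (x & z) | (y & z)      # carry bit (majority of the three)
    return s, c


# Single-bit inputs reproduce the table: x + y + z == s + 2c.
table = [(x, y, z) + carry_save_add(x, y, z)
         for x in (0, 1) for y in (0, 1) for z in (0, 1)]

# Multi-bit operands: three numbers in, two numbers out.
s, c = carry_save_add(13, 7, 9)
print(s, c, s + 2 * c)
```

No bit of s or c depends on any other bit position, which is why the
step takes the same time regardless of word width.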

A CSA is a basic example of a computation technique called redundant digit
representation of which there have been countless papers...

The basic motivation for redundant digit representation is:

    1. Computation is often easier in different representations of a
       number (that are not compact).

    2. Using binary representation for intermediate results requires extra
       logic to make the representation compact.

Also, if we aren't going to look at the intermediate results anyways, why
bother to convert them to binary?  Save the logic (which slows things down
and makes it bigger) and just convert it back once to binary at the end.
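
Here is a sketch of that idea in Python: keep reducing a list of
operands with carry save adders and convert to ordinary binary only
once, at the end.  The grouping policy (take operands three at a time,
pass leftovers through) is an assumption made for the example; real
adder trees are wired in many different ways.

```python
def carry_save_add(x, y, z):
    """3 numbers in, 2 numbers out; the carry word is returned pre-shifted."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def reduce_layer(nums):
    """One layer of CSAs: every full group of 3 operands becomes 2."""
    full = len(nums) - len(nums) % 3
    out = []
    for i in range(0, full, 3):
        out.extend(carry_save_add(*nums[i:i + 3]))
    return out + nums[full:]             # leftovers go to the next layer


def csa_tree_sum(nums):
    """Reduce to two numbers, then do a single carry-propagating add."""
    counts = [len(nums)]
    while len(nums) > 2:
        nums = reduce_layer(nums)
        counts.append(len(nums))
    return sum(nums), counts             # sum() stands in for the final adder


total, counts = csa_tree_sum([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(total, counts)
```

With nine operands the operand count per layer follows the 9 -> 6 -> 4
reduction described earlier and continues down to 2.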

    - S.D. Lew

         ----    ----    ----    ----    ----    ----   ----

From: Phil Carmody <carmody@cpd.ntc.nokia.com>

Thanks, this has been most informative, there were probably two I'd never
heard of before.  Two follow-up points.

When an old old delirious work-mate was teaching me about CSA's several
years back, he aberrantly remembered the acronym standing for 'Carry Shift
Adder', which he explained made perfect sense as the carry bits are shifted
as they are reinserted into the sum later.

Incidentally, is it true or not that the prevalence of the MAC/MADD
(multiply-accumulate or multiply-and-add) instructions is because
multipliers have traditionally been implemented using CSA's, and that the
very first stage has a 'spare' input? So one might as well get an extra add
in there for free. (so not 32 but 33 sums ->22->16->12->8->6->...)

Has anyone ever considered higher order multiplexers, ones which take, say,
7 inputs and output 3?  I (from ignorance) assume these would be as easy to
cascade as traditional CSA's.  What would the tranny count for such an
operation be, relative to a normal CSA? (33->15->7->3 so three stages gets
further than 6 of the above).  If they are not used - why not, what's the
drawback?  As ever, eager to learn...
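
For reference, the arithmetic of a 7-input, 3-output cell can be
sketched in Python.  This version is built from four ordinary 3:2 cells
purely to show that seven operands collapse into three words with
weights 1, 2 and 4; it is an assumption made for illustration and says
nothing about the transistor count or routing of a real custom cell.

```python
def full_add(a, b, c):
    """Ordinary 3:2 cell on whole words: a + b + c == s + 2*carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compress_7_to_3(a, b, c, d, e, f, g):
    """Seven operands in, three out, with weights 1, 2 and 4."""
    s1, c1 = full_add(a, b, c)
    s2, c2 = full_add(d, e, f)
    s, c3 = full_add(s1, s2, g)          # weight-1 result
    c_lo, c_hi = full_add(c1, c2, c3)    # weight-2 and weight-4 results
    return s, c_lo, c_hi


operands = [1, 2, 3, 4, 5, 6, 7]
s, c_lo, c_hi = compress_7_to_3(*operands)
print(s + 2 * c_lo + 4 * c_hi, sum(operands))
```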

    - Phil Carmody
      Nokia Telecommunications                     Cambridge, UK

         ----    ----    ----    ----    ----    ----   ----

From: S.D. Lew <s_d_lew@my-deja.com>

Just to be extra confusing, there's yet another type of adder called a carry
"select" adder (which is sort of a generalized manchester adder popular in
FPGA implementations).  This makes a C{Save|Shift|Select}A ambiguous...

> Incidentally, is it true or not that the prevalence of the MAC/MADD
> (multiply-accumulate or multiply-and-add) instructions is because
> multipliers have traditionally been implemented using CSA's, and that the
> very first stage has a 'spare' input? So one might as well get an extra
> add in there for free. (so not 32 but 33 sums ->22->16->12->8->6->...)

CSA (the save/shift kind) is probably a popular implementation, however,
since most DSP MAC units have the accumulator at much higher precision (more
bits), it sometimes makes more sense to insert the accumulation near the
end of the tree rather than the beginning... (unless the multiplier is
internally rounded).

In some implementations the accumulator is never really there until you read
it.  The last two carry save numbers are never added together before being
fed back into the adder tree so you don't need the CLA until you actually
read the accumulator to do something else (or at least until later in the
pipeline)...  Sometimes it's really hard to know what tricks go on inside
those DSP blocks ;-)
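
A minimal Python sketch of an accumulator that "isn't really there until
you read it": the running total is held as a sum word and a carry word,
and the carry-propagating add happens only in read().  The class and
method names are invented for the example, and the products are formed
with Python's own multiply rather than a carry save multiplier tree.

```python
class CarrySaveAccumulator:
    """Running total kept as two words; resolved only when read."""

    def __init__(self):
        self.s = 0      # sum word
        self.c = 0      # carry word, already shifted into position

    def add(self, value):
        # One 3:2 step: (s, c, value) -> (s, c), no carry propagation.
        s = self.s ^ self.c ^ value
        c = ((self.s & self.c) | (self.s & value) | (self.c & value)) << 1
        self.s, self.c = s, c

    def read(self):
        # The only carry-propagating add in the whole loop.
        return self.s + self.c


acc = CarrySaveAccumulator()
for a, b in [(3, 4), (5, 6), (7, 8)]:
    acc.add(a * b)                       # multiply-accumulate step
print(acc.read())
```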

> Has anyone ever considered higher order multiplexers, ones which take,
> say, 7 inputs and output 3? I (from ignorance) assume these would be as
> easy to cascade as traditional CSA's.  What would the tranny count for
> such an operation be, relative to a normal CSA?  (33->15->7->3 so three
> stages gets further than 6 of the above).  If they are not used - why
> not, what's the drawback?

The biggest drawback of CSAs is routing (you have to get 3 numbers in on
the right and only 2 come out of the left and you have to bring another one
in from another branch to do the next stage).  7 to 3 would just make this
routing situation worse.  However having said that, I know of a hand
designed carry save accumulator that added more than 3 bits together in a
stage that was more compact than the standard CSA implementation, but the
cells were custom cells not standard ASIC/FPGA library cells...

    - S.D. Lew

         ----    ----    ----    ----    ----    ----   ----

> Carry Save Adder:
>   A carry save adder is a different thing altogether.  Instead of
>   trying to solve the addition problem, it solves a different problem.
>   All a CSA does is convert the problem of adding three numbers
>   together into a problem of adding two numbers together.  If you
>   want to add 9 numbers together, you can use 3 CSAs to reduce it to 6
>   numbers; and then reduce 6 numbers to 4 numbers. The advantage of
>   a CSA is that it's fast (no carries, since it SAVEs them for later).
>   Multipliers and DSP accumulators tend to do CSAs so they save all
>   the carries from all the adds to the last stage and do one CLA at
>   the end...


From: Terje Mathisen <Terje.Mathisen@hda.hydro.com>

This is exactly how you'll have to implement fast bignum (arbitrary
precision) code on IA64 (Itanium?), by using a pair of register arrays
to hold intermediate data in carry save format.

By doing this you can delay the final carry ripple operation until the
end, which results in greatly improved performance.  (I'd guess something
like 3-5 times faster for a bignum multiply operation, which is still
small enough that it doesn't make sense to use FFT or other tricks.)
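
The general idea of a pair of arrays with a deferred carry pass can be
sketched in Python.  This is not IA64 code and makes no performance
claim; the 64-bit limb size, the helper names and the sample operands
are assumptions for the example.  One array holds the limb sums, the
other holds the overflow out of each limb, and nothing ripples between
limbs until resolve().

```python
LIMB_BITS = 64
MASK = (1 << LIMB_BITS) - 1


def to_limbs(n, count):
    """Split a non-negative integer into little-endian 64-bit limbs."""
    return [(n >> (LIMB_BITS * i)) & MASK for i in range(count)]


def add_deferred(sums, carries, x):
    """Add limb array x without passing carries between limbs."""
    for i, limb in enumerate(x):
        t = sums[i] + limb
        sums[i] = t & MASK
        carries[i] += t >> LIMB_BITS     # remember the overflow, don't ripple


def resolve(sums, carries):
    """Single carry pass at the end (Python's big ints do the ripple)."""
    total = 0
    for i, (s, c) in enumerate(zip(sums, carries)):
        total += (s + (c << LIMB_BITS)) << (LIMB_BITS * i)
    return total


values = [2**100 + 5, 2**127 + 2**64 - 1, 2**64 - 1, 2**128 - 1]
sums, carries = [0, 0, 0], [0, 0, 0]
for v in values:
    add_deferred(sums, carries, to_limbs(v, 3))
print(resolve(sums, carries) == sum(values))
```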

    - Terje Mathisen

         ----    ----    ----    ----    ----    ----   ----

From: Ray Andraka <randraka@ids.net>

There is a little bit of discussion about this on the multiplier page in my
website.  In FPGAs, the dedicated ripple carry is so much faster than the
general routing resources that a ripple carry adder will generally
outperform carry save and carry look-ahead architectures.

    - Ray Andraka
      Andraka Consulting               http://users.ids.net/~randraka

         ----    ----    ----    ----    ----    ----   ----

> What are the specifications of a "Carry-Look-Ahead-Adder"?  Is this the
> same as a "Carry-Save-Adder"?

From: gah@ugcs.caltech.edu (Glen Herrmannsfeldt)

They are not very similar.  The simplest adder adds the way you do on paper,
one digit at a time and then onto the next digit.  This is slow.
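
The paper-and-pencil method looks like this as a Python sketch (the
function name and the fixed bit width are choices made for the
example).  Note how each bit's sum needs the carry from the bit below
it.

```python
def ripple_carry_add(a, b, width):
    """Add two width-bit numbers one bit at a time, lsb first."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        result |= (x ^ y ^ carry) << i                # sum bit for position i
        carry = (x & y) | (x & carry) | (y & carry)   # carry into position i+1
    return result, carry                              # (sum, carry out)


print(ripple_carry_add(13, 7, 4))
print(ripple_carry_add(13, 7, 8))
```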

Carry lookahead decides in advance whether there will be a carry, without
waiting.  It takes more logic, but not that much more, and is a lot faster.
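
A Python sketch of the lookahead idea: each carry is written directly
in terms of per-bit generate and propagate signals and the carry in,
never in terms of the previous carry.  The names are invented for the
example, and the Python loops only stand in for what would be parallel
logic in hardware.

```python
def carry_into(i, g, p, carry_in):
    """Carry into bit i, using only g, p and carry_in (no earlier carry)."""
    c = carry_in & int(all(p[:i]))            # carry in survives bits 0..i-1
    for j in range(i):
        c |= g[j] & int(all(p[j + 1:i]))      # generated at j, propagated up
    return c


def carry_lookahead_add(a, b, width, carry_in=0):
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]     # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate
    result = 0
    for i in range(width):
        result |= (p[i] ^ carry_into(i, g, p, carry_in)) << i
    return result, carry_into(width, g, p, carry_in)        # (sum, carry out)


print(carry_lookahead_add(13, 7, 4))
```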

Carry Save Adder is used when adding more than two numbers.  You add the
numbers bit by bit and generate a sum (that doesn't include carry) and the
carries to be added in later.  If you are adding many numbers at once, you
can wait to combine the carries until near the end.

    - Glen Herrmannsfeldt
      California Institute of Technology           Pasadena, CA

         ----    ----    ----    ----    ----    ----   ----

From: George Russell <ger@informatik.uni-bremen.de>

One really can't leave the subject of carry technology without mentioning
the "anticipating carriage" mechanism Charles Babbage planned for the
Analytical Engine.  Unfortunately I can't find a picture or details on the
Internet (I've come across it several times in books) but you could try 
poking around http://www.fourmilab.ch/babbage/contents.html  I suppose the
anticipating carriage mechanism is closest to a "Carry-Look-Ahead-Adder".

    - George Russell
      Universitaet Bremen                      Bremen, Germany

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley. All Rights Reserved.


   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)