


  From: Anders Nordstrom <andersn@sympatico.ca>

  Hi, John,

  Anna and I are pleased to announce the arrival of our baby boy!  We have
  now officially joined the world of the sleep deprived.  He decided to
  join us about two weeks early, at 3:20 a.m. on July 27th.  It all took
  place after a fast labour of 5 hours.  Baby, mom and dad are doing great
  and are now settling into new routines at home.    He's 3,435 g & 53 cm.
  (In American, that's 7 lbs, 9 oz. & 21 inches.)  Our first task is to
  find the manual.  I refuse to believe that they would ship such a complex
  product without a manual.  :-)

      - Anders Nordstrom


( ESNUG 398 Subjects ) ------------------------------------------- [07/31/02]

 Item  1: ( SNUG 02 #19 ) A User Review Of Hidden Dragon / Floorplan Compiler
 Item  2: How Thermally/Vibrationally/Soldering Reliable Is BGA Packaging?
 Item  3: ( ESNUG 395 #5 ) Should Layout Or DC Buffer Up 40 Fanout Nets?
 Item  4: ( ESNUG 395 #10 ) Here's The Cadence Tool Flows That Users Use Now
 Item  5: I Found Different Verilog Read Interpretations With DC 2001.08-SP2
 Item  6: How Monterey Aristo IC Wizard Estimates Initial SoC Area Budgets
 Item  7: ( ESNUG 396 #3 ) Avanti Jupiter/Apollo Rectilinear Block Flows
 Item  8: ( ESNUG 397 #5 ) PhysOpt DEF Output Missing SHAPE Properties, Too!
 Item  9: ( ESNUG 396 #4 ) Synopsys Slow Getting Floorplan Compiler To Users
 Item 10: ( ESNUG 396 #5 ) PhysOpt 2.5-D Extraction Isn't Exactly Here Yet
 Item 11: These Fruity Flavors Of C Doesn't Mean Verilog's PLI Is Going Away
 Item 12: User Confusion On Units For Library Derived PhysOpt R and C Values

 The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com


( ESNUG 398 Item 1 ) --------------------------------------------- [07/31/02]

Subject: ( SNUG 02 #19 ) A User Review Of Hidden Dragon / Floorplan Compiler

> Other than FPGAs, most of these wounded/dying Synopsys tools are playing
> in under $10 million markets.  That is, they're mostly expendable
> experiments.  If they succeed, great.  If they fail, kill'm.  Let the
> market decide.  The only baby tool they can't ignore, though, is Hidden
> Dragon.  In terms of future tools, Hidden Dragon (or something like it)
> must work.
>
>     - from the SNUG'02 Trip Report


From: Valentina Baiardo <valentina.baiardo@st.com>

Hi John,

We just did an eval of the new Synopsys Floorplan Compiler (FPC) that your
readers may know as Hidden Dragon.  Our goal was to find out if it was
better than our standard Chip Architect/PhysOpt flow.  We were mostly
interested in its virtual flat approach to see if it could reduce the
amount of time required to create the final floorplan.  Of particular
interest is how it handles hierarchy definition and block planning.  Our
second objective was to evaluate its ability to analyze the power grid
at floorplan level.

To perform the evaluation we used two hierarchical designs.

Design 'Asterix' was a 1 M instance (~4 M gate) 0.18 um design with 10 top
level soft blocks and 16 embedded hard macros.  At the top level, in
addition to the soft blocks it had over 3 K leaf cells and ~10K nets.
Asterix' highest system clock is 80 MHz. 

We ran Floorplan Compiler virtual flat floorplanning on the Asterix gate
level netlist with cluster placement on all the soft blocks.  It proved to
have a very fast cycle time: half a day to obtain the top level floorplan,
with results compatible with the final floorplan of the production design.
Compare this with the one week necessary for the original chip designer
to complete the floorplanning after the IO placement was frozen.

Design 'Obelix' was a 1.4 M instance 0.13 um design (~5.6 M gates) with 7
top level soft blocks, 35 K leaf cells and more than 230 embedded hard
macros.  The top level of Obelix had more than 50K nets.  Its highest
system frequency was 150 MHz with the I/O frequency exceeding 600 MHz.

We used Floorplan Compiler in the real implementation flow to identify the
optimal top-level partitioning starting from an incomplete netlist where
a block (15% of the Obelix netlist) was missing. 

While Asterix was done using the recommended top down virtual flat approach,
with Obelix we needed to apply both the top down and bottom up approach.  We
created a bottom up floorplan for the most critical block whose frozen
floorplan was loaded using reload_subdesign.  We set set_don't cluster to
true and set_shape constraints to rigid on this block and again ran cluster
placement on the full Obelix design.  A keepout to reserve placement area
was added to mimic the presence of the missing block.

The floorplan completed with a very fast cycle time (10 minutes for cluster
creation and half an hour for cluster placement).  The top-level
architecture was well understood and the blocks identified were successfully
closed in terms of timing and routability in our Apollo back-end flow.

We experienced some issues in cluster legalization and floorplan creation
which are currently being worked on by the Synopsys product team.  These
forced us to do minor manual optimizations to the floorplan to meet our
routability goals but this additional effort was only a few hours.
 
The secondary objective of our Floorplan Compiler eval was to test its newly
developed embedded power network analysis (PNA) capability.  It promised
some interesting "what if" analysis capabilities.  In addition it claimed to
have similar quality and correlated results with our signoff power analysis
tool, Simplex VoltageStorm GDSII.

The design we selected to test PNA on had 86 hard blocks, 1 IP core and 350 K
leaf cell instances.  It only took 5 minutes for extraction and power network
analysis.  The preliminary comparison showed a relatively good match to Simplex
in the IR drop distribution.  The colour maps generated looked similar to
VoltageStorm's analysis, with the observable differences accounted for in
the absolute value due to the missing IP block in the PNA data base.

Our eval has convinced us that Floorplan Compiler's virtual flat approach
adds significant value to our design flow.  As for bugs, the tool appears
mostly stable.  Its hierarchy management and block planning features can
significantly improve design productivity, and we found it reduced
floorplanning cycles from weeks to days.  Our target for PNA adoption is
early Q4 2002.

    - Valentina Baiardo
      STMicroelectronics                         Agrate, Italy


( ESNUG 398 Item 2 ) --------------------------------------------- [07/31/02]

From: Steve Gross <smgross@umich.edu>
Subject: How Thermally/Vibrationally/Soldering Reliable Is BGA Packaging?

Hi John,

I've got some questions on BGA reliability.

I work for the University of Michigan, where I'm making a microwave radiometer
for use on an aircraft (a DC-8).  The digital back-end processing is being done
in an FPGA.  It looks like I need somewhere in the vicinity of 500 - 750 K
gates (and let's not get sidetracked on what a "gate" means in an FPGA...)
to implement the several hundred correlators and related control and data
readout logic.

There's no problem getting FPGAs of this capacity, but AFAIK they're only
available in BGA packaging.  I've never progressed beyond 240-pin QFP
packaging.  Do any ESNUGers have any reports on BGA reliability,
specifically in an environment which will experience thermal swings (ambient
temp can vary from about +50 to -60 C over the course of about 30 minutes,
although we have active thermal control to significantly reduce that range
within the system's enclosure... this is necessary to make the radiometer
data usable) and vibration (see "aircraft" above)?

A second issue is that we don't have BGA soldering and inspection capability
in-house.  I can either send my board out to have the BGA directly mounted,
or I have found a vendor who can mount the BGAs on a carrier (through-hole
or SMT) that I can work with in-house.  This latter vendor knows mil-spec
issues and can provide x-ray inspection of all joints, etc.  A third option
is a socket from a company such as Emulation Technology.  Any good or bad
experience reports with any of these techniques (preferably with pointers to
vendors) would be appreciated.

My alternative is to partition the design, but (of course) my schedule is
tight and I would like to be able to finalize the FPGA design in parallel
with board fab.  I'd rather not have the additional complication of a
partition to deal with.

Thanks!

    - Steve Gross
      University of Michigan


( ESNUG 398 Item 3 ) --------------------------------------------- [07/31/02]

Subject: ( ESNUG 395 #5 ) Should Layout Or DC Buffer Up 40 Fanout Nets?

> I'm looking for a guideline and/or experiences for large blocks or small
> chips on when to allow Design Compiler to buffer high fanout nets, versus
> having your physical tool insert a buffer tree later in the design flow.
>
> In other words, if a net has 40 loads, should the Design Compiler be
> allowed to buffer the net, or should it be classed as an ideal_net?  What
> about 100 loads?  200 loads?  Obviously these aren't clocks, nor a master
> scan test enable that goes to every flop.  These are nets in the grey area
> in between.  We've had some congestion problems with an internal reset
> that had a large fanout.  It was an easy fix to have layout insert a
> buffer tree, but the question came back as to what threshold should
> be used.
>
>     - Wayne Miller
>       Standard Microsystems Corporation


From: John Phillips <jphillip@matrox.com>

Hi, John,

If you do not allow DC to buffer, you will get a large delay through the
driver.  If you declare the net an ideal net in DC, you will get an
unrealistically small delay.  Either way, you will not get a good estimate
of your post-route timing.  I'm not sure if you are using PhysOpt, but it
should be free to manipulate the buffer trees as long as the constraints
are met.

    - John Phillips
      Matrox Tech, Inc.                          Boca Raton, FL

         ----    ----    ----    ----    ----    ----   ----

From: Emre Tuncer <emre@mondes.com>

Hi John,

Placement engines find it difficult to deal with high fanout nets and
usually they ignore nets with fanouts larger than a certain threshold.

If you buffer high fanout nets too early, during synthesis, these nets will
be divided into smaller nets.  The resultant clustering of leaf cells may
not be optimal in terms of placement and may cause unnecessary congestion.  On
the other hand, if you wait until after placement, and then buffer the high
fanout nets, adding all the buffers may not be feasible since the placement
engine ignored them in the first place, and may cause congestion hot spots.
Considering these nets as clocks is overkill and might make things worse.

What we have found is that the best time to address this problem is during
the physical prototyping stage, where the placement is coarse enough that
added buffers can be eased into the design, and the locations of leaf cells
are known to a sufficient level of precision.  Our Sonar tool is based on
this approach.

To answer Wayne's question, probably the best bet would be to find out what
is the threshold of their placer for ignoring high fanout nets and use it
as a guideline.

    - Emre Tuncer
      Monterey Design Systems                    Sunnyvale, CA

         ----    ----    ----    ----    ----    ----   ----

From: John McGehee <johnm@voomtown.com>

Hi, John,

Please tell Wayne that Design Compiler (DC) will not do a good job of
inserting buffer trees because it can only randomly connect the buffer trees
based on wireloads and fanout, without regard for where the cells are
placed.  PhysOpt is aware of the placement, so it can be trusted to insert
buffer trees.

If you are using Apollo/Saturn, Saturn can take care of your ~40 fanout
nets.  Big nets like reset and scan enable should be done with clock
tree synthesis (CTS).  Do CTS on the clocks last.

Astro Pre-Placement Optimization will automatically take care of all
your high-fanout nets.  It does a good job, but it needs to run faster.
Use version 2001.2.3.5.0.2.2 or above.  Earlier versions cannot handle
large nets like reset and scan enable.

    - John McGehee
      Voom, Inc.                                 Los Altos, CA

         ----    ----    ----    ----    ----    ----   ----

From: Jon Harris <jharris@siroyan.com>

Hi, John,

With the introduction of automatic buffer tree insertion as part of the
basic "compile" command nowadays, DC does a good job of handling high
fanout nets automatically.  For all of Siroyan's test chips I have used
DC to buffer every high fanout net except scan enable and obviously the
clock nets.

Having said that you need to consider that when using DC for high fanout
net buffering it has no concept of where the cells in the high-fanout
cone shall be placed and therefore how much wire load shall be present.
DC can only estimate how much load will be present based upon the
wireload model being used.  Taking the example of a high fanout signal
of 100 loads, the pin capacitances of the loads and the wireload model
might only mandate the use of 2 or 3 buffers.  However, you will find
that after placement the cells on your high fanout net have been placed
over a wide area and these 2 or 3 buffers that DC inserted are now
completely inadequate.  The extra loading will degrade the transition
times and delays through the buffers massively.  Your post-layout
optimisation tool will attempt to fix this, but quite possibly there may
not be appropriate placement area for additional buffers and timing
shall not be met.

So, in order to provide protection against this effect you can make use
of the DC command "set_default_fanout_load."  This can be used to set
the maximum "fanout_load" that DC can tolerate on any net.

         eg.  dc_shell-t> set_default_fanout_load  15

Your target technology library will have a "fanout_load" attribute
applied to every input pin of each standard cell.  Normally this shall
default to 1, and in this case the effect of setting the default
fanout_load to 15 means that DC shall ensure that the maximum fanout for
any net in the design shall be no more than 15.  So with the maximum
fanout reduced, for a net that fans out to 200-300 destinations, DC
shall be forced to put in a buffer tree of 2 or 3 levels, comprised of
several dozen buffer cells.  These buffers are then available during
placement to span the area occupied by the destination cells and there
is less likelihood that significant timing degradation will be seen in
layout.
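
As a rough check on the arithmetic above, the following Python sketch
computes an idealized lower bound on the buffer tree needed for a given
maximum fanout.  It assumes that every load and every buffer input has a
fanout_load of 1 and that the tree is perfectly packed.  min_buffer_tree
is a hypothetical helper for illustration, not a DC command.  A real DC run
works from timing and wireload data, so expect it to insert more levels and
more buffers than this bound suggests.

```python
import math


def min_buffer_tree(loads, max_fanout):
    """Return (levels, buffers) for an idealized, fully packed buffer tree.

    Assumption: each sink and each buffer input counts as a fanout_load of 1,
    and the original driver may itself drive up to max_fanout pins.
    """
    if loads < 1:
        raise ValueError("loads must be at least 1")
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")

    levels = 0
    buffers = 0
    pins_to_drive = loads
    # Add one level of buffers at a time until the original driver can
    # legally drive whatever is left.
    while pins_to_drive > max_fanout:
        pins_to_drive = math.ceil(pins_to_drive / max_fanout)
        buffers += pins_to_drive
        levels += 1
    return levels, buffers


if __name__ == "__main__":
    for n in (40, 100, 200, 300):
        print(n, min_buffer_tree(n, 15))
```

Even under these optimistic assumptions, a limit of 15 forces a dozen or
more buffers onto a net with 200 loads, which gives the placer cells to
spread across the area occupied by the destinations.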

Obviously, whatever post-layout optimisation tool you use to do in-place
optimisations and buffer re-sizing will still have a little work to do
to correct for wireload inaccuracies.  However, the work already done by
DC on the high fanout nets shall ensure that these nets will require
only the same attention as the rest of the nets in the design.

As an aside, while I mention fanout_load, something else that Wayne
might like to consider is to increase the fanout_load on the input pins
to any memory macrocells to match the set_default_fanout_load value. 
Consider a case where you have an address bus that is driving 8 RAMs,
which physically may be far apart.  By default DC will synthesize one
buffer per address bit, with a fanout of 8.  Once the netlist has been
placed, this one buffer can easily find itself driving 8 tracks halfway
across the chip, potentially in different directions and badly in need
of post-layout optimisation.  However, if the fanout_load setting on the
RAM inputs is set to match the "set_default_fanout_load" value, DC will
synthesize one buffer per address bit for *EACH* RAM, which the
placement engine can then place as required to minimise delay.
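
The RAM address bus case can be sketched as a packing problem: each driver
has a fanout_load budget, and each input pin consumes its own fanout_load
from that budget.  The Python below is only an illustration of that idea,
using first-fit decreasing packing; drivers_needed is a hypothetical helper
and this is not a description of DC's internal algorithm.

```python
def drivers_needed(pin_fanout_loads, max_fanout_load):
    """Count drivers needed so no driver exceeds max_fanout_load.

    Assumption: a simple first-fit decreasing packing of the pins' fanout_load
    values stands in for whatever the synthesis tool really does.
    """
    used = []  # fanout_load already consumed on each driver
    for load in sorted(pin_fanout_loads, reverse=True):
        if load > max_fanout_load:
            raise ValueError("a single pin exceeds the fanout_load budget")
        for i, consumed in enumerate(used):
            if consumed + load <= max_fanout_load:
                used[i] = consumed + load
                break
        else:
            # No existing driver has room, so add another buffer.
            used.append(load)
    return len(used)


if __name__ == "__main__":
    # 8 RAM address pins with the default fanout_load of 1
    print(drivers_needed([1] * 8, 15))
    # the same 8 pins after raising fanout_load to 15 in the .lib
    print(drivers_needed([15] * 8, 15))
```

With the default value, all 8 RAM pins fit on one driver.  Once each pin
consumes the whole budget, every RAM needs its own buffer, which is the
behaviour described above.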

An example segment of a RAM .lib file showing the fanout_load field is
given below :

  bus(A)  {
          bus_type : R64X22BNTBM4B1_ADDRESS;
          fanout_load :  15;                   <- New fanout_load value
          direction : input;
          capacitance : 0.052;
          timing() {
                related_pin     : "CLK"
                timing_type     : setup_rising ;

                ...

The .lib file can be edited as shown and recompiled using just 2 commands:

     dc_shell-t> read_lib  my_ram.lib
     dc_shell-t> write_lib my_ram -format db -output my_ram.db

DC may complain that there is no library compiler license, but as we haven't
changed the functionality of the RAM this error can be ignored and the new
my_ram.db file is successfully written out anyway.
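
If many pins need the same change, the .lib edit itself can be scripted
before running read_lib.  This is a minimal Python sketch, assuming each
pin or bus already carries a fanout_load attribute with a plain numeric
value.  set_fanout_load is a hypothetical helper, and it rewrites every
fanout_load it finds, so only feed it the text of the RAM library.

```python
import re

# Matches "fanout_load : <number> ;" but not "default_fanout_load : ...",
# because \b does not match between "_" and "f".
_FANOUT_LOAD = re.compile(r"(\bfanout_load\s*:\s*)[0-9.]+(\s*;)")


def set_fanout_load(lib_text, new_value):
    """Return (new_text, count) with every fanout_load value replaced."""
    return _FANOUT_LOAD.subn(
        lambda m: f"{m.group(1)}{new_value}{m.group(2)}", lib_text
    )


if __name__ == "__main__":
    sample = (
        "bus(A)  {\n"
        "  bus_type : R64X22BNTBM4B1_ADDRESS;\n"
        "  fanout_load :  1;\n"
        "  direction : input;\n"
        "}\n"
    )
    new_text, count = set_fanout_load(sample, 15)
    print(count)
    print(new_text)
```

A regular expression is not a Liberty parser, so check the returned count
against the number of pins you expected to change.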

As a final note, I have not yet tried leaving the scan enable buffering to
DC as this could fan out to tens or hundreds of thousands of flops -- and I
just don't feel that lucky!  

    - Jon Harris
      Siroyan                                    Reading, Berkshire, UK

         ----    ----    ----    ----    ----    ----   ----

From: Tom Tessier <tomt@hdl-design.com>

Hi John,

My answer to Wayne's question is "it depends."  If the 40+ loads are in the
same hierarchical block in a floorplan, then most often let DC do it.  If
the 40+ loads are a control signal which has to go all over the die in one
clock cycle then experiment with both DC and the P&R tool.

Experience depends upon the tools (about to start a flame war, please don't
shoot the messenger).  I had a design that I took into Avanti Apollo that
I knew had large loads (both fanout and large capacitance due to wire
length).  The design was done in the physical domain in a hierarchical
fashion because we couldn't get Avanti Apollo to converge when doing the
design flat (lots of RAM based blockages caused problems).  We let Apollo do
the buffer insertion and placement.  Most often it decided correctly that it
needed 2-3 buffers to get the signal across the die in the timeframe we gave
it.  Most often it placed them very poorly.  For example it would place one
buffer right at the pin of the sending hierarchical block, then the next two
close to the receiving pin of the hierarchical block.  This didn't meet the
timing as I still had 6 mm+ of wire between the buffers.  We ended up
putting 3000+ buffers in by hand, and generating PDEF placement to force the
tool to put them where we wanted (check out my San Jose SNUG 2001 Papers for
the gory details).

Same basic design 1 year later with a new foundry.  Took out all the 3000+
buffers. This foundry used Cadence and decided they could get it done flat.
They solved the large load problem and large fanout problem without our
intervention.  Provided them an SDC file and they solved the rest.  Did the
flat layout help them?  I think it did but it also complicated the issue as
the tool had a lot more data to work with.  Was the designer on the second
go around more adept at using the tool?  Possibly.  ;-)  Was the client I
was working for very happy with the second encounter?  You bet, as they saw
very minor timing closure issues.  In fact the final timing closure issue
had to do with a problem that Paul Zimmer mentioned in ESNUG 393 #2, this
foundry wanted up to 16% on-chip variance for signoff.  That is a healthy
chunk of the clock cycle to give up.  One source synchronous interface had
problems with this effect, but the team worked around it until they got
timing that was acceptable.

So unfortunately it really depends upon experience.

    - Tom Tessier
      t2design, Inc.                             Louisville, CO

         ----    ----    ----    ----    ----    ----   ----

From: Srinivas Kakumanu <kakumanu@time2mkt.com>

Hi John,

I've had an experience on one of my recent hierarchical chips in 0.13 um
where all the blocks in the chip were taken to layout even with nets having
fanout more than 100.  These nets are neither clock nets nor reset nets.

To give you an example, some of the nets are like a write enable to a set of
64-bit registers where this write_en would be going to all the 64 flops.  The
reason behind doing this was that the layout tools have got a very intensive
algorithm to buffer out high fanout nets (hfns) and this algorithm works to
build a balanced buffer tree and these buffers are inserted considering the
ACTUAL loads which we will be lacking at DC synthesis stage. 

And we have experimented with this approach on bigger blocks.  Allowing the
layout tools to do the buffering on hfns resulted in fewer congestion and
timing problems than starting with a block already buffered by DC at the
synthesis stage.

    - Srinivas Kakumanu
      time2mkt.com

         ----    ----    ----    ----    ----    ----   ----

From: Lars Rzymianowicz <larsrzy@ti.uni-mannheim.de>

Hi John,

There's a Boston SNUG paper about this issue by Rick Furtner of Tensilica.

  "High Fanout Without High Stress: Synthesis & Optimization of High-fanout
   Nets Using Design Compiler 2000.11"

Basically, I'd recommend using a threshold of 100, maybe 50.  Layout tools
are much better at building balanced trees than logic synthesis tools like
DC or Ambit.  One might also limit the fanout (set_max_fanout) of nets below
this threshold to 8.  I had some good results with it, since it forced DC to
buffer nets with 8-100 fanout. That eased the later P&R job a lot.
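
The policy above can be written down as a small classification rule.  This
Python sketch is only an illustration: buffering_owner is a hypothetical
helper, and treating a net at exactly the threshold as DC's job is an
assumption, since the boundary case is not spelled out.

```python
def buffering_owner(fanout, layout_threshold=100, dc_max_fanout=8):
    """Say which tool should buffer a net under the threshold policy.

    Assumptions: nets above layout_threshold are left for the layout tool,
    nets above dc_max_fanout (up to the threshold) are buffered by DC, and
    anything smaller needs no buffer tree at all.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if fanout > layout_threshold:
        return "layout"
    if fanout > dc_max_fanout:
        return "dc"
    return "none"


if __name__ == "__main__":
    for fanout in (4, 40, 100, 200):
        print(fanout, buffering_owner(fanout))
```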

    - Lars Rzymianowicz
      University of Mannheim                     Germany


( ESNUG 398 Item 4 ) --------------------------------------------- [07/31/02]

Subject: ( ESNUG 395 #10 ) Here's The Cadence Tool Flows That Users Use Now

> I'm trying to get a complete Cadence-based tool flow up and I'm confused
> by Cadence's product line.  In particular I'm referring to:
>
>     Preview Silicon Ensemble
>     DSM Silicon Ensemble
>     PKS (aka Silicon Ensemble PKS)
>     First Encounter
>     SOC Encounter
>
> We don't have First Encounter, but we do have PKS 4.0, DSM SE 5.3, and
> Preview (IC446).
>
> All these tools have substantial overlap.  It seems like Cadence is moving
> towards First Encounter and PKS as the complete solution.  That's great for
> the future, but what about now?  BTW, we use multiple power supplies on the
> chip (with inherited connections in Composer).  How is the power
> connectivity information conveyed between this new mix of Cadence tools?
>
>     - Albert Ma
>       M.I.T.                                     Cambridge, MA


From: Geoff Smith <gjsmith@cisco.com>

Hi, John,

Our working Cadence/Synopsys/Mentor flow is:

   Virtuoso (IC446) (analog layout -> macrocells)
   Ambit/Synopsys -> (rtl-to-gates)  (Artisan libraries)
   LogicVision (membist, scan, icBIST, bscan)
   Silicon Ensemble (detailed floorplanning)
   PKS (physical placement)
   CTPKS (clock tree generation)
   Silicon Ensemble (detailed route and 2.5D extract)
   CeltIC - signal integrity
   Virtuoso (IC446) for gds merge and manual fixes
   Assura/Calibre for LVS/DRC

First Encounter is currently under eval for hierarchically partitioning
large designs (i.e. a better floorplanner).

    - Geoff Smith
      Cisco Systems                              Toowong, Australia

         ----    ----    ----    ----    ----    ----   ----

From: [ Barney, the Big Purple Dinosaur ]

Hi, John,

Albert has asked too many questions; answering them all exactly would take
too much time.

  1. FE & PKS IS NOW.  We are already using it.

  2. Take FE for the floorplan, power routing, initial placement, IPO
     timing analysis, hierarchical partitioning (if applicable.)

     BTW, SoC Encounter includes FE, PKS, CTS, CeltIC and SE-Ultra
     router, which will be later replaced by Plato.

     It is easy to use, all the technology and design data are in ASCII.
     We like the hierarchical floorplanner and power routing.  Also
     initial placement is very good.  No problem with a mixture of
     custom blocks and SCells; we are also doing multiple power domains.

  3. You have not mentioned all relevant tools which you would need to
     complete your flow.  On the other hand it is not clear what kind
     of design (technology, complexity) you are going to do.  Unfortunately
     the problems are always hidden in a detailed application of the flow.

Best regards,

    - [ Barney, the Big Purple Dinosaur ]

         ----    ----    ----    ----    ----    ----   ----

From: Rajesh Pathak <rpathak@cadence.com>

Hi John,

Any overlap between our tools is incidental, as some tools came to us as a
result of acquisition.  Suffice it to say that SOC Encounter is a grand
integration of SE-PKS, First Encounter (FE) and signal integrity tools like
Celtic, Simplex etc.  FE is a silicon prototyping tool useful in a SOC
environment for legal placement of disparate blocks like IP, analog, memory,
etc.  This legal placement frees a designer to concentrate on block level
hardening of these blocks from a chip level perspective.  The global (chip
level) information like feed-thrus, I/O constraints, placement obstructions
like global buffers etc. are passed to the block level so that the blocks are
hardened in the context of top-level.  At the block level, either SE-PKS or
FE is used to complete the block level P&R.

Since Albert is primarily interested in a flat design an appropriate tool
would be SE-PKS which he indicated he already has access to (PKS 4.0 and
Silicon Ensemble).  SE-PKS is our tool of choice for timing critical
and congested designs.

One thing I am curious about is what he means by multiple voltages being
present in the design.  Is he referring to voltage islands or that he has
library cells with multiple voltage rails?  In both cases I am afraid he
would be hitting a brick wall.  Library formats like ALF or .lib do not
support multiple power supplies.  One can fool the front-end engines like
static timing by manually editing the .lib file to reflect only one voltage
although the timing arcs have actually been characterized with more than one
(and different) voltage rails.  However one would not find such luck with
back-end engines.  OLA (open library access) has been working on supporting
multiple voltages in the library formats, but it is not out yet.  FYI, PKS
will be supporting OLA in future.

Assuming he has one voltage rail (for the core) the methodology would be
fairly straight-forward.  First he would have to extract a LEF from his
custom datapath block gdsII using Picasso/Abgen.  Create a STAMP model or
TLF for the same block.  He can use Pearl with the BuildTimingModel command
to create the TLF format.  Once the physical abstract (LEF file) and timing
model (TLF or STAMP) have been created for this block, the block can then be
used as a library macro in the flow.  The flow sequence would be:

   # Read Library 
   read_alf library.alf
   read_tlf datapath.tlf
   read_lef library.lef
   read_lef_update datapath.lef

   # Read top level RTL
   read_ver {design.v ......}
   # Elaborate
   do_build_generic
   # read synthesis constraints
   source constraints.tcl
   # create gate-level netlist
   do_optimize 

   # read floorplan DEF. created using PKS or SE.
   # This DEF has std cell rows and placed datapath macro (auto or
   # manual)+IOs, core and block rings
   read_def floorplan.def
   # place std cells
   do_place -timing_driven true
   # Insert physical clock tree using a clock tree constraints file
   source cts.tcl
   do_build_clock_tree 
   # Run Timing
   report_timing
   do_xform_optimize_slack
   do_place -timing_driven true

   # Use SE to connect rings and follow pins for VDD and GND nets
   # Global Route
   do_route

   # Repeat the following steps until timing convergence
   do_xform_tcorr_ipo
   # Final route
   do_wroute
   # extraction
   do_hyperextract
   # read extraction
   read_spef
   # timing analysis
   report_timing

Go back up to the step at "do_xform_tcorr_ipo" and repeat if your
timing is not met.  Hope this helps.

    - Rajesh Pathak
      Cadence Design Systems                     Houston, TX


( ESNUG 398 Item 5 ) --------------------------------------------- [07/31/02]

From: Lydia Lee <llee@esilicon.com>
Subject: I Found Different Verilog Read Interpretations With DC 2001.08-SP2

Hi John,

I have a huge netlist, so I used "read_file -f verilog -netlist" to read in
dc_shell-t.  I used DC 2001.08, 2001.08-SP1 without any problems.  I just
switched to DC 2001.08-SP2 recently, and found an undocumented feature/bug.

   module top (...
    inst_A U100 ( .delayout({aout, bout[1], bout[0],
       cout[12], cout[11], cout[10], cout[9], cout[8]}), ...
    inst_A U101 ( .delayout({dout[15:8]}), ...
   endmodule

With DC 2001.08-SP2, I ran the following 2 experiments.

Experiment 1:

    read_verilog test.v
    current_design top
    link
    all_connected U100/delayout[7]   which gives "aout"   - correct
    all_connected U101/delayout[7]   which gives "dout[15]"   - correct

Experiment 2:

    read_file -f verilog -netlist test.v
    current_design top
    link
    all_connected U100/delayout[7]  -> "cout[8]"  - incorrect should be aout
    all_connected U101/delayout[7]  -> "dout[15]"  - correct

I used the same netlist, same commands on DC 2001.08, 2001.08-SP1 with their
64-bit, Linux and SparcOS5 versions.  All of them give me the correct result
with their netlist reader "read_file -f verilog -netlist".  Only DC 2001.08-SP2
gives me incorrect behaviour.  Please warn your readers about that.  I filed
a support call to Synopsys as well.
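
For readers who want to check the expected answer by hand: in a Verilog
concatenation the leftmost item lands on the most significant bit of the
port.  The Python sketch below models that rule, assuming the delayout port
is declared [7:0] (the port declaration is not shown above), and
map_concat_to_port is a hypothetical helper.  Mapping the U100 list in
reverse order gives "cout[8]" on bit 7, which matches the wrong answer, but
whether the reader really reverses the list is only a guess.

```python
def map_concat_to_port(concat_items, msb, lsb):
    """Map a Verilog concatenation onto port bits, leftmost item to the MSB."""
    width = abs(msb - lsb) + 1
    if len(concat_items) != width:
        raise ValueError("concatenation width does not match the port width")
    step = -1 if msb >= lsb else 1
    return {msb + i * step: net for i, net in enumerate(concat_items)}


if __name__ == "__main__":
    u100 = ["aout", "bout[1]", "bout[0]",
            "cout[12]", "cout[11]", "cout[10]", "cout[9]", "cout[8]"]
    expected = map_concat_to_port(u100, 7, 0)
    print("delayout[7] ->", expected[7])
    # Same list mapped in reverse order, to compare with the wrong answer.
    reversed_map = map_concat_to_port(u100[::-1], 7, 0)
    print("reversed delayout[7] ->", reversed_map[7])
```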

    - Lydia Lee
      eSilicon, Inc.


( ESNUG 398 Item 6 ) --------------------------------------------- [07/31/02]

From: Gnana Kanagaratnam <gnana@mondes.com>
Subject: How Monterey Aristo IC Wizard Estimates Initial SoC Area Budgets

Hi John,

I'm an applications engineer supporting IC Wizard at Monterey.  Over the
past year, I've received a lot of questions about how IC Wizard calculates
initial area budgets for SOC designs.  In designs dominated by standard
cells, this is very straightforward -- just sum up the total cell area and
multiply by your favorite utilization factor.  However, this approach does
not work well in designs dominated by hard blocks.  IC Wizard allows you to
specify different utilization values for standard cells & hard blocks.

The user defines standard cell utilization and hard macro utilization with
the following commands:

       set context -blockset <design_name> -version <version_number>
       run updateBlockSize -standardcellutil 80 -hardblockutil 90

In this example, IC Wizard will calculate the total area occupied by
standard cells and then add extra area such that the utilization equals 80%.
Similarly, it will sum up the total area occupied by hard blocks and add
extra area such that the utilization is 90%.  For designs that contain both
standard cells and hard blocks, both utilization values are used to
calculate the total area of the design.

For designs containing multiple levels of hierarchy, IC Wizard will traverse
the hierarchy down to the lowest level, then apply these utilization values
and calculate a bottom-up area estimate for the hierarchical block at the
top level.  Standard cells and hard blocks have library attributes that
IC Wizard uses to determine which utilization value to apply.  Standard
cells are of class CORE and hard blocks are of class RING, BLOCK, COVER,
or ENDCAP.
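
The calculation described above can be sketched in a few lines of Python.
This is only an illustration of the arithmetic, not IC Wizard's code:
budget_area and the dictionary layout of the hierarchy are hypothetical,
and the utilization values are percentages as in the updateBlockSize
example.

```python
# Cell classes as listed in the text above.
STD_CELL_CLASSES = {"CORE"}
HARD_BLOCK_CLASSES = {"RING", "BLOCK", "COVER", "ENDCAP"}


def budget_area(node, std_util=80, hard_util=90):
    """Return a bottom-up area budget for a hierarchy node.

    Assumed data layout: a leaf cell is a dict with "class" and "area" keys,
    a hierarchical block is a dict with a "children" list.
    """
    if "children" in node:
        # Recurse down to the leaves and sum the budgets on the way back up.
        return sum(budget_area(c, std_util, hard_util) for c in node["children"])
    if node["class"] in STD_CELL_CLASSES:
        return node["area"] * 100 / std_util
    if node["class"] in HARD_BLOCK_CLASSES:
        return node["area"] * 100 / hard_util
    raise ValueError(f"unknown cell class: {node['class']!r}")


if __name__ == "__main__":
    design = {"children": [
        {"class": "CORE", "area": 800},
        {"children": [{"class": "BLOCK", "area": 900}]},
    ]}
    print(budget_area(design))
```

Here 800 units of standard cells at 80% and 900 units of hard blocks at 90%
each budget to 1000 units, so the top level budget is 2000.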

Standard cell utilization value is affected by your synthesis methodology.
If you use conservative timing models, then the initial area estimates out
of DC will be (more) pessimistic, thus the cell utilization factor can be
more aggressive.  Conversely with an optimistic timing model, you should use
more conservative cell utilization.

Coming up with a good default hard macro utilization is a little tricky.  It
is affected by many factors, including whether the design contains many dual
ported memories, how neatly the hard blocks tile, pin density, and whether
the blocks vary dramatically in size.

    - Gnana Kanagaratnam
      Monterey Design Systems                    Sunnyvale, CA


( ESNUG 398 Item 7 ) --------------------------------------------- [07/31/02]

Subject: ( ESNUG 396 #3 ) Avanti Jupiter/Apollo Rectilinear Block Flows

> I am wondering if your readers have experience in handling rectilinear
> blocks, especially with Synopsys (PhysOpt)/ Avanti (Jupiter/Apollo) tools.
>
>  1. How difficult is it in getting the pins assigned when the number of
>     edges exceeds 4?
>  2. Is power routing capable of dropping straps of different lengths
>     because of different dimensions in one direction or do I have to
>     manually alter the lengths?
>  3. Will writing out the GDSII have any problems?
>  4. Can the parasitic extractor handle arbitrary shaped blocks?
>
> Plus is there anything else that would make my life miserable here?
>
>    - Jay Pragasam
>      Brecis Communications                       San Jose, CA


From: Caesar Abedin <cabedin@amcc.com>

Hi John,

I have used Jupiter/Apollo (don't know about PhysOpt) for rectilinear blocks
in the past.  In fact, in my current design, I have a block with 10 sides.
(I had to cut around some analog macros.)  To answer your concerns...

  1) Pin Assignment: Jupiter did a pretty good job in placing the pins where
     it should have; though Jupiter's run time was longer than I expected.

  2) Power Routing: We pushed down our PG grid from the top using Jupiter's
     cute preroute functions and there weren't any major issues.

  3) GDSII: No problems.

  4) Parasitics: No problems.

One problem that you might be concerned about is the routability of your
rectilinear blocks.  I had to change 2 blocks from L-shaped to rectangular
and take the hit on die size simply because the rectilinear blocks were
unroutable.

    - Caesar Abedin
      AMCC                                       Andover, MA


( ESNUG 398 Item 8 ) --------------------------------------------- [07/31/02]

Subject: ( ESNUG 397 #5 ) PhysOpt DEF Output Missing SHAPE Properties, Too!

> PhysOpt doesn't write out the PROPERTYDEFINITIONS section in a DEF file
> correctly.  It does not add a space before the semicolon at the end of
> each of the following lines.  Silicon Ensemble can not parse this DEF
> without a space between each token.  It flags an error. 
>
>     NET ROOT_ORIGINAL_NAME STRING;
>     COMPONENTPIN CLOCKROOT STRING;
>     NET CLOCKROOT STRING;
>
> which should be:
>
>     NET  ROOT_ORIGINAL_NAME  STRING   ;
>     COMPONENTPIN  CLOCKROOT  STRING   ;
>     NET  CLOCKROOT  STRING   ;
>
> This was seen while reading DEF from PhysOpt into Silicon Ensemble.
>
>     - John Cooley
>       the ESNUG guy


From: Noa Safra <rm10192@email.sps.mot.com>

Hi John,

You mentioned in the ESNUG 397 #5 that DEF from PhysOpt cannot be read into
Silicon Ensemble because of the PROPERTYDEFINITIONS.  I'd like to add one
more reason -- PhysOpt writes SHAPE properties in the wrong place.  When we
dump DEF out of PhysOpt we get statements like this, which are not valid in
Cadence DEF (and PKS falls on this when we load the DEF, unless we edit it
manually.)

   - vdd ( * vdd )
     + ROUTED  + SHAPE STRIPE  m3 9400 ( 3504500 900 ) ( * 2385000 )
     NEW  + SHAPE STRIPE  m3 9400 ( 3693050 900 ) ( * 2385000 )
     NEW  + SHAPE STRIPE  m3 9400 ( 3854500 900 ) ( * 2385000 )
   ...

   NEW  + SHAPE STRIPE  m4 9200 ( 900 1769000 ) ( 3859200 * )
     NEW  + SHAPE RING  m4 7000 ( 0 2388500 ) ( 4375000 * )
     NEW  + SHAPE STRIPE  m4 9200 ( 900 1944200 ) ( 3859200 * )
     NEW  + SHAPE RING  m4 7000 ( 0 3500 ) ( 4375000 * )
     NEW  + SHAPE STRIPE  m3 9400 ( 3154500 900 ) ( * 2385000 )

This is the DEF format, taken from Cadence openbook:

   {ROUTED | FIXED | COVER | SHIELD shieldNetName}
   layerName width
   [+ SHAPE {RING | PADRING | BLOCKRING | STRIPE | FOLLOWPIN | IOWIRE
      | BLOCKWIRE | BLOCKAGEWIRE | FILLWIRE}]
   ( x y) [ ( x * ) | ( * y ) | viaName]...
   [ NEW layerName width
   [+ SHAPE {RING | PADRING | BLOCKRING | STRIPE | FOLLOWPIN | IOWIRE
      | BLOCKWIRE | BLOCKAGEWIRE | FILLWIRE}]
   ( x y ) [ ( x * ) | ( * y ) | viaName ]...]...


The metal layer name and width should come before the "+ SHAPE" statement.

This happens both when I use the new PhysOpt 2002.05 "write_def" and when
I save the db and use db2def.  This is a bug, and Synopsys reported they
will fix it in the 2002.05-SP1 release of PhysOpt.
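
Until a fixed release is installed, the manual edit can be scripted.  The
following is a minimal sketch, assuming the only problems are the two
described in this thread: SHAPE written ahead of the layer name and width,
and semicolons glued to the last token inside PROPERTYDEFINITIONS.  The
function names are hypothetical, SHIELD wiring is not handled, and the
output should be checked against your own DEF before use.

```python
import re

# "+ ROUTED + SHAPE STRIPE m3 9400"  or  "NEW + SHAPE RING m4 7000"
SHAPE_FIRST = re.compile(
    r"(?P<lead>\+\s*(?:ROUTED|FIXED|COVER)|\bNEW\b)"
    r"\s+\+\s*SHAPE\s+(?P<shape>\w+)"
    r"\s+(?P<layer>\S+)\s+(?P<width>\d+)"
)
# A semicolon with no whitespace in front of it.
GLUED_SEMI = re.compile(r"(?<=\S);")


def fix_shape_order(line):
    """Move 'layerName width' in front of '+ SHAPE <type>'."""
    return SHAPE_FIRST.sub(
        r"\g<lead> \g<layer> \g<width> + SHAPE \g<shape>", line)


def fix_def(lines):
    """Yield repaired DEF lines; untouched lines pass through as-is."""
    in_props = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("END PROPERTYDEFINITIONS"):
            in_props = False
        elif stripped.startswith("PROPERTYDEFINITIONS"):
            in_props = True
        elif in_props:
            line = GLUED_SEMI.sub(" ;", line)
        yield fix_shape_order(line)


if __name__ == "__main__":
    sample = [
        "PROPERTYDEFINITIONS\n",
        "    NET CLOCKROOT STRING;\n",
        "END PROPERTYDEFINITIONS\n",
        "     + ROUTED  + SHAPE STRIPE  m3 9400 ( 3504500 900 ) ( * 2385000 )\n",
        "     NEW  + SHAPE RING  m4 7000 ( 0 3500 ) ( 4375000 * )\n",
    ]
    print("".join(fix_def(sample)), end="")
```

Lines that are already in the documented order do not match the pattern
and are left alone, so running the script twice should be harmless.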

    - Noa Safra
      Motorola


( ESNUG 398 Item 9 ) --------------------------------------------- [07/31/02]

Subject: ( ESNUG 396 #4 ) Synopsys Slow Getting Floorplan Compiler To Users

> Floorplan Compiler (v2002.05) is available at the SNPS ftp site.  Does
> anyone know if Chip Architect licenses will be converted to FPC licenses?
> I'm meeting with my SNPS folks today to ask that question, but would like
> to double check this.  With tight budgets and the risk of more layoffs,
> now is not a good time for me to suggest the firm even evaluate another
> tool.  But since we already have two Chip Arch licenses...
>
>     - Mark Wroblewski
>       Cirrus Logic                               Broomfield, CO


From: Mark Wroblewski <markwrob@colorado.cirrus.com>

Hi John,

FWIW, Synopsys has been dragging their heels with me about getting an FPC
eval license.  It has been more than two weeks since I requested it.  I
believe this indicates that in spite of the announcement of the tool's
availability, it is really available only to a select few customers while
the big kinks are still getting worked out.  I'd be surprised if more
than 2 or 3 designs tape out using FPC in the next 6 months.

FWIW #2, I found out from my Synopsys sales guy that Chip Architect comes
in two flavors, an entry-level version and a more powerful suite.  I think
the latter is labelled CA-Expert or maybe CA-Ultra.  Anyway, the sales guy
stated there is reasonable room to talk about converting the more powerful
Chip Architect suite to an FPC license, but you can't expect a direct
conversion of the entry-level Chip Architect tool to an FPC license,
generally speaking.  That shouldn't keep people from asking about any such
deal when the right time comes, as nearly every customer's account status
with Synopsys can differ.

For the time being, we've still got two Chip Architect licenses my firm
picked up this year, that no one uses as far as I know, and we have no
ability to even try FPC yet.  It would have been nice to get the eval out of
the way this month, but now it's probably getting too late to eval any tool
before getting dirty on our next tapeout, so we'll probably stick with the
Cadence SE/CTGen/Design Planner approach from last chip, as painful as it
may be.

    - Mark Wroblewski
      Cirrus Logic                               Broomfield, CO


( ESNUG 398 Item 10 ) -------------------------------------------- [07/31/02]

Subject: ( ESNUG 396 #5 ) PhysOpt 2.5-D Extraction Isn't Exactly Here Yet

> What's the deal with the new 2.5 D extraction in PhysOpt?  Specifically,
> will it eliminate the RC correlation step in PhysOpt? 
>
> I was told by the PhysOpt R&D manager at DAC that this new extraction
> engine should eliminate the need for correlation.  That we should try it
> for ourselves and if the R and C factors are not close to 1, we should
> file a bug report.  However, our scaling factors turn out exactly the same
> as their 1-D extraction and nowhere close to 1.
>
> I can't get a straight answer from Synopsys.  Any ideas?
>
>     - Mahsa Vahidi
>       Mindspeed Technologies                     San Diego, CA


From: [ Trojan Man ]

John,

The new 2.5 D PhysOpt extractor is untested alpha code based on the Synopsys
proprietary .db and .pdb databases.  It is green code.  This is why the
PhysOpt R&D manager wants users to file bugs against it.  Avanti Star-RC
gives Synopsys a debugged 2.5 D extractor, but it is based on the Avanti
proprietary Milkyway database.  PhysOpt won't have viable 2.5 D extraction
until PhysOpt is ported into Milkyway.  If you publish this, I am anon.

    - [ Trojan Man ]

         ----    ----    ----    ----    ----    ----   ----

From: Mahsa Vahidi <mahsa@mindspeed.com>

Hi John,

I finally got the answer to my question from a Synopsys AE.  The PhysOpt
2.5-D extraction engine does not need RC correlation.  Actually, there is
no way to even obtain correlation numbers in the current version of PhysOpt.

The numbers which were printed out by estimate_rc (and were confusing me)
are there for backward compatibility and are comparing the actual DSPF to
the numbers in the library.  Basically, these numbers should be ignored.
The curve that is printed out by the compare_rc command (which actually
should be called compare_c since it doesn't compare resistance) is the
actual correlation between the 2.5-D extraction engine and your DSPF. 

If the curve doesn't correlate well, you have to back-annotate post-route
data and run physopt -incr -post_route to fix any violations if they exist.

    - Mahsa Vahidi
      Mindspeed Technologies                     San Diego, CA

 
( ESNUG 398 Item 11 ) -------------------------------------------- [07/31/02]

From: Stuart Sutherland <stuart@sutherland-hdl.com>
Subject: These Fruity Flavors Of C Doesn't Mean Verilog's PLI Is Going Away

Hi John,

In past ESNUG newsletters, and in other forums, there has been a lot of 
discussion about how Synopsys DirectC and/or Co-Design's Cblend and/or 
SystemVerilog will replace the Verilog PLI.  As a true-blue PLI bigot (I 
wrote that big 850 page book on using the PLI) and one who makes a living 
doing Verilog PLI training and consulting, I keep getting questions about 
how much longer will the Verilog PLI be around, and will my business die 
when there is no PLI.  So, if you will permit me to use your ESNUG forum, I
would like to set the record straight on the difference between Synopsys's 
DirectC and the Verilog PLI.

For the majority of us that use Verilog, the Verilog PLI currently serves 
two major roles.  In its first role, the PLI allows third-party software 
tool vendors or in-house CAD departments to write applications that are 
portable -- perhaps with a tweak or two--to all major Verilog simulators.
The Verilog PLI serves as a uniform layer between the application and the
simulator's internal data structures.  In addition, the PLI's interface 
layer allows those software tools to analyze a simulator's data structure 
and search for specific information, without the application needing to 
know anything at all about how the simulator has organized its data 
structures.  Without a standard procedural interface, every application 
would have to be rewritten for each and every simulator.

In its second role, the PLI allows those of us who do design and/or 
verification to access the C language from within Verilog source code.
This is a great feature, and in my bigoted opinion, it is one of the
reasons for Verilog having been so successful.  However, even as a PLI 
bigot, I must also candidly admit that I think the PLI is very ill-suited 
for this second usage.  The PLI is cumbersome and difficult to learn.
Design and verification engineers have to jump through too many hoops
just to access C code from their Verilog code.  More importantly, 
because the PLI is an interface layer, it is inherently inefficient for
run-time performance.  Using the PLI to mingle C code with Verilog code is 
an abuse of what the PLI was designed for.  In essence, when designers use 
the Verilog PLI just to call a C function, they are using a 20 pound sledge
hammer to do the job of a 4 ounce tack hammer.

DirectC and Cblend satisfy the need of the second type of PLI user.  They 
allow a mingling of C code with Verilog code, without the need of the 
complex Verilog PLI.  However, DirectC and Cblend are proprietary to their 
respective companies, which limits the usage to only those companies'
software products.  Accellera's SystemVerilog 3.0 standard, ratified in 
June 2002, adds some C capabilities to Verilog as a true, nonproprietary 
standard.  Both Synopsys and Co-Design have donated--or are planning to
donate--the work they have done with DirectC and Cblend to Accellera, so 
that they can be standardized as part of SystemVerilog.  All three 
approaches solve the PLI abuse problem.  Design and verification engineers 
can hook their Verilog and C code together directly in the Verilog 
language, without having to learn and use an industrial strength procedural 
interface.

However, DirectC, Cblend and SystemVerilog do not replace what the PLI can 
do very well, which is provide a universal interface to any compliant 
simulator.  Software tools that need to access the internal data structures 
of a simulator, and have the program work with any simulator, need a true 
interface layer.  There is, and I think there always will be, a need for 
the Verilog PLI.

I applaud the work that Synopsys and Co-Design have done to give design and
verification engineers the capabilities they need without having to use the 
PLI.  I also applaud them for opening up that work to Accellera so that it 
can be standardized as part of SystemVerilog.   And I'm not at all worried 
about this integrated C capability taking away my business of Verilog PLI 
training -- I have a SystemVerilog training course, too.  ;)

    - Stu Sutherland
      Sutherland HDL, Inc.                       Tualatin, OR


( ESNUG 398 Item 12 ) -------------------------------------------- [07/31/02]

From: Christen Jocson <cmcc@synopsys.com>
Subject: User Confusion On Units For Library Derived PhysOpt R and C Values

Hi John,

I was engaged with a customer that was having problems with extremely high
Design Rule Cost at the start of a PhysOpt run.  One of the investigations
we pursued was the possibility that his R and C numbers were incorrect.  At
the beginning of a PhysOpt run, GR-10 information messages get reported,
stating the library derived horizontal & vertical capacitance & resistance.
These numbers don't have units on them, which can lead to much confusion
when trying to determine if the values are correct or not.

The units for the Library Derived Horizontal and Vertical Capacitance are
taken from the target/logical library.  The units for the Library Derived
Horizontal and Vertical Resistance are derived from the time unit and the
capacitive load unit in the target/logical library.  Mistakenly, several
users have thought that the pulling resistance unit in the target/logical
library is the unit for the Library Derived Resistance values.  If the
pulling resistance unit is used on the Library Derived Resistance values,
it typically makes them appear to be off by a factor of 1x10^3, which can
be quite alarming.

Here's a quick example of how to apply the units from the target/logical
library.  Target/Logical Library contains:

             time_unit : "1ns" ;
             pulling_resistance_unit : "1kohm" ;
             capacitive_load_unit ( 1, ff ) ;

The Library Derived Capacitance unit is femtofarads (fF).  The Library
Derived Resistance unit is [1x10^-9 s / 1x10^-15 F = 1x10^6 Ohm] megaohms
(MOhm).

For the "average" design, you would expect Resistance values around
Nx10^-6 to Nx10^-7.  Then when applying the MOhm unit, the Resistance
would be on the order of N to 0.N Ohms.  These estimates are strictly
design dependent and may vary greatly.
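
The unit arithmetic can be checked with a few lines of Python.  This is a
sketch, assuming (as described above) that the derived resistance unit is
simply the library time unit divided by the capacitive load unit; the
function names and the 2x10^-6 sample value are hypothetical.  With 1 ns
and 1 fF the ratio is about 1x10^6 Ohm (MOhm); with 1 ns and 1 pF it would
be about 1x10^3 Ohm (kOhm).

```python
def derived_resistance_unit(time_unit_s, cap_unit_f):
    """Ohms represented by one library-derived resistance unit (t / C)."""
    if cap_unit_f <= 0:
        raise ValueError("capacitive load unit must be positive")
    return time_unit_s / cap_unit_f


def to_ohms(reported_value, time_unit_s, cap_unit_f):
    """Convert a unitless reported resistance value into Ohms."""
    return reported_value * derived_resistance_unit(time_unit_s, cap_unit_f)


if __name__ == "__main__":
    NS, PF, FF = 1e-9, 1e-12, 1e-15
    print("ns / fF unit in Ohms:", derived_resistance_unit(NS, FF))
    print("ns / pF unit in Ohms:", derived_resistance_unit(NS, PF))
    # Hypothetical reported value of 2e-6 with a ns / fF library.
    print("2e-6 reported, in Ohms:", to_ohms(2e-6, NS, FF))
```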

    - Christin Jocson
      Synopsys, Inc.                             Dallas, TX


============================================================================
 Trying to figure out a Synopsys bug?  Want to hear how 14,063 other users
  dealt with it?  Then join the E-Mail Synopsys Users Group (ESNUG)!
 
     !!!     "It's not a BUG,               jcooley@TheWorld.com
    /o o\  /  it's a FEATURE!"                 (508) 429-4357
   (  >  )
    \ - /     - John Cooley, EDA & ASIC Design Consultant in Synopsys,
    _] [_         Verilog, VHDL and numerous Design Methodologies.

    Holliston Poor Farm, P.O. Box 6222, Holliston, MA  01746-6222
  Legal Disclaimer: "As always, anything said here is only opinion."
 The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com

Copyright 1991-2024 John Cooley. All Rights Reserved.