"EDA is the only industry I know of that refuses to serve its customers.
   For every dollar spent on EDA, something like two dollars is spent by
   the user after he spends that dollar.  That tells me the EDA industry
   is not providing the service customers need and is leaving a huge
   amount of money on the table."

       - Steve Domenik, general partner with venture capital firm
         Sevin Rosen Funds (EE Times 9/6/99)

( ESNUG 331 Subjects ) ------------------------------------------- [10/7/99]

 Item  1: One User's Review Of The Synopsys FlexRoute ("Everest TLR") Tool
 Item  2: ( ESNUG 330 #1 )  Synopsys Sales Just *Now* Likes Small Customers
 Item  3: "Stats" -- A User Customizable PrimeTime Timing Report Tcl Script
 Item  4: ( ESNUG 317 #5 330 #7 )  Troubling Scan Insertion Name Changes
 Item  5: ( ESNUG 329 #9 )  IP Encryption Doesn't Need To Be Bullet Proof
 Item  6: ( ESNUG 321 #4 329 #15 ) Chrysalis, Formality, Verplex, Quickturn
 Item  7: ( ESNUG 330 #8 )  Run Cadence Pearl In Tcl Mode To Use DC Input
 Item  8: Help!  I'm Being Sucked Into The DC Billion Switches Black Hole!

 The complete, searchable ESNUG Archive Site is at <http://www.DeepChip.com>


( ESNUG 331 Item 1 ) --------------------------------------------- [10/7/99]

From: "Sam Appleton" <sama@groovy.mti.sgi.com>
Subject: One User's Review Of The Synopsys FlexRoute ("Everest TLR") Tool

Dear John,

I know you like user reviews of tools.  But, before I go into experiences
using Everest TLR / Synopsys FlexRoute, I've got to explain the physical
landscape of the chip it was used on.  Our chip, Krypton, is 3 million
gates, 133 MHz, and it's an in-socket replacement for an older chip.  Our
design goals were to add more functionality, run 33 percent faster, and
improve our transistor density by at least 20 percent -- hence our focus
on the newer, supposedly better P&R tools.

Krypton had 23 major sub-blocks (with 7,000 nets between them), two top
level 256-bit global buses and four global 128-bit buses.

The total "RAM area" was perhaps 10-20% of the chip (although I can't
measure precisely by just eyeballing a plot) with 47 RAM/RF instances.
The bulk of the RF's were Artisan 32x32's and 32x64's.

Krypton had 751,830 placeable instances to place-and-route.


The Five Big Flat Layout "Hells"
--------------------------------

I know it's very common for some companies to do layout as a process on one
big flat design.  We considered flat, but these five "hells" came up:

  o big flat designs are run-time hell

    We had five copies of Cadence Silicon Ensemble (rev 5.1).  Placing and
    routing a 3 Mgate ASIC would have a huge runtime (guessing 3-4 days or
    more), making iterations very difficult and time consuming.  Block
    logic grouping and global wiring patterns become intractable.  Timing
    convergence and signal integrity goals become a complete mess.

  o big flat designs are extraction hell

    Once a final layout GDS/DEF is obtained, the design must be extracted
    and passed through static timing to verify timing behavior.  Based on
    our experience with Avanti Star-RC rev 98.5, extraction of our 3 Mgate
    ASIC flat was very difficult and it typically took 1-2 weeks just to
    obtain a good SPF.  In addition, the 2 gigabyte file size limit imposed
    by Solaris 2.5 prohibits the file sizes required for a 3 Mgate SPF,
    unless the design is extracted hierarchically.

  o big flat designs are back-annotation hell

    Once our full flat chip SPF has been generated, it must be accurately
    converted to SDF.  Even if a full flat SPF can be generated, generating
    an SDF from this file was impossible with Avanti Star-DC rev 98.5, so
    we used Ultima Millennium rev 1.8 from <http://www.ultimatech.com> .
    We liked Ultima.  We could extract all the blocks in our chip on an
    individual basis and then Ultima would merge all these SPFs into one
    big SDF.  We could then load that into Primetime and have a ball doing
    timing analysis.  Good stuff.

  o big flat designs are clock tree hell

    Inserting a clock into a flat design that is 11 x 11 mm is a bad joke
    from almost every perspective.  In a big flat design, our signal wires
    in initial levels of the tree were very long, making predictability of
    delay in the first stages of the clock very hard.  Too many crosstalk
    problems, too.  The tool we had to use, Cadence's CTgen rev 3.3, was
    terrible at analyzing and inserting a clock into a network of 500K
    flip-flops over such an area.  Overall, CTgen sucked.  To insert
    clocks, I ended up writing my own tool -- a few thousand lines of
    Perl -- to do the final clock insertion.

  o big flat designs are timing closure hell

    John, as your readers know, Design Compiler *estimates* wireloads.
    Once our flat design was routed, fixing the small errors due to the
    differences between *estimated* wireloads and the real design meant
    either IPO netlist iterations or redoing our whole design starting
    from the floorplan.  Either way, the full design still had to be
    re-extracted & re-annotated -- making a very long timing closure loop.

In practical terms, with engineers here running around tweaking and pumping
netlists out of Design Compiler every day, some way to compartmentalize
their work is a MUST.

So, obviously John, we chose the hierarchical approach.


Choosing Routing Tools
----------------------

As we began developing our CAD flow, we looked at the grid-based routers.
We eliminated Avanti for this review because we've had such spotty support
from them in the past and recently.  This left only Cadence WarpRoute and
Compass PathFinder to review.  We found:

  o no facilities for manual editing and route control
  o they're not tuned to block-based designs
  o major buses cannot be controlled or planned
  o special nets like clocks and noise-sensitive signals cannot be
    adequately controlled
  o variable width and spacing on nets is very limited

For top-level block-based routers, we evaluated Everest TLR (which is now
Synopsys FlexRoute) and Cadence IC Craftsman.  After using IC Craftsman
for a couple of weeks, we very quickly gave up once we saw Everest.  IC
Craftsman was a pain.  Hard to get data in and out, hard to control buses
(or anything for that matter), and it was slow.  Everest had an easier GUI,
it worked faster, and it did all that stuff I listed above plus:

  o block pin optimization and control
  o controllable bus routing
  o top-level floorplanning 
  o shielding and length balancing
  o easy net/bus prerouting

So we chose to use Everest TLR / Synopsys FlexRoute.


Our Experiences With FlexRoute
------------------------------

First off, the ASCII text format of FlexRoute's database made it very easy
to interface to other place-and-route tools, as well as do design revision
control with.  No cryptic binaries or proprietary formats to "manage".

Second, FlexRoute's Perl interface also allowed us to easily automate many
repetitive functions as well as add custom functionality that was unique
to our environment.  FlexRoute's interface itself provides full access to
the tool's database for manipulation -- a very handy feature for those
wishing to customize the tool for their particular flow.

On to the nitty-gritty...

Since FlexRoute is for hierarchical chip assembly, it fit into our overall
hierarchy-preferred approach quite well.  We split our design along sections
of the logical hierarchy, using 23 top-level blocks, each of which was a
place-and-route "unit".  Some of these blocks also had sub-units (either
custom logic or other place-and-route units), giving a 3-level hierarchy in
parts of the design.  This enabled an incredible acceleration of our design
cycle:

  o smaller place-and-route units

    Although there were many blocks, the layout generation time for each
    was very small, ranging from 1 to 8 hours.  Using our five Silicon
    Ensemble licenses enabled many blocks to be turned around in parallel
    (impossible with a flat approach).  Design management for these blocks
    was much easier.  Clock Insertion (using our homebrew tool) on these
    smaller blocks gave excellent skew performance and a relatively shallow
    tree.  It also allowed better control of crosstalk issues on these
    critical signals.

  o hierarchical extraction got easy

    We set up our extraction tool, Avanti Star-RC, to operate on the top
    level layout only down to the interface "ports" of each block.  This
    massively improved top-level extraction time as well as making basic
    chip-level LVS faster.  Extraction time went from 10 days (because
    Star-RC kept crashing) to about 4 hours using it hierarchically.

    Our delay calculator, Ultima Millennium, would merge hierarchical
    extraction results and give a full-chip SDF for back-annotation.
    Changing one part of the design required only that part of the design
    be re-extracted, rather than the full-chip (a much faster and much
    less difficult task).

  o hierarchical verification got easy, too

    Each block was also run through Cadence Dracula3 DRC verification 
    (rev 4.6) in parallel with layout extraction with Avanti Star-RC.  (The
    project got very CPU hungry then!)  We were able to verify each block
    as clean and the top-level layout as clean, leaving only the DRC issues
    pertaining to interactions between top-level and block-level
    layouts.  This allowed us to tapeout after only one DRC check on the
    completed GDS -- it had no errors after layout generation, requiring no
    time-intensive iterations for DRC fixes.  To verify the final layout,
    we used no hierarchy and ran full-chip LVS (Avanti Hercules) to ensure
    no interface issues remained from assembly.

  o transistor density increased 25%

    Our transistor density increased 25% with FlexRoute because of the lack
    of significant channels and the clean, aligned routing that FlexRoute
    performed.  Nice.  We iterated hundreds of times on the top-level
    floorplan with pin changes and block size modifications, optimizing
    both top-level and block-level routing, as well as the overall timing
    results.

    FlexRoute pin optimization was used numerous times to optimize the
    top-level routing, resulting in a route with almost no channels and no
    "switchboxes" (created when pins are not aligned and routes need to
    be hooked up).  Pin optimization combined with ultra-fast global
    routing (approximately 30 seconds on our 23 block/7000+ net design)
    allowed us to very quickly evaluate the chip floorplan for routability
    and channel density issues before re-routing affected blocks.

    I guesstimate a 10-20% greater density could have been achieved had
    we pushed the floorplan harder.


Make, Timelines and Tapeout
---------------------------

Using this hierarchical design approach let us iterate the entire chip's
layout in just over a day.  Our steps were:

  1.) designer creates Verilog
  2.) synthesize with Design Compiler
  3.) P&R with Silicon Ensemble
  4.) layout extraction with Avanti Star-RC
  5.) delay calculation using Ultima
  6.) load it into PrimeTime
  7.) analyze critical paths, etc. back to step 1 or 2 (depending)

Steps 3 through 6 were done automatically with a make file.  We didn't have
to touch or hand-hold anything.  This typically took between 2 and 16 hours
depending on the block size, with the largest block being 120 Kgates.

Going steps 1 through 7, we could turn around the entire chip from new block
netlists to a top-level timing report in around 1.5 days.
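The make-file dependency logic behind steps 3 through 6 can be sketched as a
tiny Python driver.  Everything here is a placeholder -- the file names and
tool command strings are invented for illustration, not the real invocations:

```python
import os

def needs_rebuild(target, sources):
    """Make-style rule: rebuild if the target is missing or any source is newer."""
    if not os.path.exists(target):
        return True
    t = os.path.getmtime(target)
    return any(os.path.getmtime(s) > t for s in sources)

# Hypothetical per-block flow mirroring steps 3 through 6.  Each rule is
# (target, sources, command); the commands are stand-ins, not real tool calls.
FLOW = [
    ("block.def",  ["block.vg"],              "se_route block"),        # 3. P&R
    ("block.spf",  ["block.def"],             "starrc_extract block"),  # 4. extraction
    ("block.sdf",  ["block.spf"],             "ultima_calc block"),     # 5. delay calc
    ("report.txt", ["block.sdf", "block.vg"], "pt_timing block"),       # 6. PrimeTime
]

def stale_steps(flow):
    """Return, in flow order, the commands whose targets are out of date."""
    return [cmd for target, sources, cmd in flow if needs_rebuild(target, sources)]
```

The point of the dependency check is exactly the one made above: touch one
block's netlist and only that block's chain of steps reruns, instead of the
whole chip.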

For FlexRoute itself, our 3 Mgate ASIC with 23 top-level blocks and over
7,000 nets had routing runtimes of 30 minutes (for preliminary top-level
routes during pre-tapeout timing checks) and 60 minutes (for full top-level
routing with all preroutes and design data for tapeout-quality layout,
including tuned clocks.)  By comparison, our old Compass tools used to
take 24 hours just to route a smaller sized design!

This massive improvement allowed us to iterate many more times for optimal
timing and area utilization than we were able to do before.

We iterated on the top-level clock network approximately 50 times to get
the optimal skew characteristics.  These changes were made in the FlexRoute
database (instead of in a layout database) -- so that new changes to the
netlist for timing and functionality did not disrupt the ongoing effort of
clock tuning the top-level network.

    - Sam Appleton
      SGI                                    Mountain View, CA


( ESNUG 331 Item 2 ) --------------------------------------------- [10/7/99]

Subject: ( ESNUG 330 #1 )  Synopsys Sales Just *Now* Likes Small Customers

> Working for a small company, it was frustratingly difficult to get any
> quote at all from Synopsys, whereas Ambit were more than happy to take
> my business.
>
>     - Dr. Paul Marriott
>       Marriott Design Services                  Montreal, Canada


From: Jeff Lancaster <jlancast@phobos.com>

Hi John,

First let me express my pleasant surprise that ESNUG dares to publish frank
comments about the recent Synopsys price hikes and its competition with
the Ambit BuildGates tool.

I must admit that I, too, as Dr. Marriott's story goes, had a rough time
with my local Synopsys sales office and Synopsys' licensing practices a few
years ago.  (I'm talking about the Denver Synopsys office.)  I won't go into
specifics because it all seemed to have changed in the last 12 to 18 months.
They have been very cooperative and helpful to us.  But this love/hate thing
continues to go on as we scrambled last month to take advantage of the old
pricing structure before it expired September 1.

It seems to me that Synopsys thought it was safe to hike their prices
knowing that BuildGates was safely in the clutches of Cadence.  And maybe
Formality is safe too with Chrysalis tucked in the Avanti crowd.

I like Design Compiler because I trust it to make good silicon and I get
good support.  I get the same results from Cadence and Avanti with the tools
that they sell.  I applaud Cadence for daring to compete with Synopsys, even
if their 80% price reduction was a shameless attempt to capture a piece of
the market.  Some day the only difference among them will be price, I hope.

    - Jeff Lancaster
      Phobos Corp.                            Salt Lake City, Utah


( ESNUG 331 Item 3 ) --------------------------------------------- [10/7/99]

From: [ My Own Private Idaho ]
Subject: "Stats" -- A User Customizable PrimeTime Timing Report Tcl Script

Hi John,

I know you weren't too enthused when Synopsys announced their support for
the Tcl interface, but have you ever wanted to create your own PrimeTime 
report that has data not found in a standard report?  Perhaps consolidate
information from different PrimeTime reports into one report?  Include
Cindy Crawford's fan club URL in the report header?  With Tcl in PrimeTime,
you can do this.  It's all done with PT attributes. 

In PrimeTime you have direct access to any timing path(s) and attributes
for the pins, ports, clocks, nets, etc.  These attributes include arrival
times, capacitance, transition, slack, uncertainty, etc.  You can have
PrimeTime list them all by:

   pt_shell> list_attributes -application

Once you have the values of these attributes, you can do anything you wish 
with them.  You can format them, print them, and even modify some of them. 
You can make your report a new PT command.  You can share it with your 
co-workers.  Life is great and the world is your oyster.

The basic mechanics of creating your own PT command:

  1. Create a Tcl procedure in your fave editor:

         proc my_report { <options> } {
             ...PT Tcl code here...
         }

  2. Save the file, call it say 'my_report.tcl'

  3. Fire up PT.  Declare your new command by sourcing the script:

         pt_shell> source my_report.tcl

     (Note that this doesn't *run* the procedure, it just declares it.)

  4. Run your new command:

         pt_shell> my_report <options>

That's all there is to it.  Yes, you can put multiple procedures in a file.
Yes, you can declare multiple script files.  No, you can't re-define
existing PrimeTime/Tcl commands (although the practical joke implications
of this are mind-boggling.)  Don't want to source the proc every time you
run PT? Put it in your .synopsys_pt.setup and be done with it!

Below is a script "stats" that will report a nice summary of your design,
showing things like the total number of ports, pins, nets, cells, latches,
flip-flops, etc.  I encourage you to modify it any which way you find
useful!  Here's an example of the output of "stats":

  pt_shell> stats
  ****************************************
  Report : stats
  Design : LATCH
  Version: 1999.05-PT2.1
  Date   : Thu Sep 30 12:17:30 PDT 1999
  ****************************************

  Design            : BIG_TOP
  Ports             : 176 (in:13 out:7 inout:156)
  Cells             : 219343
    Hierarchical    : 9102
    Leaf            : 210241
      Sequential    : 33587 (latch:5640 ff:27935 ms:0)
      Combinational : 176032
  Pins              : 925036
    Leaf            : 759638 (in:517896 out:240168 inout:1574)
    Hier            : 165398 (in:99913 out:54244 inout:11241)
    Three_state     : 10385
  Clocks            : 12
  Generated clocks  : 5
  Timing exceptions : 2312
      -through      : 55

Pretty neat, huh?  Below is the PrimeTime Tcl source code for the "stats"
script.  It's fairly straightforward, getting attributes about the design
and formatting them into a simple report.  No conditional statements,
nothing fancy.  The only tricky part is counting the exceptions in the
design, as this isn't available directly as an attribute in PrimeTime.
It actually issues a report_exceptions command and parses the output,
right in-line. 


  # PROCEDURE: report_print_header
  #
  # Abstract: Prints a report header for the current design.

  proc report_print_header {title} {
    global sh_product_version;
    echo "****************************************"
    echo [format "Report : %s" $title]
    echo [format "Design : %s" [get_attribute [current_design] full_name]]
    echo [format "Version: %s" $sh_product_version]
    echo [format "Date   : %s" [clock format [clock seconds]]]
    echo "****************************************\n"
  }


  #  PROCEDURE: stats
  #
  #  ABSTRACT: returns quick stats on the current design
  #
  #  SYNTAX:	stats

  proc stats {} {

    report_print_header "stats"

    # port stats
    set ports        [get_ports *]
    set in_ports     [filter_collection $ports "direction == in"   ]
    set out_ports    [filter_collection $ports "direction == out"  ]
    set inout_ports  [filter_collection $ports "direction == inout"]

    set port_num       [sizeof_collection $ports]
    set in_port_num    [sizeof_collection $in_ports]
    set out_port_num   [sizeof_collection $out_ports]
    set inout_port_num [sizeof_collection $inout_ports]


    # cell stats
    set cells [get_cells -hier -quiet *]
    set hier_cells [filter_collection $cells "is_hierarchical == true"]
    set leaf_cells [filter_collection $cells "is_hierarchical == false"]

    set cell_num      [sizeof_collection $cells]
    set hier_cell_num [sizeof_collection $hier_cells]
    set leaf_cell_num [sizeof_collection $leaf_cells]

    # combo and sequential cells
    set seq_cells [filter_collection $leaf_cells "is_sequential == true"]
    set seq_cell_num [sizeof_collection $seq_cells]
    set comb_cells [filter_collection $leaf_cells "is_combinational == true"]
    set comb_cell_num [sizeof_collection $comb_cells]

    # categorize seq cells
    set latch_regs    [all_registers -level_sensitive]
    set latch_reg_num [sizeof_collection $latch_regs]
    set ff_regs       [all_registers -edge_triggered]
    set ff_reg_num    [sizeof_collection $ff_regs]
    set ms_regs       [all_registers -master_slave]
    set ms_reg_num    [sizeof_collection $ms_regs]

    # pin stats
    set pins [get_pins -quiet -hier *]
    set pin_num [sizeof_collection $pins]

    # leaf pin stats
    set leaf_pins          [get_pins -quiet -of_object $leaf_cells]
    set leaf_in_pins       [filter_collection $leaf_pins "direction == in"   ]
    set leaf_out_pins      [filter_collection $leaf_pins "direction == out"  ]
    set leaf_inout_pins    [filter_collection $leaf_pins "direction == inout"]
    set leaf_tristate_pins [filter_collection $leaf_pins "is_three_state == true"]

    set leaf_pin_num          [sizeof_collection $leaf_pins]
    set leaf_in_pin_num       [sizeof_collection $leaf_in_pins]
    set leaf_out_pin_num      [sizeof_collection $leaf_out_pins]
    set leaf_inout_pin_num    [sizeof_collection $leaf_inout_pins]
    set leaf_tristate_pin_num [sizeof_collection $leaf_tristate_pins]

    # hierarchical pin stats
    set hier_pins       [get_pins -quiet -of_object $hier_cells]
    set hier_in_pins    [filter_collection $hier_pins "direction == in"   ]
    set hier_out_pins   [filter_collection $hier_pins "direction == out"  ]
    set hier_inout_pins [filter_collection $hier_pins "direction == inout"]

    set hier_pin_num       [sizeof_collection $hier_pins]
    set hier_in_pin_num    [sizeof_collection $hier_in_pins]
    set hier_out_pin_num   [sizeof_collection $hier_out_pins]
    set hier_inout_pin_num [sizeof_collection $hier_inout_pins]

    # clocks
    set clocks         [get_clocks -quiet *]
    set gen_clocks     [get_generated_clocks -quiet *]
    set clock_num      [sizeof_collection $clocks]
    set gen_clock_num  [sizeof_collection $gen_clocks]

    # timing exceptions - the tricky part
    redirect ./rpt_except.log {report_exceptions -nosplit}
    set F [open ./rpt_except.log r]
    set found_start 0
    set thru_excpt_num 0
    set excpt_num 0

    foreach line [split [read $F] \n] {
      if {[regexp {^-+$} $line] } {
        set found_start 1
        continue
      }
      if { $found_start == 0 } continue
      if {[regexp {^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+$} $line] == 1} {
        set thru_excpt_num [expr $thru_excpt_num + 1]
        set excpt_num [expr $excpt_num + 1]
      } elseif {[regexp {^.+\ +.+\ +.+\ +.+$} $line] == 1} {
        set excpt_num [expr $excpt_num + 1]
      }
    }
    close $F

    echo [format "Design            : %s" \
         [get_attribute [current_design] full_name]]
    echo [format "Ports             : %d (in:%d out:%d inout:%d)" \
         $port_num $in_port_num $out_port_num $inout_port_num]
    echo [format "Cells             : %d" $cell_num]
    echo [format "  Hierarchical    : %d" $hier_cell_num]
    echo [format "  Leaf            : %d" $leaf_cell_num]
    echo [format "    Sequential    : %d (latch:%d ff:%d ms:%d)" \
         $seq_cell_num $latch_reg_num $ff_reg_num $ms_reg_num]
    echo [format "    Combinational : %d" $comb_cell_num]
    echo [format "Pins              : %d" $pin_num]
    echo [format "  Leaf            : %d (in:%d out:%d inout:%d)" \
         $leaf_pin_num $leaf_in_pin_num $leaf_out_pin_num \
         $leaf_inout_pin_num]
    echo [format "  Hier            : %d (in:%d out:%d inout:%d)" \
         $hier_pin_num $hier_in_pin_num $hier_out_pin_num \
         $hier_inout_pin_num]
    echo [format "  Three_state     : %d" $leaf_tristate_pin_num]
    echo [format "Clocks            : %d" $clock_num]
    echo [format "Generated clocks  : %d" $gen_clock_num]
    echo [format "Timing exceptions : %d" $excpt_num]
    echo [format "    -through      : %d" $thru_excpt_num]
 
    return 1
  }
  define_proc_attributes stats \
          -info "Report design statistics (cells, pins,..)"


Feel free to modify this, reformat it, etc. as needed.  However, if you are
depending on your script in a sign-off situation, be careful: be sure it's
correct, accurate, yada, yada, yada.  It would be kind of a shame if, say,
an entire city lost power because your chip didn't meet timing, simply
because you forgot to account for clock latency in a custom timing report.
Oops.

No names, etc.  Politics.

    - [ My Own Private Idaho ]


( ESNUG 331 Item 4 ) --------------------------------------------- [10/7/99]

Subject: ( ESNUG 317 #5 330 #7 )  Troubling Scan Insertion Name Changes

> The script referred to by project_name + "_wireload.scr" is a brute force
> search and destroy script to match up the current_design design name with
> an associated wireload.  This worked great until I tried to route the scan
> chains at the core level, which changes the design names!  The variable
> insert_test_design_naming_style can be used to control how the design is
> renamed, but the %s and %d fields are required.  I set it to %s_scan_%d.
>
>     - Bob Wiegand
>       Creative Labs                                      Malvern, PA


From: Gzim Derti <gderti@intrinsix.com>
To: Bob Wiegand <rwiegand@ensoniq.com>

Robert,

You mentioned in ESNUG 330 about DC changing the names of blocks after scan
insertion.  I have noticed the same thing recently.  It seems to me that
the blocks that have their name changed are those blocks which contain scan
VIOLATIONS.  Did you happen to notice if this was the same for you???

Basically, the way I noticed this: I'm doing a bottom-up scan insertion,
and all of my leaf cells seemed to scan correctly, BUT when I go up one
level and try to get DC to connect up all the individual chains, that's
when replicas of the lower blocks are created.  The blocks that are
"copied" seem to be those blocks that have some form of scan violation at
this newer, higher level of hierarchy.

In my case, it had to do with a design that I was handed.  The lower level
blocks have clocks and resets passing in from above, and everything seems
to be working well.  BUT, at the next higher level of hierarchy, DC noticed
that the "reset control" line for a particular block came from gated logic,
and one of the inputs to that logic came from a flop WITHIN the design, so
I got a violation about possible instability during scan and capture.
Besides duplicating these "violated" blocks, DC also leaves them OUT of the
resultant final scan chain.

SO, it seems to have something to do with scan violations that are found.

ALSO, I just did a compare between the two designs... (in my case let's use
Xcounter as the name).  Xcounter scan inserts with no problem.  When I go up
one level and try to connect the scan chains up, Xcounter is replicated and
called Xcounter_test_1.  Xcounter contains the scan flops, BUT the design
Xcounter_test_1 is a PRE-SCAN version and is NOW what's being linked to in
the design.

I have NO idea why it does this other than something to do with the scan
violations concerning reset signals gated with internal flop outputs.

Hopefully, all of that made some form of sense.  I'll try and see if I can
figure any thing else out.

    - Gzim Derti
      Intrinsix Corp.                              Rochester, NY

         ----    ----    ----    ----    ----    ----   ----

From: Robert Wiegand <rwiegand@ensoniq.com>
To: Gzim Derti <gderti@intrinsix.com>

Gzim,

What I have noticed is that the design names change after running the 
insert_scan command at core level.  The lower level modules were compiled
with the -scan switch, but no insert_scan was run.  As a result, there are 
no test_se inputs on any of the designs (except for the RAM wrappers) 
until the core level insert_scan is run.  Check out ESNUG 317 #5,  "Seven
Cool Tricks From My Adventures Using Test Compiler".  I went into a lot of 
detail there about the methodology.  Anyway, when the scan enable inputs 
are added to the lower level designs, they are renamed.  The original 
designs are still in memory with unrouted scan logic, but the core now 
links the new renamed designs with routed scan logic.  I had performed 
check_test on all the lower designs, as well as the core and worked out the 
problems.  I had the luxury of a test_mode pin which I could use to gate 
out any reset testability problems.  The only design names that weren't 
changed were purely combinatorial blocks, which didn't need a test_se port. 

In ESNUG 317, I reported a problem of having all the test_se connectivity 
get trashed and then reconnected to the existing tree when trying to do a 
netlist level insert_scan to add the clocks block scan chains to the 
existing core chains and connect all the core level test_si's and
test_so's.  It seems that doing an insert_scan at the core followed by an 
insert_scan at the top (without rerouting the core chains) had something to 
do with it.  I have since tried using various permutations of the -hookup 
option of set_scan_signal -test_scan_enable -port test_se command to get 
around the problem, but without success.  But I found another way to get 
around it.  Make sure the top level test_se connectivity is already present, 
and then use set_scan_configuration -route_signals serial.  This will leave 
the scan enable connectivity alone, and only deal with the scan chain 
connectivity.

    - Bob Wiegand
      Creative Labs                                    Malvern, PA


( ESNUG 331 Item 5 ) --------------------------------------------- [10/7/99]

Subject: ( ESNUG 329 #9 )  IP Encryption Doesn't Need To Be Bullet Proof

> Encryption packages operate on the principle that *both* the sender and
> the receiver of the encrypted data actually want to keep it secret.
> Otherwise, you don't need the encryption in the first place.
>
>     - Matt Christiano, CEO
>       GLOBEtrotter Software                          San Jose, CA


From: Gil H Herbeck <gilherbeck@home.com>

Hi John,

I have to chime in on the IP protection topic.

Any model that involves decrypting the file to disk is kind of flawed.  In
most real design environments there will only be a few people who are even
aware that the file is someone's IP -- maybe a design manager, a CAD
manager, or a system administrator who was actually involved with the keys
and decryption process.  The designers themselves will be too busy with their
designs to be worrying about what files they shouldn't read.  And once they
study the contents and "learn" what's there, the damage is done.  I'm not
talking about intentional theft.  You just shouldn't leave files on disk if
you don't want people to read them.

As an IP provider I would be interested in licensing also.  Module Compiler
combines licensing and encryption.  This model makes sense to me.  The file
is only decrypted in memory, and only if a valid license is available.  MC
uses a proprietary scheme, but an open encryption scheme could be used.

Here are the basic ideas for a model that I think could work:

  1. The "applications" decrypt - simulators, synthesizers, etc.
  2. Decryption is done into memory only.
  3. IP providers issue licenses to their customers (like EDA providers).
  4. The license also has the decryption key built in.
  5. The end user has to supply the license name for an encrypted file
     to the application.
  6. The application then checks out the license.
  7. The license server provides the decryption key to the application.

You could jazz it up with multiple keys, and different encrypted files per
customer, etc....  But I think I've described the basic idea.
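Steps 1 through 7 can be mocked up in a few lines of Python.  This is a toy
sketch only: the XOR stream cipher below is NOT real cryptography, the class
and function names are all invented, and a production setup would use a
vetted cipher behind a FLEXlm-style license daemon:

```python
import hashlib
from itertools import count

def _keystream(key: bytes, length: int) -> bytes:
    """Stretch a short key into a keystream (toy construction, NOT real crypto)."""
    out = b""
    for i in count():
        if len(out) >= length:
            break
        out += hashlib.sha256(key + i.to_bytes(4, "big")).digest()
    return out[:length]

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR stream cipher: the same call both encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

class LicenseServer:
    """Stand-in for the license daemon; the decryption key rides in the license."""
    def __init__(self):
        self._grants = {}                    # feature name -> decryption key
    def issue(self, feature, key):           # steps 3/4: provider issues a license
        self._grants[feature] = key
    def checkout(self, feature):             # steps 6/7: checkout returns the key
        if feature not in self._grants:
            raise PermissionError("no license for " + feature)
        return self._grants[feature]

def load_ip(encrypted, feature, server):
    """Steps 1/2/5: the application decrypts into memory only -- never to disk."""
    key = server.checkout(feature)
    return xor_cipher(encrypted, key)
```

Note the shape, not the cipher, is the point: the plaintext exists only as
the return value in the application's memory, and no license means no key.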

This model is only as secure as its weakest link -- the encryption
mechanism or the licensing mechanism.  The idea is to keep honest people
honest, and try
to make it hard for the casual criminal.  I would be more concerned about
an honest engineer "finding" my design work on his/her network and reading
it because it's there.

    - Gil Herbeck, an ex-MC CAE
      Radix20 Design Services                     Livermore, CA


( ESNUG 331 Item 6 ) --------------------------------------------- [10/7/99]

Subject: ( ESNUG 321 #4 329 #15 )  Chrysalis, Formality, Verplex, Quickturn

> to do the equivalency checking between RTL and gates, Chrysalis forces
> you to have to break up your design in 5K to 10K gate blocks.  Equivalency
> checkers do a sort of voodoo synthesis on RTL (to convert it to equations)
> and on gates (to convert that to equations) and then it compares both sets
> of equations.  Go beyond 10K gates, and the tool chokes.  So, doing the
> math, using Chrysalis meant we'd have to do comparisons of roughly 100
> 'blocks'.  Seemed like a lot of work for very little return.  Also, the
> indications were, that equivalency checking between RTL and GATEs runs
> very slow.
>
>     - Don Mills
>       LCMD Engineering                      South Jordan, UT


From: Howard Landman <HowardL@SiTera.com>

John,

Since Chrysalis Design Verifyer (DV) was a critical component of my
methodology for the R5900 processor (heart of the Sony PlayStation 2
"Emotion Engine" chip), and we even ended up buying a second license,
perhaps it would be of interest to explain what we did with it.

The earliest application was library verification.  We ended up with as
many as 5 different Verilog models of each library cell: behavioral (reg),
behavioral (wire+assign), behavioral (primitive), structural, and
transistor.  This may have been a few too many, but some were forced on us
because they were written by other groups and some we wrote ourselves to
get significant speedups in simulation.

Using DV, it was fairly straightforward to set up a library verification
script that ran all 5 libraries against each other pairwise.  This was
overkill (4 verifications should have been enough to check 5 libs), but it
ran so fast (under 2 minutes per check) that I just couldn't see not doing
them all.  The full set was run any time any library changed.  We found
quite a few bugs this way, both early problems, and the more dangerous
"last minute fix" bugs when someone changes something close to tapeout.
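
That pairwise sweep is easy to script.  A minimal sketch, assuming a
hypothetical verify_pair() wrapper around whatever command actually drives
the equivalence checker (the dv_compare invocation shown is a placeholder,
not a real DV command):

```python
from itertools import combinations

def verify_pair(lib_a: str, lib_b: str) -> bool:
    # Placeholder: in a real flow this would launch the checker and
    # parse its pass/fail result.
    print(f"dv_compare -ref {lib_a} -impl {lib_b}")
    return True

libs = ["behav_reg", "behav_wire", "behav_prim", "structural", "transistor"]

# Full pairwise sweep: C(5,2) = 10 checks.  A minimal chain of 4 checks
# (libs[0] vs libs[1], libs[1] vs libs[2], ...) would also establish
# pairwise equivalence by transitivity, but gives less direct diagnostics
# when a mismatch shows up.
pairs = list(combinations(libs, 2))
results = {p: verify_pair(*p) for p in pairs}
```

At under 2 minutes per check, the full 10-comparison sweep is cheap enough to
rerun on every library change.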

Of course, at this point we were only using about 1% of the license.  DV is
like Library Compiler sometimes -- one license could serve dozens or even
hundreds of users -- IF you're only doing easy stuff.

The second application was trying to verify the transistor level design of
our custom blocks (mainly datapath elements).  We had a method to abstract
a gate-level description back from the actual transistors, and we tried to
verify that against the RTL.  This was the hardest and least successful
application, but it wasn't entirely the tool's fault.  The transistor-level
block designers had not been given any constraints such as "keep the same
hierarchy, instance names, and signal names whenever possible".  So
naturally, being engineers, they changed all of the above.  This effectively
screwed all of the built-in shortcuts that DV takes when trying to figure
out what to match up with what, and caused very long run times.

Despite this, it was a useful exercise.  Even with those handicaps, we still
found around a dozen bugs.  More importantly, several of those bugs turned
out to be ones that we would never have caught by simulation.  Generally,
the kinds of bugs that were found most easily by formal verification were
different from the ones that were found most easily by simulation.

This stage was heavily CPU-intensive though -- some blocks ran for 40 hours.
It was this which led us to purchase a 2nd license, so we could do more
than one at a time.

But in the end, we had done RTL-to-gates for all the control logic (except a
small piece which had intentional logic loops) and some portions of the
custom blocks.

Third application: when we made a small change to one block which was not
*supposed* to be a functional change (for example a timing improvement) we
could verify it 100% against the old version by running only that module.
To do this via simulation, we would have had to build a full-processor (or
full-chip!) model, run a large number of vectors against that huge model,
and then have to settle for 90% confidence.  Formal is much faster and more
thorough in this situation.  Of course, not all changes fall into this
category, but we still saved probably 2 to 4 weeks of our entire compute
farm's regression-simulation time over the course of the project.

And finally, in the back-end, there were a number of transformations made
to the netlist, including clock tree, reset distribution, IPO, repeater
insertion, scan chain, etc.  Some of these processes had subtle problems.
Although I was too thickheaded to see the necessity at first, by the time
we taped out successfully I had implemented a complete chain of pairwise
gate-to-gate comparisons.  *Every* single step of backend processing had
the before and after netlists verified against each other.  This took about
25 minutes per comparison (for about 100,000 gates of control logic -- we
didn't verify the datapath because it didn't change).  Really cheap
insurance, if you ask me.  And it saved our posteriors more than once.  The
reason to do every step (instead of waiting until the end and verifying
once) is that, when something goes wrong, you want to find out ASAP, and
not a week later.  And you want to know immediately *what* step failed.
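
The chain-of-comparisons discipline above could be sketched like this.  The
run_gate2gate() hook and the step filenames are hypothetical stand-ins for
the real EC invocation and the real backend netlists:

```python
def run_gate2gate(before: str, after: str) -> bool:
    # Placeholder: launch the equivalence checker on the two netlists
    # and return whether they matched.
    print(f"ec_compare {before} {after}")
    return True

# One netlist per backend transformation, in processing order.
steps = ["synth.v", "clock_tree.v", "reset_dist.v", "ipo.v",
         "repeaters.v", "scan_chain.v"]

failures = []
for before, after in zip(steps, steps[1:]):
    if not run_gate2gate(before, after):
        # Because every step is checked, a failure names the exact
        # transformation that broke the netlist.
        failures.append((before, after))
```

Six netlists means five comparisons -- the cost is linear in the number of
backend steps, and the payoff is knowing immediately *which* step failed.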

Partly due to this work, and partly due to other extensive verification
efforts (such as booting Linux on a Quickturn emulation of the gates and
running multiple simultaneous applications!), the R5900 was functional
on first silicon, and the second tapeout effort was able to concentrate
on timing improvements to reach full speed.  This too was successful.
Sony shipped development systems to game developers built with 2nd
tapeout chips.  Not bad for the first commercial 128-bit microprocessor!

    - Howard Landman
      SiTera, Inc.                              Longmont, CO

         ----    ----    ----    ----    ----    ----   ----

From: [ To Have And Have Not ]

John,

I saw this posting on ESNUG about problems with Chrysalis.  Have you looked
at Verplex?  <http://www.verplex.com>  Using it, we've found it runs at
least 10x faster and handles greater than 300K gate comparisons.  The keys
to getting equivalency checks working are:

   1. pre-mapping all gate-vs-RTL state points (flip-flops) before
      comparison
   2. matching RTL hierarchy with gate hierarchy
   3. correctly modelling your flip-flops (gate often sees them as two
      latches)
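
Point 1 usually comes down to name mapping: gate-level tools mangle RTL
register names, and telling the checker up front which state points
correspond saves it from guessing.  A minimal sketch, assuming a hypothetical
mangling convention ("state_reg[3]" becoming "state_reg_3_" with hierarchy
dots flattened to underscores -- your netlister's actual rules will differ):

```python
import re

def gate_name_for(rtl_flop: str) -> str:
    # Assumed rename rules: "ctrl.state_reg[3]" -> "ctrl_state_reg_3_"
    name = rtl_flop.replace(".", "_")
    return re.sub(r"\[(\d+)\]", r"_\1_", name)

rtl_flops = ["ctrl.state_reg[3]", "dpath.acc_reg[15]"]

# Pre-mapping table handed to the equivalence checker before comparison.
mapping = {f: gate_name_for(f) for f in rtl_flops}
```

With every flip-flop pre-matched, the checker only has to compare the
combinational cones between known state points.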

We've eliminated gate-simulation with Verplex.  Good luck with your million
gate designs and please keep me anonymous.

    - [ To Have And Have Not ]

         ----    ----    ----    ----    ----    ----   ----

From: Mike Sullivan <mikes@dal.asp.ti.com>

Hi John,

I would have to disagree with Don's comments that EC is not worth the
effort.  Granted, EC vendors today are much more mature than they were just
a few years ago.  And there is still a way to go to make all designers
happy with using EC, especially in handling some types of RTL.  But let me
illustrate what another user is currently getting out of EC vendors.
 
On the most recent project I was involved in, cycle based RTL regression
took 2 weeks on an LSF queue of 100 shared CPUs.  On a single CPU, that
would be 80 weeks of regression time; event-based regression would multiply
those numbers by roughly 5X.  Using EC tools, we have greatly reduced the
need for such full, long regressions.
In a significantly shorter time frame, we have found numerous design
differences.  And more importantly, the reasons why were quickly located
using EC debug capabilities.  The differences found were the result of both
designer and tool hiccups.
 
Yes, today's vendors (and that's plural) are limited in their ability to
handle some types of RTL comparisons where the RTL is highly abstract or
too complex.  And they admit that.  We did not have our full RTL environment
under EC because of these limitations.  But we did successfully apply it
where we could.  For those areas where any type of RTL EC comparisons can
be applied, the gains fully outweigh any setup effort involved.  Doesn't
one also put effort into setting up for hierarchical synthesis?
 
Outside of RTL, the two real strengths of current EC vendors are in the
areas of custom/library development and Gate2Gate.  To solve the need for
speed, designs may have the requirement for custom circuit libraries.  One
needs to ensure these custom circuits match up with the simulation views.
For Gate2Gate, Don is correct in saying it could be useful for technology
migrations.  But in practice, Gate2Gate EC solves more than just that:
with a lot of the actual gate optimization work now being done by the
placement and routing tools themselves, Gate2Gate EC becomes very important.
In our experience with both custom/library and Gate2Gate EC, similar gains
were achieved as with our RTL EC successes.  Complete, fast, and static
results.
 
Not getting into a debate on which EC vendor is best, we use the two main
players, Chrysalis (Avant!) and Formality.  They both have their pros and
cons, where in our environment, each is used in the area it has strengths
in.  This may be compared to how multiple simulators could be used on a
design project to get the particular benefits of each.  Example: cycle
based simulator for RTL simulations and event based simulator for any
required Gate simulations.
 
Overall, our mentality is that designer and computing resources should be
spent getting the RTL correct to begin with, not on any derivation of it.
Dynamic tools may still be required for things such as verification of 
ATPG, multiple clock domains, and reset logic.  But where possible, I'd 
much prefer to stay in the static world.  ;)
 
    - Mike Sullivan
      Texas Instruments                             Dallas, TX


( ESNUG 331 Item 7 ) --------------------------------------------- [10/7/99]

Subject: ( ESNUG 330 #8 )  Run Cadence Pearl In Tcl Mode To Use DC Input

> We are now in the process of developing a layout flow that is using Timing
> Driven Q-place placement software from Cadence.  Being timing-driven, the
> software reads constraint files produced by, you guessed it -- Synopsys.
> The tool that reads and converts the constraint file is: Pearl (Cadence
> static timing analysis tool).  We are receiving Synopsys constraint files
> from various Synopsys versions.  Almost each time we use Pearl there are
> statements that are not recognised by the tool, and causes it to produce
> erroneous output. ...
>
>     - Eran Rotem
>       Chip Express (Israel) Ltd.                   Haifa, Israel
		

From: Mohammad Ali Mughal <mughal@postal.sps.mot.com>

John,

For Synopsys constraints to get translated into GCF by Cadence's static
timing tool, Pearl, it must be started with -tcl mode.

Start pearl as "pearl -tcl" and it will behave much better.

    - Mohammad Ali Mughal
      Motorola


( ESNUG 331 Item 8 ) --------------------------------------------- [10/7/99]

From: [ Kenny from South Park ]
Subject: Help!  I'm Being Sucked Into The DC Billion Switches Black Hole!

Dear John,

Recently, I've been having fun with Synopsys switches.  Somewhere between
the "compile_new_boolean_structure", and "hdl_use_cin" I've come up with at
least 14 known switches that modify the synthesis of your design in a major
way.  That means that there are at least 2**14 possible combinations of
those switches, plus the ones I haven't encountered yet, plus the ones they
don't tell you about.  From the synthesis errors I've been getting lately,
I wonder how many of these combinations have actually been tested by Synopsys.
My guess would be not many.
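
The combinatorics are easy to check.  A quick sketch (the switch names here
are just placeholders for the 14 real ones):

```python
from math import comb

# 14 boolean synthesis switches -> 2**14 distinct configurations.
switches = [f"switch_{i}" for i in range(14)]
configs = 2 ** len(switches)          # 16,384 combinations

# Even limiting users to at most 2 non-default switches still leaves
# 1 + C(14,1) + C(14,2) = 1 + 14 + 91 = 106 combinations to qualify.
limited = 1 + comb(14, 1) + comb(14, 2)
```

So even the "1 or 2 switches at most" policy proposed below leaves over a
hundred configurations someone would have to test.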

After fiddling about with the problem for awhile, I've decided that the
only way to deal with this situation is to limit the number of "non-default"
switches a user can set to 1, or 2 at most.  Some switches, while they
worked together just fine in the 97.08 version, no longer "play well
together" in the 99.05 versions.  A lot of this may have to do with patches.
After 3 patches to 98.08 and two so far to 99.05, I'm having difficulty
keeping up!  This limits the designer greatly, but without formal
verification what else can you do?  It seems that you can't trust Synopsys
synthesis any more.

    - [ Kenny from South Park ]


( ESNUG 331 Networking Section ) --------------------------------- [10/7/99]

Corvallis, Oregon -- Agilent / Hewlett-Packard seeks engineers w/ Synopsys,
Verilog, and ASIC design experience.   No headhunters.  "kris@cv.hp.com"


Copyright 1991-2024 John Cooley.  All Rights Reserved.

   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)