( ESNUG 583 Item 3 ) ---------------------------------------------- [05/25/18]

Subject: Sam on Ausdia Timevision vs. Spyglass, Fishtail, Excellicon for SDC

THE GENUS-SPYGLASS GOTCHA

We are updating our 7nm Genus-Innovus flow to do Multi-Mode/Multi-Corner
(MMMC) timing constraints along with SOCV delay variation for improved
timing accuracy.  Genus Physical supports both, but we have run into some
trouble with constraint debug.
          
In our early design phase we see a lot of churn with RTL and SDC timing
constraints.  We are in a hurry to see the PPA results through physical
synthesis and expect that if there is a basic problem with the constraints
Genus would give a clear alert at elaboration so we know how to fix this.
Genus isn't cutting it right now.  We spent a lot of time trying to figure
out why there are big timing differences run-to-run only to find it's a
simple constraint error culprit.  Our workaround is we must push everything
(both RTL and SDC) through the Synopsys Spyglass linter every time before
going into Genus.  Having to go through any other tool is a pain.  CDNS R&D
needs to port the Tempus checks into Genus Physical.

    - from 7nm user swaps out DC-Graphical for Genus-RTL


From: [ Sam Appleton of Ausdia ]

Hi, John,

Synopsys Spyglass Constraints is older, very noisy technology that grew
out of Atrenta's market-leading RTL lint product.  A lot of SoC designers
reject it because of the noise, and in our competitive experience it also
has trouble loading very large blocks, with the associated runtime hit.
That's in addition to needing to generate SGDC from the SDC and hoping
the translation matches the original SDC.

Fishtail checks timing exceptions and clock waveforms, but their long
runtimes make a real "checking" loop with synthesis hard to justify.

Real Intent claims to have an SDC checking product, but we heard from some
of our customers that they had abandoned it.

Excellicon ConMan is a newer generation tool that seems to be more focused
on constraint generation.  Their ConCert tool seems to have check/lint
capabilities but competitive data is hard to come by.

If you're going to synthesize large RTL blocks in either Genus-RTL or DCG,
you need to do fast and deep checks on what you're feeding into synthesis.
Specifically, your "pre-synthesis checker" needs to have:

  1. super-fast runtimes.  Checking the inputs must take a LOT
     less time -- ideally, an order of magnitude less -- than the
     actual synthesis.  (Why take 6 hours to lint a block that
     takes 3 hours to synthesize?);
  2. direct reading of the same SDC that goes into the synthesis
     tool, with all debug/reporting referring back to that actual
     input;
  3. a rich set of checks that can be turned on & off as needed;
  4. waivable checks, with as little analysis noise as possible;
  5. direct correlation with the timing analysis of both your
     synthesis and PnR tools, so that anything reported matches
     timing analysis.

All my rivals fall down on one or more of these requirements.  But I'm very
happy to report that my Ausdia Timevision SdcCheck tool has been doing these
checks on RTL for many years now.

SdcCheck incorporates more than 200 checks and, as its name implies, does
*both* the linting of your SDC and the checking of its intent.  It supports
MMMC constraints, Verilog/SystemVerilog/VHDL, and IEEE P1735 encrypted RTL.
It also includes precise file/line backtracking so you can directly
pinpoint in your source RTL any SDC issues.

        ----    ----    ----    ----    ----    ----    ----

Rather than just make empty tool marketing claims, I'll share with you two
of the more interesting checks we've discovered with SdcCheck from doing MM
constraints with customers.

STACKED CLOCKS CHECK

This involves creating multiple generated clocks while declaring multi-mode
behavior with a twist:
   create_clock -name clk

   create_generated_clock -name cdiv2 -source clk -divide_by 1 \
                          -master clk [get_pins b1/Z]

   create_generated_clock -name cdiv10 -source b1/Z -divide_by 10 \
                          -master cdiv2 [get_pins b1/Z]

In this case, the engineer tried to reduce the number of clocks by declaring
"cdiv10" with respect to another clock "cdiv2" on the same point.  What's
interesting about this is that some tools will not report this as an error
nor as failed clocks -- BUT they will not trace the latency path from the
"clk" source through "cdiv2" and onto your "cdiv10" source -- making all
your setup & hold computations on clock "cdiv10" useless.
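
As a rough illustration of what a "stacked clocks" lint could look for, here is a minimal Python sketch.  It is not how SdcCheck works internally; the function names, the simplified regex parsing, and the rule itself (flag a generated clock whose master is another generated clock defined on the same pin) are assumptions made for this example only.

```python
import re

def parse_generated_clocks(sdc_text):
    """Map each generated clock name to its master clock and definition points."""
    # Fold backslash-continued lines so every command sits on one line.
    joined = re.sub(r"\\[ \t]*\n", " ", sdc_text)
    clocks = {}
    for line in joined.splitlines():
        line = line.strip()
        if not line.startswith("create_generated_clock"):
            continue
        name = re.search(r"-name\s+(\S+)", line)
        master = re.search(r"-master(?:_clock)?\s+(\S+)", line)
        # Assumption: the trailing [get_pins ...] / [get_ports ...] is the
        # point where the generated clock is defined.
        target = re.search(r"\[get_(?:pins|ports)\s+([^\]]+)\]\s*$", line)
        if name and target:
            clocks[name.group(1)] = {
                "master": master.group(1) if master else None,
                "points": set(target.group(1).split()),
            }
    return clocks

def find_stacked_clocks(sdc_text):
    """Return (clock, master, shared_points) for clocks stacked on their master."""
    clocks = parse_generated_clocks(sdc_text)
    stacked = []
    for name, info in clocks.items():
        master = clocks.get(info["master"])
        if master is None:
            continue  # master is a primary clock or is not declared here
        shared = info["points"] & master["points"]
        if shared:
            stacked.append((name, info["master"], sorted(shared)))
    return stacked

# The constraints from the example above, with Tcl line continuations.
EXAMPLE_SDC = r"""
create_clock -name clk
create_generated_clock -name cdiv2 -source clk -divide_by 1 \
                       -master clk [get_pins b1/Z]
create_generated_clock -name cdiv10 -source b1/Z -divide_by 10 \
                       -master cdiv2 [get_pins b1/Z]
"""
```

Calling find_stacked_clocks(EXAMPLE_SDC) flags "cdiv10" as stacked on "cdiv2" at pin b1/Z.  A real checker would need a proper Tcl/SDC parser and the elaborated netlist; this sketch only handles the literal command forms shown here.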

When looking at a basic report in a timing analyzer, everything will seem
normal at first:

   Node                         Increment     Path
   ----------------------------------------------------
   clock clk (rising edge)        0.00        0.00
   clock source latency           0.15        0.15
   ff1_reg/CP                     0.61        0.76
   ff2_reg/D                      0.18        0.94
   data arrival time                          0.94

   clock cdiv10 (rising edge)     5.00        5.00
   clock network delay            0.30        5.30
   ff2_reg/CP                                 5.30 
   setup required time           -0.15        5.15
   -----------------------------------------------------
   data required time                         5.15
   data arrival time                         -0.94
   -----------------------------------------------------
   slack (PASS)                               4.21

Both clocks appear to have some network delay, which is good.  If you looked
at hold time, things would seem a little worse, but fixable.

However, if we expand the report to show the full clock path, the issue will
be obvious:

   Node                         Increment     Path
   ----------------------------------------------------
   clock clk (rising edge)        0.00        0.00
   clock source latency           0.15        0.15
   b0/Z                           0.30        0.45
   b1/Z                           0.30        0.75
   ff1_reg/CP                     0.01        0.76
   ff2_reg/D                      0.18        0.94
   data arrival time                          0.94

   clock clk (rising edge)        0.00        0.00
   clock source latency           0.15        0.15
   b0/Z                           0.00        0.15 <-- latency is missing!
   b1/Z                           0.00        0.15 <-- latency is missing!
   clock cdiv2 (rising edge)      0.00        0.15
   clock cdiv10 (rising edge)     0.00        0.15
   b2/Z                           0.10        0.25
   b3/Z                           0.04        0.29
   ff2_reg/CP                     0.01        5.30 
   setup required time           -0.15        5.15
   -----------------------------------------------------
   data required time                         5.15
   data arrival time                         -0.94
   -----------------------------------------------------
   slack (PASS)                               4.21

One way to check for this is to do a full path timing report involving every
generated clock in the design, and make sure BY EYE that the numbers look
rational and non-zero.  This is a pretty laborious task and needs to be
repeated every time the timing constraints change.  Like panning for gold,
you'll only catch this error if you happen to see it while randomly checking
your STA reports with the full clock path shown.

The better way is to check the SDC constraints that cause these issues and
correct the problem at the source.
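
For anyone stuck with the report-scanning route in the meantime, the eyeball pass can at least be scripted.  The Python sketch below is a hypothetical helper, not a feature of any tool named here: it assumes a report laid out like the one above (node, increment, path columns) and simply lists pins whose increment is exactly zero.  A zero increment is not always an error (ideal clocks, for example), so treat the output as a list of places to look.

```python
import re

# A pin row looks like "b0/Z   0.00   0.15"; rows such as
# "clock cdiv2 (rising edge)" have no "/" in the first token and are skipped.
PIN_ROW = re.compile(r"^\s*(\S+/\S+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)")

def zero_increment_pins(report_text):
    """Return the pins in a full-clock-path report whose increment is 0.00."""
    flagged = []
    for line in report_text.splitlines():
        match = PIN_ROW.match(line)
        if match and float(match.group(2)) == 0.0:
            flagged.append(match.group(1))
    return flagged

# Capture-clock portion of the expanded report shown above.
CAPTURE_PATH = """
   clock clk (rising edge)        0.00        0.00
   clock source latency           0.15        0.15
   b0/Z                           0.00        0.15
   b1/Z                           0.00        0.15
   clock cdiv2 (rising edge)      0.00        0.15
   clock cdiv10 (rising edge)     0.00        0.15
   b2/Z                           0.10        0.25
   b3/Z                           0.04        0.29
   ff2_reg/CP                     0.01        5.30
"""
```

On this excerpt, zero_increment_pins(CAPTURE_PATH) returns the two buffers, b0/Z and b1/Z, where the latency went missing.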

Getting your set_clock_groups constraints right in MM is also tricky,
especially because the tendency is to declare a lot more clocks than in a
traditional single-mode analysis.

We've seen blocks with upwards of 200-300 clocks, and the number of
interactions to "get right" in set_clock_groups just explodes.
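
To put a number on "explodes": if each unordered pair of clocks counts as one interaction to get right (an assumption made here for illustration; the letter does not define the term), the count grows as n*(n-1)/2.  A quick Python check:

```python
from math import comb

def clock_pairs(num_clocks):
    """Number of unordered clock pairs among num_clocks clocks: n*(n-1)/2."""
    return comb(num_clocks, 2)

# Clock counts mentioned in this letter: 8, 200, 278 and 300.
PAIR_COUNTS = {n: clock_pairs(n) for n in (8, 200, 278, 300)}
```

That works out to 28 pairs for an 8-clock block, but 19,900 pairs at 200 clocks and 44,850 at 300.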

        ----    ----    ----    ----    ----    ----    ----

CLOCK CONFUSION CHECK

Here's an interesting case where we saved the customer a silicon spin with
this check.

(pic: SdcCheck flagging the incorrect clock group)

The customer had 100s of set_clock_groups commands but had no way to verify
them.  Timevision SdcCheck flagged an incorrect clock group (see pic above)
between clocks "clk" and "xmpulse4_clk".

The designer had lost track among a large number of clocks and had
forgotten that the master waveform of "xmpulse4_clk" is actually "clk", and
not "xclk" -- tripped up by his own naming convention.

These problems are exceedingly common in any large subsystem block or full
chip.  Our SdcCheck tool also flags the absence of correct set_clock_groups
as a problem -- allowing the designer to rectify the issue before seeing
the bad PPA result from RTL synthesis.
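
One way to picture this kind of check, as a sketch only: walk each clock back to its root master and flag any two clocks that sit in different groups yet share a root.  This Python example is not SdcCheck's algorithm.  It assumes the offending group was declared asynchronous (the letter does not say which group type was used), and the function names and data layout are hypothetical.

```python
def root_clock(clock, masters):
    """Follow master links until reaching a clock with no declared master."""
    seen = set()
    while clock in masters and clock not in seen:
        seen.add(clock)  # guards against an accidental master loop
        clock = masters[clock]
    return clock

def suspicious_async_pairs(masters, async_groups):
    """Return clock pairs placed in different groups that share a root clock."""
    flagged = []
    for i, group_a in enumerate(async_groups):
        for group_b in async_groups[i + 1:]:
            for a in group_a:
                for b in group_b:
                    if root_clock(a, masters) == root_clock(b, masters):
                        flagged.append((a, b))
    return flagged

# What the SDC actually says: xmpulse4_clk is generated from clk.
ACTUAL_MASTERS = {"xmpulse4_clk": "clk"}
# What the designer believed: xmpulse4_clk is generated from xclk.
BELIEVED_MASTERS = {"xmpulse4_clk": "xclk"}
GROUPS = [["clk"], ["xmpulse4_clk"]]
```

With ACTUAL_MASTERS the pair ("clk", "xmpulse4_clk") is flagged; with BELIEVED_MASTERS nothing is, which is why the mistake looked fine to the designer.  Two clocks sharing a root can still be legitimately exclusive, so a real tool would need more context than this.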

        ----    ----    ----    ----    ----    ----    ----

These were just two examples of real silicon issues that Timevision SdcCheck
caught.

It also checks for SDC legality and correctness (like unmatched wildcards),
as well as for designer intent matching the SDC -- it asks "is everything
aligned to the same timing analysis graph that the synthesis and STA tools
use internally?"  This approach cuts analysis noise significantly.

On runtime, Timevision SdcCheck is super fast.  Here are two examples:

                            Elaborated
                #Clocks      Instances    Runtime      Memory
                -------      ---------   ---------    --------
       BlockA       8         1.1 M        3.5 min      4.0 G
   MegaBlockB     278        31.6 M       56.3 min     42.1 G

At roughly 2 to 3 min per million elaborated instances, SdcCheck is
blazingly fast.  Ok, so elaborated instances don't quite line up with your
actual synthesized instance count.  It's a quick elaboration of your RTL to
aid in analysis -- and it does give you a good idea of block size.
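
The per-million figure can be rechecked from the table above with a few lines of Python.  The numbers are copied from the table; the variable and function names are just for this example.

```python
# (elaborated instances in millions, runtime in minutes), from the table above
BLOCKS = {
    "BlockA": (1.1, 3.5),
    "MegaBlockB": (31.6, 56.3),
}

def minutes_per_million(instances_m, runtime_min):
    """Runtime normalized to one million elaborated instances."""
    return runtime_min / instances_m

RATES = {name: round(minutes_per_million(*row), 1) for name, row in BLOCKS.items()}
```

This gives about 3.2 min per million instances for BlockA and about 1.8 for MegaBlockB, so the normalized runtime is lower on the larger block.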

        ----    ----    ----    ----    ----    ----    ----

We're hearing that 7nm closure is going to be a real bear, and [ Godzilla ]
is getting it right by doing MMMC constraints for closure during his Cadence
Genus RTL synthesis runs.  With the headaches of patterning and coloring at
7nm, even the traditional ECO flow is going to face large headwinds.

Getting his blocks as "buttoned up" as possible *before* any synthesis or
PnR timing ECOs is not just smart -- it's critical.

And I just wanted your readers to know they can use Timevision SdcCheck as
a less noisy & faster alternative to Spyglass Constraints if they want.  :)

    - Sam Appleton
      Ausdia                                     Sunnyvale, CA

        ----    ----    ----    ----    ----    ----    ----
   Sam Appleton founded Ausdia in 2007 after stints at Azul Systems,
   Reshape, and SGI, having graduated with a PhD from the Univ. of
   Adelaide, Australia.  In his downtime, Sam tries to re-learn rock
   guitar and coach kids baseball.
        ----    ----    ----    ----    ----    ----   ----

Related Articles

    A user details 5 major differences between Fishtail and Excellicon
    Fishtail CEO balks at "messed up" Fishtail vs. Excellicon letter
