Solido's new brainiac, Jeff Dyck, calls out CDNS HSMC attack blog

( ESNUG 562 Item 5 ) -------------------------------------------- [09/08/16]

Subject: Solido's new brainiac, Jeff Dyck, calls out CDNS HSMC attack blog

... blah blah blah blah blah (Solido's) HSMC vs. our (Cadence Virtuoso) Scaled-Sigma Sampling (SSS) blah blah blah...

- TeamADE Cadence (Cadence.com 08/18/2016)


From: [ Jeff Dyck of Solido Design ]

Hi, John,

Recently the marketing folks at CDNS posted a thinly veiled attack blog
against Solido's High Sigma Monte Carlo (HSMC) tool.  In it, Cadence's
"TeamADE" makes 6 factually incorrect claims about our tool.

So, to set the record straight for the EDA user community, I want to correct
these CDNS fallacies in the DeepChip forum, aka EDA's "Switzerland".

        ----    ----    ----    ----    ----    ----    ----

Cadence Fallacy #1:

"The accuracy of the HSMC method strongly depends on the accuracy of the underlying response surface model."

The actual Solido fact:

HSMC delivers perfect Monte Carlo and SPICE accuracy using a partially accurate sample sorting model, rather than requiring a perfectly accurate response surface model.


Cadence's claim about our response surface models is wrong.  HSMC works by: 

    - Selecting a set of initial samples that make a strong basis for
      a high-sigma model and simulating those in SPICE
    - Building a sample sorting model based on the initial sampling data
    - Generating millions or billions of real Monte Carlo samples
    - Approximately sorting the Monte Carlo samples in output space
    - Simulating the high-sigma distribution's tail using the user's
      SPICE simulator

    A standard issue 6T SRAM bitcell; a classic high-sigma problem
    due to high replication

To measure cell current on an SRAM bit cell to 6-sigma, Solido HSMC would:

  1. Generate (but not simulate) 10 billion Monte Carlo samples
  2. Simulate in SPICE 1000 initial samples that are farthest from
     nominal in input space, as these create a good basis for a 
     high-sigma sorting model for a continuous output like cell current
  3. Create a high-sigma sorting model for sorting in cell current space
  4. Roughly sort every one of the 10 billion samples in cell current
     space, from worst-case (lowest) cell current to best case (highest)
     cell current
  5. SPICE the 250 worst case samples as predicted by the sorting model
  6. Rebuild the sorting model, incorporating the 250 worst-case samples
     to add more resolution in the tail
  7. Re-sort all 10 billion samples in cell current space w/ updated model
  8. SPICE the next 250 worst case samples predicted by the updated model
  9. Stop, as HSMC would have recovered the most extreme cell current
     samples (where all the 6-sigma action is) from the population of
     10 billion, with perfect Monte Carlo and SPICE accuracy.

This job would take 1500 simulations, about 2-3 minutes of algorithmic
runtime, and around 5 minutes of simulation time using 15 SPICE simulators
in parallel.

The end result would be that HSMC ran the worst-cases from a population of
10 billion Monte Carlo samples in SPICE, recovering the exact cell current
tail, with perfect accuracy -- getting the exact same result as if you were
to run all 10 billion Monte Carlo samples in SPICE -- and it does this in
only 1500 SPICE simulations instead of 10 billion SPICE simulations.
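
The sort-then-simulate loop described above can be illustrated with a toy
Python sketch.  Everything in it is an assumption made for illustration:
toy_spice() stands in for a real SPICE run, the sorting model is a plain
least-squares linear fit (the post does not say what model Solido uses),
and the population is scaled down to 20,000 samples over 3 variables with
200 initial samples and two batches of 50.

```python
import random

def toy_spice(x):
    """Hypothetical stand-in for one SPICE run; returns a 'cell current'."""
    s = 1.0 * x[0] - 0.5 * x[1] + 0.8 * x[2]
    return 10.0 + s + 0.02 * s ** 3   # mildly non-linear, monotonic in s

def fit_linear(xs, ys):
    """Least-squares fit y ~ b0 + b.x, solved with Gauss-Jordan elimination."""
    rows = [[1.0] + list(x) for x in xs]
    n = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(n):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][n] / a[i][i] for i in range(n)]

def predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def hsmc_sketch(population, n_init=200, batch=50, rounds=2):
    """Return {sample index: simulated value} for every sample 'SPICEd'."""
    # Initial samples: the ones farthest from nominal in input space.
    far_first = sorted(range(len(population)),
                       key=lambda i: -sum(v * v for v in population[i]))
    simulated = {i: toy_spice(population[i]) for i in far_first[:n_init]}
    for _ in range(rounds):
        idx = list(simulated)
        beta = fit_linear([population[i] for i in idx],
                          [simulated[i] for i in idx])
        # Sort the whole population by predicted output, worst (lowest) first.
        order = sorted(range(len(population)),
                       key=lambda i: predict(beta, population[i]))
        # Simulate the next batch of predicted-worst samples not yet run.
        for i in [j for j in order if j not in simulated][:batch]:
            simulated[i] = toy_spice(population[i])
    return simulated

if __name__ == "__main__":
    rng = random.Random(0)
    pop = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20000)]
    sims = hsmc_sketch(pop)
    print(len(sims), "simulations; worst value found:", min(sims.values()))
```

Note that the model is only used to decide which samples to simulate next;
every reported value comes from toy_spice() itself, which is the point the
post makes about sorting models versus response surface models.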

Cadence is wrong.  HSMC does not predict output values using a model.

At no point does Solido HSMC generate a response surface model responsible
for predicting output values.  That is too hard of a problem, and as Cadence
points out, it would not work very well.  HSMC does not do this.  Cadence is
wrong to suggest that HSMC must solve this much harder problem in order to
be accurate.

Sorting models are much easier to make than response surface models that
predict precise output values.  A sorting model just needs to figure out
the order of samples in output space.

And HSMC does not even need a perfect sorting model to get a perfect Monte
Carlo and SPICE accurate high-sigma result:

    - Sample order can be off by 50's or even 100's of positions
    - HSMC simply powers through the sorting error
    - This recovers all of the samples in the high-sigma tail
    - The result is perfect Monte Carlo and SPICE accuracy

    HSMC's sorting model can be off by 50's to 100's of positions
    and still recover the entire high-sigma tail

A poor initial sorting model can be detected & corrected by HSMC at runtime.
Because HSMC simulates according to its predicted sample order in SPICE,
it can see when it is off, and if necessary add more samples to the model
in the high-sigma tail, then use those samples to correct the accuracy of
the sorting model.  This adds resilience.  HSMC knows when it is right and
knows when it may not be -- and then fixes itself accordingly.

        ----    ----    ----    ----    ----    ----    ----

HSMC Verifiability

During its runs, HSMC reveals how its sorting model compares with its SPICE-
verified results, and shows this result clearly to the designer.  It shows
its accuracy using a plot with predicted sample order on the x-axis vs. the
SPICE simulation value for each sample on the y-axis.

Since HSMC is simulating from the worst-case values inward, a generally
monotonic (either increasing or decreasing) curve shows that it is working
correctly.  In the case that the spec is a minimum value, like the SRAM bit
cell current example above, the plot should show a monotonically increasing
function.  In the case of a maximum spec, the plot should show a
monotonically decreasing function.
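
One way to put a number on "generally monotonic" is a rank correlation
between predicted order and simulated value.  The sketch below is an
assumption for illustration, not a description of how HSMC scores its own
plots: order_agreement() is a hypothetical helper that computes the
Spearman rank correlation, and it assumes no two simulated values are tied.

```python
def order_agreement(spice_values):
    """Spearman rank correlation between predicted order and SPICE value.

    spice_values[k] is the simulated result of the sample that the sorting
    model ranked k-th.  Assumes at least two values and no ties.
    """
    n = len(spice_values)
    by_value = sorted(range(n), key=lambda k: spice_values[k])
    rank = [0] * n
    for r, k in enumerate(by_value):
        rank[k] = r
    d2 = sum((k - rank[k]) ** 2 for k in range(n))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

print(order_agreement([1.0, 2.0, 3.0, 4.0]))   # perfectly increasing
print(order_agreement([1.0, 3.0, 2.0, 4.0]))   # increasing with sorting noise
```

For a minimum spec, a value near +1 corresponds to the monotonically
increasing plot; for a maximum spec, a value near -1 corresponds to the
monotonically decreasing one; a value near 0 corresponds to pure noise.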

    HSMC verification plots - perfect, typical, and broken

In the verification plots, you can see:

    - The theoretically perfect case on the left shows what would happen
      if samples were perfectly sorted - which never happens in practice,
      and is not required for HSMC to deliver perfect Monte Carlo and
      SPICE accuracy.

    - The plot in the middle shows a monotonically increasing plot with
      some sorting noise - which is typical, and produces a perfect Monte
      Carlo and SPICE accurate result.

    - The plot on the right shows what HSMC looks like if it's not working
      correctly and something is wrong.  The sorted order and the SPICE
      answer do not correlate at all, and we see noise, not a monotonically
      increasing plot.

    The HSMC verification plot for the earlier 6T SRAM bit cell current

Notice (with a small amount of noise) the monotonically increasing trend.
By the time HSMC has run 500 samples, it has recovered the worst case ~400
out of the population of 10 billion.  This gives perfect 6 sigma Monte Carlo
and SPICE accuracy in the worst case bit cell current tail.

Also, as I said earlier, HSMC automatically detects ordering problems and
reports them to the designer -- so it never reports a bad answer.

So, yes, John, despite what TeamADE Cadence says in their blog, Solido HSMC
delivers 100% Monte Carlo and SPICE accuracy.  Not 99%.  Not 99.999%.  100%
perfect match between what brute force SPICE reports and what Solido HSMC
reports.  Zero difference.

        ----    ----    ----    ----    ----    ----    ----

Cadence Fallacy #2:

"For HSMC, the number of samples to build response surface model and the cost of handling a large number of parameters will significantly increase with number of devices."

The actual Solido fact:

HSMC's SPICE simulation count does not depend on the number of parameters or the number of devices.


Solido HSMC can handle designs with over 100K process variables with just
a few 1,000 samples to build its model.  In fact, the HSMC algorithm
works identically for designs with 10 variables as it does for designs
with 100K variables. 

To describe how HSMC can do this, we can look at its two stages:

Stage 1 - Initial sampling: HSMC's initial samples are selected & simulated
in SPICE.  Each sample is assigned a random value drawn for each parameter,
such that we get equal coverage of each variable, independent of the number
of devices.

For example, if we have three CMOS devices, M1, M2, and M3, each of which
has a delta width (dw) parameter on its channel width, and we draw a
sample, what happens is:

    - A random value is drawn for M1_dw (e.g. -1.7)
    - A random value is drawn for M2_dw (e.g. 2.3)
    - A random value is drawn for M3_dw (e.g. -2.2)

All three random channel delta widths are combined into a single sample:

                [ M1_dw = -1.7, M2_dw = 2.3, M3_dw = -2.2 ]

That single sample, with a value for all 3 parameters, is simulated in one
single SPICE simulation.

As we add more devices (e.g. M4, M5, ..., M1000) -- or more input parameters
per device (e.g. dtox, dl) -- we just get more parameters per sample, which
are all simulated in a single SPICE simulation.  This is why the number of
samples or simulations does not depend on the device or parameter counts.
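
A minimal Python sketch of this sampling scheme, assuming every parameter
is an independent unit-normal draw (the real statistical models come from
the foundry and are not described here).  The helper names are
hypothetical.

```python
import random

def draw_sample(devices, params, rng):
    """One Monte Carlo sample: a random value for every device parameter."""
    return {f"{dev}_{p}": rng.gauss(0.0, 1.0)
            for dev in devices for p in params}

def draw_samples(n_samples, devices, params, seed=0):
    rng = random.Random(seed)
    return [draw_sample(devices, params, rng) for _ in range(n_samples)]

small = draw_samples(100, ["M1", "M2", "M3"], ["dw"])
large = draw_samples(100, [f"M{i}" for i in range(1, 1001)],
                     ["dw", "dtox", "dl"])

# Each sample is one SPICE run, so both cases need the same 100 simulations;
# only the number of values carried by each sample changes.
print(len(small), len(small[0]))   # 100 3
print(len(large), len(large[0]))   # 100 3000
```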

    Solido HSMC's initial sampling generates a random value for each
    input variable for every sample.  The SPICE simulation count stays
    constant with more dimensions.

Stage 2 - Sorted sampling: HSMC simulates ordered samples from the tail in.
For the same reasons as above, in this stage, the number of sorted samples
that needs to be simulated in SPICE does not change with number of devices.

This is why the SPICE simulation count does not depend on the number of
parameters or the number of devices.

Present day production foundry models can have 1 to 12 process variables
per device, depending on how the device is modeled.  Solido HSMC capacity
today is at least 100K process variables, giving our HSMC a capacity of at
least 8K-100K devices.  So, no John, Solido HSMC does not depend on the
number of devices or the number of parameters, despite what Cadence says.

        ----    ----    ----    ----    ----    ----    ----

Cadence Fallacy #3:

"For HSMC, it is extremely difficult to build a response surface model when an output is a very complicated nonlinear function subject to a large number of statistical variables."

The actual Solido fact:

HSMC is accurate on complicated non-linear circuits with 100K+ statistical variables.


As described in my earlier answer to Cadence Fallacy #1 above, HSMC gets
Monte Carlo and SPICE accuracy using a partially accurate sorting model
rather than requiring a perfectly accurate response surface model.  HSMC's
sorting model is designed to capture all kinds of things that are hard to
model that happen in real circuits, like non-linearities, multi-modal
distributions, binary outputs, and high-order interactions.

We have 100's of test cases:

  - A mix of customer "problem child" cases with behaviours that are
    hard to model that we've collected from 7 years of production work.
  - Home brewed R&D circuits we have built to be deliberately hard.
  - Dozens of mathematical functional problems that concisely capture
    things that are hard to model.

Overall, this makes an HSMC regression suite of 144 tests that we run every
45 minutes to find gotchas.  We have a bunch more tests that we run nightly,
and even more that we run with each new rev of Solido HSMC.

Error Detection is an HSMC Strength

Since HSMC can detect sorting model problems at runtime, our customers can
see when sorting problems occur as they occur.  This prevents chips from
taping out with the wrong answer -- plus it lets customers quickly report
any HSMC problems to us.  This lets us find HSMC's real limitations quickly,
and extend HSMC to remove them.

As a result, Solido HSMC can handle multi-modal and binary distributions.

    (On The Left) high-sigma multi-modal distributions will have millions
    or billions of samples centered around a main mode and a much smaller
    number of samples centered around a rarer failure mode.

    (On The Right) high-sigma binary distributions are similar in that
    there are millions or billions of "passes" and a rare set of
    "failures", but instead of having continuous functions, we
    have just two values -- a pass and fail -- e.g. whether a bit cell
    writes or not.

Solido HSMC can handle non-linear circuit functions, too.

    An example non-linear statistical mapping is where the
    input distribution (e.g. M7_dw) has a different shape from
    the output distribution (e.g. delay).

And Solido HSMC can handle non-additive interaction effects, too.

    For variation analysis, statistical interaction effects can happen
    when two or more input variables (e.g. M1_dtox, M2_dtox) have a
    non-additive effect on a device's output (e.g. delay).  In this pic
    above, both M1_dtox and M2_dtox have a simple, linear mapping to
    delay -- but when sampled together -- they interact to have a
    complex bimodal response.

After 7 years of this very quick customer/R&D feedback loop for HSMC, now
when our customers see sorting model issues, they are almost always due
to the noise in the SPICE simulation (e.g. an inaccurate bisection search
or measures being triggered on the next waveform) being greater than the
effects of process variation.  For example, if you do a temperature sweep
from -40 to 125, the SPICE output values are scattered all over the place,
and any errors encountered are not due to HSMC's approach.  (And BTW, HSMC
also helps designers to find these noise problems, too.)

So, yes John, Solido HSMC can handle complex, non-linear functions with lots
of parameters (100K+!), despite what Cadence marketing says.

        ----    ----    ----    ----    ----    ----    ----

Cadence Fallacy #4:

"Having a large number of specifications that need to be tested can overwhelm the HSMC method."

The actual Solido fact:

HSMC can handle circuits with 50 specs, 5 of which are targeted for high-sigma, in roughly 3.5K to 11K SPICE simulations.


That is, HSMC handles production designs with a high number of production
specs and within production SPICE simulation count budgets.

For example, a circuit with 50 specs, of which 5 specs are targeted for
high-sigma, runs in only 1,000 + 5*(500 to 2000) = 3,500 to 11,000 SPICE
simulations in HSMC.

Here's how HSMC scales with the number of specs:

    - HSMC's initial samples (e.g. 1000) are constant and do not go up
      with the number of specs.

    - HSMC's sorted sampling uses 500-2000 SPICE simulations-per-spec
      that is being targeted for high-sigma; this scales linearly.
      This also finds the high-sigma tail where all the action is.

    - No additional SPICE simulations are required for specs not
      being targeted for high-sigma.
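
The scaling rule in the bullets above reduces to one line of arithmetic.
This sketch only restates the numbers given in this post; the function
name is hypothetical.

```python
def hsmc_sim_budget(n_targeted_specs, per_spec_sims, n_initial=1000):
    """Initial samples are shared; sorted sampling is paid once per
    spec that is targeted for high-sigma.  Untargeted specs cost nothing."""
    return n_initial + n_targeted_specs * per_spec_sims

print(hsmc_sim_budget(5, 500))    # 3500
print(hsmc_sim_budget(5, 2000))   # 11000
print(hsmc_sim_budget(1, 500))    # 1500, as in the bit cell example
```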

This turns out to be a very efficient scaling scheme for production designs.
Roughly 90-95% of high-sigma jobs that we see for production circuits will
target a single spec.  We occasionally see a few jobs that target 2-5 specs.
We sometimes see more measurements, but it is very rare in production for
device designers to target more than 5 specs to high-sigma. 

We sometimes have customers who have 100's, or even 1000's of measurements
to be evaluated.  In these rare cases, the goal is still to target just one
(or a handful) of specs for high-sigma -- and report the other measurements
(e.g. device parametrics) with each high-sigma sample.  HSMC does this
without any additional SPICE simulations or runtime.

So, Cadence is wrong again.  Solido HSMC does not get overwhelmed by even
50 specs (of which 5 are targeted for high-sigma) in practice.

        ----    ----    ----    ----    ----    ----    ----

Cadence Fallacy #5:

"For HSMC the problem of handling huge amounts of data becomes very difficult when the yield is near 6 sigma."

The actual Solido fact:

HSMC handles up to 7-sigma.


Keep in mind that 7-sigma takes over two orders of magnitude more samples
than 6-sigma, and is more than memory designers need to verify even for
their largest memories.
For example, a 1 Gb SRAM memory only requires a 6.5-sigma bit cell to get
a 95% product yield.
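
The 6.5-sigma figure can be sanity-checked with a short calculation.  The
sketch assumes a one-sided Gaussian tail, independent bit cells, no
redundancy or repair, and 1 Gb = 2**30 cells; none of these assumptions
are stated in the post.

```python
import math

def tail_probability(sigma):
    """One-sided standard normal tail probability beyond `sigma`."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

def array_yield(sigma, n_cells):
    """Probability that none of n_cells independent cells fails."""
    return math.exp(n_cells * math.log1p(-tail_probability(sigma)))

for sigma in (6.0, 6.5):
    print(f"{sigma}-sigma bit cell: {array_yield(sigma, 2**30):.1%} yield")
```

Under those assumptions a 6.5-sigma cell gives a yield of about 96%, while
a 6-sigma cell gives only about 35%.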

In standard brute-force Monte Carlo analysis, the number of samples needed
scales exponentially with the sigma you wish to reach.  It is possible to
use brute-force Monte Carlo for 3-sigma, and maybe 3.5-sigma, but 4-sigma
and beyond requires 100's of thousands of samples to verify.

Getting high-sigma data on circuit designs with Monte Carlo accuracy creates
an exponential data explosion problem:

          Sigma                   Brute Force Monte Carlo Simulations
         4 sigma                               700,000 SPICE runs
         5 sigma                            50,000,000 SPICE runs
         6 sigma                        10,000,000,000 SPICE runs
         7 sigma                     6,000,000,000,000 SPICE runs
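
The growth in the table tracks the Gaussian tail probability.  As a rough
check, this sketch computes how many Monte Carlo samples you draw on
average per single sample beyond each sigma, assuming a one-sided standard
normal tail.

```python
import math

def tail_probability(sigma):
    """One-sided standard normal tail probability beyond `sigma`."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

def samples_per_failure(sigma):
    """Average number of Monte Carlo samples drawn per tail sample seen."""
    return 1.0 / tail_probability(sigma)

for sigma in (4, 5, 6, 7):
    print(f"{sigma} sigma: about {samples_per_failure(sigma):.2e} "
          "samples per tail sample")
```

This gives roughly 3e4, 3.5e6, 1e9, and 8e11 samples.  The table's counts
are about 8 to 22 times larger, which would be consistent with wanting
several tail samples rather than one, although the post does not say how
its figures were derived.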

Our HSMC verifies 5 to 7 sigma designs in minutes to hours, with exactly the
same accuracy as brute force Monte Carlo and SPICE.  HSMC's runtime depends
mostly on parallelized SPICE simulation runtime, not algorithmic runtime,
and typically completes 6-sigma jobs with just a few thousand SPICE simulations.

For example, a 1 hour 6-sigma HSMC job would typically run in 55 minutes
wall clock simulation time, and only 5 minutes HSMC algorithmic runtime.

Typical total wall clock runtimes for Solido HSMC to get 6-sigma are:

    - Simple cells, like memory bit cells and logic cells: under 10 minutes
    - Cells with 100's of devices, like sense amps or multi-bit flops: under 1 hour
    - Bigger designs, like memory slices and small macros: in 2-8 hours

To reiterate, HSMC's wall clock runtime is dominated by the SPICE simulation
times and the number of CPUs used to run the simulations in parallel -- not
by HSMC's significantly lesser algorithmic overhead.
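
A back-of-envelope model of that runtime split, as a sketch.  It assumes
every SPICE run takes the same time, runs are dispatched in full waves
across the available simulators, and the algorithmic time simply adds on
top.  The 3 seconds per simulation is not a quoted figure; it is what the
bit cell example in Fallacy #1 implies (1500 simulations on 15 simulators
in about 5 minutes).

```python
import math

def wall_clock_minutes(n_sims, sec_per_sim, n_parallel, algo_minutes=0.0):
    """Simulation waves across n_parallel simulators, plus algorithm time."""
    waves = math.ceil(n_sims / n_parallel)
    return waves * sec_per_sim / 60.0 + algo_minutes

print(wall_clock_minutes(1500, 3.0, 15))        # 5.0 (simulation only)
print(wall_clock_minutes(1500, 3.0, 15, 2.5))   # 7.5 (with algorithm time)
```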

    Some recent Solido HSMC user data

So, yet again, John, Cadence's claim that HSMC has a hard time with 6-sigma
is not true at all.  We handle up to 7-sigma in just minutes to hours.

        ----    ----    ----    ----    ----    ----    ----

Cadence Fallacy #6:

"For example, for a billion Monte Carlo samples with 1,000 variables, and each variable requires 8 bytes to store as a double value, the total amount of data is 8 Terabytes... "

The actual Solido fact:

For a trillion Monte Carlo samples with 100,000 variables, HSMC's total memory and storage use is less than 8 Gigabytes.


Cadence is off by at least 1000X in their claim.

HSMC software runs using under 8 Gigabytes of RAM.  Here's how it works:

    - HSMC generates what it needs on the fly and only keeps what it is
      working on in machine memory.

    - HSMC does not store the entire device sampling space as Cadence
      claims; that would be highly inefficient.

    - HSMC cleans up your machine disk as it goes, so its disk storage
      is only whatever is being used for the current SPICE simulations
      in progress at that moment.
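
The general idea of generating on the fly instead of storing can be
sketched in Python.  This is an illustration under assumptions, not
Solido's implementation: sample() regenerates any sample from its index
via a per-sample seed, score() is a hypothetical sorting model, and only
the k current worst candidates are held in memory at once.

```python
import heapq
import random

# Storing everything, as in the quoted Cadence example:
naive_bytes = 10**9 * 1000 * 8   # samples * variables * bytes per double

def sample(index, n_vars, seed=0):
    """Regenerate sample `index` on demand instead of storing it."""
    rng = random.Random(seed * 2**32 + index)
    return [rng.gauss(0.0, 1.0) for _ in range(n_vars)]

def score(x):
    """Hypothetical sorting model; lower means closer to the failing tail."""
    return sum(x)

def worst_k(n_samples, n_vars, k):
    """Indices of the k lowest-scoring samples, streaming over the rest."""
    scored = ((score(sample(i, n_vars)), i) for i in range(n_samples))
    return [i for _, i in heapq.nsmallest(k, scored)]

print(naive_bytes)               # 8000000000000, i.e. 8 Terabytes
print(worst_k(2000, 5, 10))
```

Because any sample can be rebuilt from its index, only the indices of the
samples chosen for SPICE need to be kept.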

This is how Solido HSMC can solve cases with 1 trillion samples and 100k
variables using only about 8GB of RAM. 

This is 1000X less storage than what TeamADE Cadence claims, even for 100X
more variables and 1000X more samples.

        ----    ----    ----    ----    ----    ----    ----
        ----    ----    ----    ----    ----    ----    ----
        ----    ----    ----    ----    ----    ----    ----

WHAT CUSTOM IC DESIGNERS WANT

In the 11 years Solido has been in business, with customers at 40 different
companies, doing high-sigma variation analysis on over 1,000 production
devices, we have found that circuit designers want:

  1. SPICE Simulator Compatibility

     In our experience, the top high-sigma analysis tool users in the
     world are memory and standard cell designers.  These designers
     use a mix of SPICE simulators primarily from Synopsys (HSPICE,
     FineSim, CustomSim), but also from Cadence (Spectre, XPS), and
     Mentor (BDA AFS, AFS Mega), and Keysight (GoldenGate) -- as well
     as internal SPICE tools.  (See ESNUG 561 #5.)

     We assume the Cadence Virtuoso ADE Scaled-Sigma Sampling (SSS)
     works with their own Spectre & XPS -- but since it's new SW -- we
     are not sure if SSS can support other competing SPICE simulators
     like BDA AFS, AFS Mega, GoldenGate, HSPICE, FineSim, or CustomSim.

     I'm happy to report that in our 11 years of business, Solido has
     always been SPICE agnostic.  We support all of these commercial
     simulators -- plus a good number of the internal SPICE simulators
     our customers have developed in house.

  2. CLI Script Batch Mode Support

     We've found that most memory and standard cell designers depend
     heavily on CLI script-driven flows.  They like to write complex
     command line scripts that run and/or post-process big batches of
     high-sigma analyses using csh, bash, Python, Perl, and C++.

     Cadence SSS appears to be part of ADE and we are not sure how
     (or if) it works in an external CLI batch mode.

     For years, Solido HSMC has been optimized to work with these
     external csh/bash/Python/Perl/C++ command line flows.

  3. High-Sigma Analysis That's Fast

     The bulk of engineers we speak with want their production high-
     sigma jobs to finish in minutes to hours.  Overnight runtimes
     are only tolerated for big jobs (e.g. small memory macros or
     memory slices).

     We are not sure how fast Cadence SSS solves the high-sigma
     analysis problem on simple, or typical, or large circuit designs
     with varying levels of complexity.

     We do know that Solido HSMC's runtimes are fast.  It runs a
     default of 4K SPICE simulations -- and it can solve simple stuff
     like bit cells in 1K SPICE simulations.  As I explained in Cadence
     Fallacy #5 above, for HSMC the wall clock runtime is mostly just
     the parallelized SPICE runtimes -- which is uber fast.

     For example, it can do a DRAM column to 6-sigma on 10 BDA AFS
     licenses in an hour.  Give it 100 BDA AFS licenses and it can do
     6-sigma on a sense amp in just 5 minutes.  Or 7-sigma on an
     SRAM bit cell in just 5 hours.

  4. High-Sigma Analysis That Scales

     High-sigma jobs must run on all kinds of circuits -- memory slices,
     complex standard cells, I/Os, and analog/RF designs.

     HSMC has handled a 50,000-device memory slice with 100K+ process
     variables, still running in hours.

     We are not sure how well Cadence SSS scales in practice to handle
     large memories, I/Os, standard cells, or analog/RF blocks.

  5. High-Sigma Analysis That's Accurate

     Accuracy rules all.  There's simply too much at stake both in
     engineering man-hours and device fabrication yields to get a
     wrong sigma analysis answer.  It can easily cost your company
     millions to 10's of millions of $$$ if your sigma analysis
     is wrong.

     Accuracy also means handling all kinds of hard stuff that happens in
     real production cases, like non-linearities, multi-modalities, high-
     order interactions, large-scale additive effects, and binary outputs.

     As I outlined in Cadence Fallacy #1 above, Solido HSMC is 100%
     accurate compared to brute-force Monte Carlo and SPICE.  Not 99%.
     Not 99.999%.  100% perfect match between what brute-force SPICE
     reports and what Solido HSMC reports.  Zero difference.

     We know this because over the past 7 years of doing detailed new
     evals for 40 different customers over a boatload of new process
     nodes -- our HSMC analysis comes out right, again and again.

     In comparison, we are not sure where Cadence SSS is accurate
     or where it is not -- nor how many unknown accuracy challenges
     remain to be found for SSS.

  6. High-Sigma Analysis That's Verifiable

     And finally, in my answer to Cadence Fallacy #1 above, I detailed
     how -- through its initial sorting model -- Solido HSMC knows
     during runtime when it's right, and knows when it may be off -- and
     it then fixes itself accordingly.

     Our HSMC also keeps the user informed of any such issues (and any
     corrections it did) during runtime.  No surprises.

     We are not sure how (or if) the Cadence SSS software proves that
     it gets the right answer, or how (or if) it stops the designer from
     making mistakes when it is wrong -- nor how Cadence R&D learns about
     stuff that SSS can't handle and corrects it.

        ----    ----    ----    ----    ----    ----    ----

A NEW SOLIDO TOOL FOR MEMORY DESIGNERS

Since we're discussing Solido tools, I want to announce a new tool we have
for memory designers: Solido Hierarchical Monte Carlo (HMC).  You feed HMC
a memory slice (or memory critical path) and HMC uses it to statistically
reconstruct the full on-chip memory.  HMC's results are statistically
equivalent to what you would get if you were able to run brute-force Monte
Carlo on the entire on-chip memory (not just a single macro).  Of course,
it would be impossible to simulate all on-chip memory due to scale, but HMC
can produce a statistically equivalent result in just a few 1000 SPICE
simulations on the slice or critical path.

We have not seen anyone else who can do this.

        ----    ----    ----    ----    ----    ----    ----

Thank you for letting me write this, John.  I hope it corrects the record
for the user community on how Solido High-Sigma Monte Carlo actually works.

I also hope Cadence will issue an official correction to their false claims
and that they stop spreading misinformation about HSMC.

    - Jeff Dyck
      Solido Design                              Saskatoon, Canada

        ----    ----    ----    ----    ----    ----    ----

Jeff Dyck was on the original Solido team in 2006, and became one of the two "fathers" of HSMC in 2008. Jeff knows an awful lot about SPICE, transistors, custom design, and on-chip variation.

Related Articles:

    Mentor BDA AnalogFastSpice and Solido were #4 tools at DAC'15
    Solido brainiac Trent on user comments, and MunEDA WCD questions
    Solido HSMC and Synopsys HSPICE to design a CMOS memory sense amp
    NVidia on Solido Variation Designer with SNPS HSPICE for 6-sigma

        ----    ----    ----    ----    ----    ----    ----


"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley. All Rights Reserved.

