( ESNUG 420 Item 1 ) -------------------------------------------- [10/22/03]
From: Jatan Shah <jshah=user domain=broadcom spot calm>
Subject: User Benchmark Finds CeltIC Much More Accurate Than PrimeTime-SI
Hi John,
Here's the method we used for our evaluation where we compared CeltIC and
PrimeTime-SI against circuit simulations performed with Nassda's HSIM.
HSIM is our golden simulator. We used HSIM in a mode that was one notch
less accurate than the analog simulation mode: hsimspeed = 1, hsimcfm = 2.
- Our simulation was performed on a per net basis. We picked 44 nets from
the top-level of one of our chips. Top-level nets tend to be long and
strongly driven making them potential aggressors as well as victims.
- I did not use the SPICE decks that were generated by either CeltIC or
PrimeTime-SI for my sim because I wanted to verify that both tools were
indeed simulating the right network. Additionally, the SPICE files
generated from tools have a specific alignment of aggressors that the
tool itself uses to compute the delta delay. My goal was to verify
the correctness of this alignment.
- All simulations were done assuming infinite timing windows. That is,
the possibility of any particular aggressor not overlapping with the
victim window was set to 0 -- i.e. all aggressors were expected to
overlap with the victim window.
- For every victim net, four delta delay numbers were calculated:

      max_rise  (victim rising_edge,  aggressors falling_edge)
      max_fall  (victim falling_edge, aggressors rising_edge)
      min_rise  (victim rising_edge,  aggressors rising_edge)
      min_fall  (victim falling_edge, aggressors falling_edge)

- After I chose the victim nets, I chose a single victim driver/receiver
pair per net for the purposes of the simulation. The stage delay was
defined as the delay from the input of the victim driver to the input
of the victim receiver. The delta delay was added to this stage delay
for the max cases and subtracted from it for the min cases.
- All aggressors for the above nets were taken into consideration. That
is, no aggressors were filtered in CeltIC/PrimeTime-SI or in the SPICE
simulations. The only aggressors that were ultimately filtered were
the ones that produced absolutely no impact on the victim within the
accuracy of the Nassda HSIM simulator.
- Finding the worst-case delta delay for the above four cases was carried
out in three steps. The three steps are described below.
Step 1: Individual aggressor alignment. The victim is made static 1
or 0 depending on the simulation type. One simulation is carried
out per aggressor by causing a transition at the aggressor output and
determining the latency between the aggressor input switching and the
peak of the bump caused at the victim receiver. These latencies are
translated to offsets at which all the aggressor inputs are to be
stimulated so that there is close alignment of the peak of all the
bump voltages.
Step 2: After all the offsets are calculated, these offsets are used to
align the peaks of all the aggressors in the second stage, and a single
simulation is performed with all the aggressors switching (in the
appropriate direction depending on the simulation type) except for
those that have very very small bumps. (Please note that very very
small means that even HSIM in a high accuracy mode failed to record
any change at the victim receiver for that given aggressor.) The
victim is swept in relation to the aligned noise bump to get the worst
case delay. Depending on the simulation, either the maximum of the
delays (for max_rise & max_fall) or the minimum of the delays (for
min_rise & min_fall) is recorded. The results of this stage are the offset
value for the switching of the input to the victim driver that causes
the worst case delay. The corresponding delay value (either the max
or the min) is also captured.
Step 3: Further refinement of aggressor alignment is done. At this
point, the aggressor offsets calculated in stage 1 and the victim driver
offset from stage 2 are used as the starting point, and each aggressor
is swept in either direction from its starting offset to capture any
further increase in the max delay (or decrease in the min delay). The
aggressors are swept one at a time. After every sweep, the new offset
for the aggressor that was just swept, i.e. the one giving the new "max"
or "min" delay, is used while the remaining aggressors are swept. This
is a greedy algorithm, and the reasoning here is that with a good initial
point a greedy search for the worst case delay will lead to the global
minimum or maximum. In general I see about a 10% to 20% increase in the
computed worst case delta delay from stage 2 to stage 3, indicating that
the global maximum/minimum of the worst case delay is not very far from
the stage 2 result.
- The RC network simulated contained the coupled RC network of all the
victims and the aggressors. The coupling capacitors between the victim
and the aggressors and between any two aggressors were maintained intact.
The only coupling capacitors that were grounded were the aggressor
coupling caps that coupled not to the victim or to another aggressor
net, but to some other net in the design.
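
For readers who want to try the three-step alignment search described in
the list above, here is a minimal Python sketch of the max-delay case.
It is not the script used in this benchmark. The function
simulate_delta_delay() is a hypothetical stand-in for one HSIM run, built
from a toy Gaussian bump model, and every aggressor name, latency, bump
height, skew and constant below is made up for illustration. For the min
cases the same flow would minimize instead of maximize.

```python
import math

# Hypothetical aggressors: latency (ps) from input switching to bump peak,
# bump height (V), and a small skew (ps) that only shows up when all
# aggressors switch together. All values are made up for illustration.
AGGRESSORS = {
    "a0": {"latency": 120.0, "height": 0.10, "skew": 4.0},
    "a1": {"latency": 80.0, "height": 0.25, "skew": -6.0},
    "a2": {"latency": 150.0, "height": 0.05, "skew": 2.0},
}
VICTIM_DELAY = 200.0   # ps, victim driver input to receiver crossing (toy)
BUMP_WIDTH = 20.0      # ps, toy Gaussian bump width
PS_PER_VOLT = 300.0    # toy conversion from noise voltage to delta delay


def simulate_delta_delay(agg_offsets, victim_offset):
    """Stand-in for one coupled simulation: returns delta delay in ps."""
    t_cross = victim_offset + VICTIM_DELAY
    noise = 0.0
    for name, off in agg_offsets.items():
        a = AGGRESSORS[name]
        peak = off + a["latency"] + a["skew"]
        noise += a["height"] * math.exp(-(((t_cross - peak) / BUMP_WIDTH) ** 2))
    return PS_PER_VOLT * noise


def step1_offsets():
    """Step 1: turn per-aggressor latencies into offsets that align peaks."""
    latest = max(a["latency"] for a in AGGRESSORS.values())
    return {name: latest - a["latency"] for name, a in AGGRESSORS.items()}


def sweep(values, evaluate):
    """Return (best_value, best_delay) over the candidate offsets."""
    return max(((v, evaluate(v)) for v in values), key=lambda p: p[1])


def step2_victim_sweep(agg_offsets, span=100, step=1):
    """Step 2: sweep the victim input against the aligned bumps."""
    candidates = range(-span, span + 1, step)
    return sweep(candidates, lambda v: simulate_delta_delay(agg_offsets, v))


def step3_greedy(agg_offsets, victim_offset, span=10, step=1):
    """Step 3: sweep one aggressor at a time, carrying improvements forward."""
    offsets = dict(agg_offsets)
    best = simulate_delta_delay(offsets, victim_offset)
    for name in offsets:
        start = offsets[name]
        candidates = [start + d for d in range(-span, span + 1, step)]

        def evaluate(v, name=name):
            return simulate_delta_delay({**offsets, name: v}, victim_offset)

        value, delay = sweep(candidates, evaluate)
        if delay > best:
            offsets[name], best = value, delay
    return offsets, best


offsets1 = step1_offsets()
victim_offset, delay2 = step2_victim_sweep(offsets1)
offsets3, delay3 = step3_greedy(offsets1, victim_offset)
print(f"stage 2 delta delay: {delay2:.2f} ps, stage 3: {delay3:.2f} ps")
```

Because each stage 3 sweep includes the starting offset as a candidate,
the refined delay can never come out smaller than the stage 2 delay. In
a real flow each call to simulate_delta_delay() would be a full circuit
simulation, so the sweep span and step size set the total runtime.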
Setting up the benchmark for PrimeTime-SI was a little more involved. It
has the capability of iterating several times on the delta delay numbers.
The reason for iterating is that once the delta delay numbers are computed
the timing windows need to be recomputed and the aggressors are re-filtered.
This process is expected to converge within a few iterations. The first
iteration in PrimeTime-SI is always with infinite windows. Since I was
running my benchmark with infinite windows, the correct comparison point
would be to compare against PrimeTime-SI after one iteration. However, in
the first iteration, PrimeTime-SI uses a less accurate Elmore delay model
for delay calculation for timing windows as well as a less accurate method
to compute the delta delay. It was therefore necessary to run multiple
iterations. However, running multiple iterations would typically filter
out some aggressors based on timing windows, which I did not want.
I got some special Tcl code from the PrimeTime-SI development team to retain
all the aggressors even after the second iteration. The method worked and the
results got significantly better but the results were still not as good as
those of CeltIC. The ease of use of PrimeTime-SI with PrimeTime was higher
since everything was built-in.
My benchmark does not look at the accuracy of either of the tools with
timing windows. In general, I believed that if the accuracy was good with
infinite windows, there should be few fundamental reasons (except for the
way timing windows are applied) why the accuracy with timing windows
should be much worse.
PrimeTime-SI's runtime was a bit quicker, though runtime was not as big an
issue since this was only the top-level of a completely hierarchical chip
that had a few thousand nets. CeltIC's runtime was longer since it was
being used in the most accurate mode that it had. The setup provided by
Cadence was to match up my simulation setup, including parameters such as
how far the sweeps should be done to find worst-case alignment, etc. While
PrimeTime-SI had several switches to control the filtering of aggressors and
the number of iterations, there really were no switches to control sweeping
or aggressor alignment.
I didn't test PrimeTime-SI logical correlation of aggressors/victims.
Benchmark Results:

                             CeltIC     PrimeTime-SI
                            --------    ------------
    Max Rise Mean Error:      4.9025      -23.2759
    Max Rise STD Dev:        11.0508       25.3116
    Max Fall Mean Error:     11.3949       -0.1695
    Max Fall STD Dev:        10.6221       30.7975
    Min Rise Mean Error:     -6.4781        0.9596
    Min Rise STD Dev:         8.2935       15.5052
    Min Fall Mean Error:     -6.0534        2.0522
    Min Fall STD Dev:         9.7295       18.2835

Our results showed that CeltIC has better bounded accuracy than PrimeTime-SI
for crosstalk induced delta delay calculation.
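
As a rough illustration of how per-net results can be reduced to a mean
error and a standard deviation like those in the table, here is a small
Python sketch. The error definition (percent difference of the tool's
delta delay from the golden simulation) and the use of the sample
standard deviation are assumptions on my part, since the exact formula
is not spelled out above. The three data points are made up and are not
benchmark data.

```python
from statistics import mean, stdev


def error_stats(tool_delays, golden_delays):
    """Return (mean, sample std dev) of the percent error of tool vs. golden.

    Assumed definition: error = 100 * (tool - golden) / golden, per net.
    """
    errors = [100.0 * (t - g) / g for t, g in zip(tool_delays, golden_delays)]
    return mean(errors), stdev(errors)


# Made-up delta delays in ps for three nets; NOT data from the benchmark.
golden = [50.0, 80.0, 100.0]
tool = [55.0, 76.0, 110.0]

mu, sigma = error_stats(tool, golden)
print(f"mean error: {mu:.4f}  std dev: {sigma:.4f}")
```

With this sign convention a positive mean error means the tool reports
more delta delay than the golden simulation on average, and the standard
deviation shows how tightly bounded the error is from net to net.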
The PrimeTime-SI development team told me there are several differences in
the way their tool did alignment and the way I did alignment of aggressors;
but they didn't go into detail. In general our eval of PrimeTime-SI ended
with the conclusion that a little more work was required to be done first
to compare alignments and then to identify the differences. Some inherent
assumptions either on my part or on the part of PrimeTime-SI developers may
be preventing my results from matching up.
Even the CeltIC development team further refined several of their parameters
to achieve my results shown above.
After finding that CeltIC is more accurate than PrimeTime-SI, we will
continue to use CeltIC for all of our crosstalk, glitch and delta delay
analysis. We recently taped out a chip, where we used CeltIC in its highest
accuracy mode. CeltIC in this mode tended to be quite pessimistic -- often
leading to large violations that required designers' careful attention in
filtering errors. What this points out is that it is still up to the design
team to use judgment in applying the tool. (Using timing windows would
significantly reduce this pessimism, but with timing windows one has to take
care that all different timing modes are covered with multiple STA runs with
different case analysis settings.)
Overall, CeltIC has been very important in helping us identify SI issues,
but we must also enhance our methodology to use it most effectively.
- Jatan Shah
Broadcom Irvine, CA