( ESNUG 494 Item 5 ) -------------------------------------------- [10/20/11]
From: [ The Green Lantern ]
Subject: Juniper Networks benchmarks SNPS CustomSim XA vs. Synopsys HSPICE
Hi John,
Please keep me anon.
I went to the Synopsys "SPICE Up Your Chip" AMS dinner event at DAC this
year, which I estimate about 500 people attended. Nikhil Jayakumar from
Juniper Networks gave a presentation on using Synopsys CustomSim in the
construction of hybrid tree-mesh clock distribution networks, and compared
its performance and accuracy with Synopsys HSPICE.
Clock Tree vs. Clock Grid
Nikhil indicated that Juniper uses target skew to decide whether to use a
clock tree or clock grid structure when designing chips. He said there are
two kinds of skew:
1. Structural (layout) skew, caused by capacitive load mismatch and
wire length mismatch. This can be handled by building balanced
clock trees (e.g., H-trees), which can get zero skew, but only in the
absence of PVT variations. It can also be measured using regular
Static Timing Analysis (STA) tools.
2. Dynamic skew due to PVT variations. There are dynamic clock
deskewing schemes which Intel uses in their clock networks.
Alternatively, there are static schemes such as adding cross-links
to clock mesh or hybrid tree-mesh structures. Static schemes
require SPICE-based analysis; PrimeTime techniques don't work.
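To make the term concrete, here is a rough sketch of my own (not from Nikhil's slides) of how skew is read off a set of simulated clock latencies: it is simply the spread between the earliest and latest arrival. The sink names, the middle value, and the function name are made up for illustration; the two end values are the HSPICE latency range reported later in this item.

```python
def clock_skew(latencies_ps):
    """Skew = latest clock arrival minus earliest clock arrival across all sinks."""
    values = list(latencies_ps.values())
    return max(values) - min(values)

# Hypothetical sink names. 1317.2 and 1346.9 are the HSPICE latency range
# from this report; 1330.0 is an invented intermediate point.
latencies = {"sink_a": 1317.2, "sink_b": 1330.0, "sink_c": 1346.9}

skew = clock_skew(latencies)
print(f"skew = {skew:.2f} ps")
```

With only the min and max known, this reproduces the 29.7 ps "skew of delay" figure in the HSPICE column.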
Juniper did a tape-out with a hybrid tree-mesh clock distribution network
that had a frequency of 700 MHz to 800 MHz. The design was implemented in
a TSMC 40 nm process.
They used a hybrid tree-mesh structure (the tree drives a mesh), where the
clock is delivered from the PLL to a vertical clock spine, which drives 6
horizontal clock ribs, which in turn drive a clock mesh. They added
cross-links to the
hybrid tree-mesh structure at regular intervals to reduce skew due to PVT
variation. They then carefully constructed a vertical clock spine and
horizontal clock ribs that were balanced, with low latency to reduce jitter.
Nikhil listed the following factors as needing to be managed:
- Optimal wire width.
- Spacing.
- Buffer drive strength.
- Wire length between buffers chosen to reduce jitter.
- Slew limitations.
- IR and EM factors for determining buffer drive strength and
tolerable slews.
- Routability and area constraints.
- Adding cross-links (shorting wires) in the tree to cancel out PVT
variation. (details below)
Cross-Links to Cancel PVT Variation Complicate Timing:
Adding cross-links at regular intervals in the clock tree to cancel out PVT
variation cannot be done randomly, because in some cases adding a cross-link
worsens jitter due to the added load on the buffer. So Juniper only adds a
cross-link if its skew reduction outweighs the jitter increase.
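Reading the rule as "keep a cross-link only when its skew benefit exceeds its jitter cost", the selection can be sketched as below. This is my own illustration, not Juniper's flow: the function name, link names, and picosecond values are all hypothetical, and in practice both numbers would come from SPICE runs with and without the candidate link.

```python
def keep_cross_link(skew_reduction_ps, jitter_increase_ps):
    """Keep a cross-link only if it removes more skew than it adds jitter."""
    return skew_reduction_ps > jitter_increase_ps

# (name, skew reduction in ps, jitter increase in ps) -- invented values.
candidates = [
    ("link_0", 4.0, 1.5),   # clear net win
    ("link_1", 0.8, 2.0),   # extra buffer load costs more than it saves
    ("link_2", 1.0, 1.0),   # break-even, so not worth the wiring
]

kept = [name for name, d_skew, d_jitter in candidates
        if keep_cross_link(d_skew, d_jitter)]
print(kept)
```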
Juniper needed SPICE simulation to estimate the delays, since PrimeTime
analysis could not handle reconvergence in non-linear circuits nor account
for the averaging effect of cross-links. So they used Synopsys CustomSim XA
based on its:
- SPICE accuracy. Relative accuracy is important since you are only
concerned with relative differences between various points in the
network.
- Speed. The clock grid is tuned multiple times during design, so
Juniper needed quick turnaround for ECOs.
Synopsys CustomSim XA vs. Synopsys HSPICE:
Nikhil indicated that CustomSim XA is easy to use - there was no learning
curve, because CustomSim used the same HSPICE netlist and measures.
He gave the following sample CustomSim XA command syntax:
xa -hspice <netlist.sp> -o <output_file> -c <command_script_file>
Where command_script_file is used to specify the level of accuracy:
set_sim_level <level from 3 to 7>
Nikhil then showed his test case:
    Element             # of Elements
    diodes                      2,275
    NMOS                      186,564
    PMOS                      186,564
    capacitors              7,335,052
    resistors               4,306,327
    voltage sources                 3
    ----------------    -------------
    Total                  12,016,785
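As a quick sanity check on the element counts (my own arithmetic, not part of the presentation), the tally below sums the rows and shows how much of the netlist is parasitic R and C. The dictionary keys are just labels I chose.

```python
# Element counts copied from the test case table in this report.
elements = {
    "diodes": 2_275,
    "NMOS": 186_564,
    "PMOS": 186_564,
    "capacitors": 7_335_052,
    "resistors": 4_306_327,
    "voltage sources": 3,
}

total = sum(elements.values())
rc_count = elements["capacitors"] + elements["resistors"]
rc_share_pct = 100.0 * rc_count / total

print(f"total elements: {total:,}")
print(f"parasitic R+C share: {rc_share_pct:.1f}%")
```

The sum matches the reported total, and nearly all of the netlist is extracted parasitics rather than devices.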
The test case was an extracted netlist of the clock tree and mesh. Juniper
used Synopsys Star-RC ver 2009.12 with "reduction" enabled for extraction.
The comparison used CustomSim XA (64-bit, Ver 2010.03) vs. HSPICE (64-bit,
Ver 2009.09):

                                           CustomSim XA      CustomSim XA
                          HSPICE           Level 6           Level 3
    Latency (ps)          1317.2-1346.9    1313.03-1342.76   1326.19-1353.48
    Skew of delay (ps)    29.7             29.73             29.27
    Slew (ps)             109.62-117.77    109.952-118.083   106.797-117.454
    Skew of slews (ps)    8.15             8.131             10.657
    Runtime (cpu_clock)   3.5 hrs          1.5 hrs           1 hr
    Memory                8 G              4 G (physical)+   4 G (physical)+
                                           9 G (virtual)     9 G (virtual)
CustomSim XA's:
- Latency at Level 6 accuracy was generally within 0.4% to 0.5% of
HSPICE. Its latency at Level 3 accuracy was generally within 0.5%
to 1.8%.
- Skew was quite accurate, even at the Level 3 setting, and its slew
was also fairly accurate.
- Simulation performance was a 2x to 4x improvement over HSPICE.
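The speed-up and skew claims can be cross-checked against the table with a few lines of arithmetic. This is my own sketch using only the skew-of-delay and runtime rows from the table; the dictionary layout and variable names are mine.

```python
# Skew of delay (ps) and runtime (hours) copied from the comparison table.
results = {
    "HSPICE":     {"skew_ps": 29.7,  "runtime_hrs": 3.5},
    "XA Level 6": {"skew_ps": 29.73, "runtime_hrs": 1.5},
    "XA Level 3": {"skew_ps": 29.27, "runtime_hrs": 1.0},
}

ref = results["HSPICE"]
summary = {}
for name, r in results.items():
    if name == "HSPICE":
        continue
    # Speed-up is the HSPICE runtime divided by the CustomSim XA runtime.
    speedup = ref["runtime_hrs"] / r["runtime_hrs"]
    # Skew deviation relative to the HSPICE skew, in percent.
    skew_err_pct = abs(r["skew_ps"] - ref["skew_ps"]) / ref["skew_ps"] * 100
    summary[name] = (speedup, skew_err_pct)
    print(f"{name}: {speedup:.2f}x faster, skew off by {skew_err_pct:.2f}%")
```

Both speed-ups land inside the quoted 2x to 4x band, and the skew stays within about 1.5% of HSPICE even at Level 3.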
HSPICE was *unable* to run a high-capacity test of 101,501,922 elements. In
contrast, the same high-capacity test ran successfully in CustomSim XA, as
it didn't have the memory limitations of HSPICE.
Nikhil's concerns and wishlist were:
1. Electrical current measurements were not accurate in CustomSim XA ver
2010.03. The problem was fixed in ver 2010.12, so accuracy is now within 3%.
2. Would like CustomSim XA to support Monte Carlo simulations. Juniper
uses cross-links to reduce PVT skew, but they need to do Monte Carlo
simulations on a large netlist fast to know for sure. Synopsys claims
Monte Carlo support will come in 2011.09.
3. Nikhil also said that a faster simulation time is always welcome.
I hope these reports help your readers, John.
- [ The Green Lantern ]