( SNUG 01 Item 27 ) -------------------------------------------- [ 3/28/01 ]
Subject: Libraries, SPICE, PathMill, Synopsys "Power Compiler", EPIC
THE SPECTRE OF LAYOFFS: One of the odd things I noticed between this year's
SNUG stats and last year's was a quiet interest in Power Compiler. Last
year 34 people attended the Power Compiler tutorial; this year 68 people
were in that class. This could be a statistical fluke or it could be that
power issues are now beginning to bother designers. I expected the Power
Compiler numbers to stay flat or go down because of all the recently
announced layoffs in the wireless markets. (Wireless guys are the ones most
paranoid about every damn picoWatt.) But then I remembered the Intel paper
yarping about power issues...
"I attended an Intel presentation on Power Reduction In Datapath
Designs. This paper reiterated the advantages of Power Compiler
RTL clock gating and pointed out that Module Compiler doesn't have
this functionality. Intel worked with Synopsys to define clock
gating requirements for MC, and these features are now implemented
in MC 2000.11. Using test cases from their 3D graphics engine,
they were able to show a power savings of 35-57%, and an area savings
of 5-10%. These numbers are in agreement with SNUG presentations
and tutorials I've seen in the past about Power Compiler.
Anyone competing in an arena where power is important should take a
look at Power Compiler. I have not had the opportunity to use it yet,
but I've been watching it closely for some time now. Since area
reduction is my personal crusade, the area savings from RTL clock
gating have grabbed my attention. Basically, a bank of clock enabled
registers, or perhaps registers with MUXes in front of them depending
on the library, is replaced by a bank of smaller standard registers,
and a single clock gating cell.
By gating the clocks to enabled registers, power is saved by only
clocking the register when new data will be captured. If these
registers are capturing the result of some arithmetic functions, power
is still consumed by the data running through this logic. This
problem is addressed by Power Compiler with operand isolation. Think
of this as "data gating", where the inputs to some arithmetic function
are blocked until the capture registers are ready to capture. Power
Compiler also performs gate level optimization to improve power by
assigning high toggle rate nodes to lower capacitance pins of cells.
Toggle rates are determined with a SAIF input file from simulation.
To complement Power Compiler, a new analysis tool called Prime Power
is now available. To use an olde high school college SAT analogy,
"Prime Power is to PowerMill" as "PrimeTime is to PathMill." It's
a full chip gate level power analysis tool.
- Bob Wiegand of NxtWave Communications
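A back-of-the-envelope sketch of the clock gating argument above, using the textbook switching power relation P = activity * C * Vdd^2 * f. Every number here (pin capacitance, supply voltage, clock rate, bank size, enable duty cycle) is a made-up illustration, not a figure from the Intel paper or from Power Compiler, and the cost of the gating cell itself is ignored.

```python
def dynamic_power(activity, cap_farads, vdd_volts, freq_hz):
    """Textbook switching power: P = activity * C * Vdd^2 * f."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz

# Hypothetical numbers, chosen only for illustration.
CLK_PIN_CAP = 2e-15   # farads of clock-pin load per register (assumed)
VDD = 1.8             # volts (assumed)
FREQ = 200e6          # Hz (assumed)
N_REGS = 32           # registers in the enabled bank (assumed)
ENABLE_DUTY = 0.25    # fraction of cycles on which the bank loads new data

def bank_clock_power(gated):
    # Ungated: every register's clock pin toggles on every cycle.
    # Gated: the clock pins toggle only on cycles where enable is high.
    activity = ENABLE_DUTY if gated else 1.0
    return N_REGS * dynamic_power(activity, CLK_PIN_CAP, VDD, FREQ)

ungated = bank_clock_power(gated=False)
gated = bank_clock_power(gated=True)
savings = 1.0 - gated / ungated

print(f"ungated: {ungated * 1e6:.2f} uW, gated: {gated * 1e6:.2f} uW, "
      f"clock-pin savings: {savings:.0%}")
```

In this simple model the clock-pin savings track the enable duty cycle directly. It says nothing about the datapath logic feeding the registers, which is the part operand isolation goes after.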
"Session 1 -- Library Methodologies
----------------------------------
This was a series of three papers on Synopsys library development
issues. The second one was given by an HP engineer in their PA-RISC
group. They do 0.13u SOI, greater than 1 GHz processor
design with custom libraries. The speaker went through their library
characterization process using PathMill. They only use synthesis
on "less than 50%" of their design. The methodology described was
one which enabled their library to be characterized two orders of
magnitude faster than by using SPICE, and the accuracy was within
+/- 4% of SPICE. Also, since PathMill is their signoff timing
analysis tool, the methodology provides an inherently closed loop.
The methodology indicated to me that synthesis has been in use
at HP for PA-RISC design for some time; this paper described a
process which solved their biggest problem with it, which was
responding to device model changes from their fab in a timely fashion."
- Rich Conlin of Paradigm Works
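A minimal sketch of the kind of QA pass the "+/- 4% of SPICE" claim implies: compare each entry of a fast-characterized delay table against a SPICE reference table and flag anything outside tolerance. The function names, the table layout, and the delay values are all hypothetical; the HP flow itself was not described at this level of detail.

```python
def percent_error(fast, reference):
    """Signed error of the fast result relative to the reference, in percent."""
    return 100.0 * (fast - reference) / reference

def qa_check(fast_table, spice_table, tol_pct=4.0):
    """Return (row, col, err_pct) for every table entry outside tolerance."""
    failures = []
    for i, (fast_row, spice_row) in enumerate(zip(fast_table, spice_table)):
        for j, (f, s) in enumerate(zip(fast_row, spice_row)):
            err = percent_error(f, s)
            if abs(err) > tol_pct:
                failures.append((i, j, err))
    return failures

# Hypothetical cell delays in ns, indexed by [input slew][output load].
spice = [[0.100, 0.150],
         [0.200, 0.320]]
fast = [[0.102, 0.149],
        [0.196, 0.340]]

failures = qa_check(fast, spice)
for row, col, err in failures:
    print(f"entry [{row}][{col}] is off by {err:+.2f}%")
```

Only the last entry of the made-up table misses the 4% window; the other three are within it.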
"Session 4: libraries!
A guy from HP presented a method of using PathMill, a transistor
level timing analyzer, to characterize his synthesis libraries.
He covered selection of index spacing, and a QA process to compare
his results to SPICE. His results were within +/-4% of SPICE, with 2
orders of magnitude throughput improvement. (I think he said a 200x
speedup in the characterization process.)
Some other library ideas I got, not necessarily from this guy:
1. Have the first index point be 0. That way really small numbers
aren't extrapolated, they're interpolated. The guy found a
large percentage error when it wasn't.
2. Somehow constrain the tools to not use the cells past the last
index point, once again avoiding extrapolation. I think this
means a careful combination of index selection for library
characterization to define the max point, and then some kind of
constraint in Synopsys to limit those cells' usage to be within
the matrix.
3. Someone (I think Synopsys) said that for slew the 20%-80%
measurement points were better than the 10%-90% because it
was more linear; the extra 20% included too much of the
non-linear curve.
A Toshiba America guy presented using a low Vth (threshold voltage)
library as a way to make timing. The method was to compile with
normal library, fail timing, then incrementally compile pointing
to a low Vth library that includes power info with Power Compiler
to replace just the critical path cells with these fast low Vth.
His low Vth cells required an extra mask step so increased chip cost.
What do we do with our x2 cells to make them the same footprint
but higher power? It's not Vth is it?
I asked him about using the area number to differentiate between
the two like we do. I don't think he understood, and thought I
was stupid. "No, they are the same area; Yes I know, but I don't
want to buy Power Compiler; But that is the only difference, the
power..."
I skipped the presentation on "Scalable Polynomial Delay Model",
a new method for specifying libraries, other than our non-linear
delay model matrices. I knew I wouldn't understand any of it."
- Paul Gerlach of Tektronix
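The first two library ideas in the list above (start the index at 0, and keep cells from being used past the last index point) both come down to interpolating inside the table rather than extrapolating off its edge. Here is a rough one-dimensional sketch. The delay curve standing in for SPICE, the index points, and the helper names are all invented for illustration; real library tables are two-dimensional and the tools' own lookup rules may differ.

```python
import math

def true_delay(slew_ns):
    # Stand-in for a SPICE result: a made-up, gently saturating curve.
    return 0.05 + 0.20 * (1.0 - math.exp(-slew_ns / 0.5))

def in_table(index, x):
    """True when x lies within the characterized index range."""
    return index[0] <= x <= index[-1]

def lookup(index, values, x):
    """Piecewise-linear lookup; off-table points extend the end segment."""
    if x <= index[0]:
        k = 0
    elif x >= index[-1]:
        k = len(index) - 2
    else:
        k = max(i for i in range(len(index) - 1) if index[i] <= x)
    x0, x1 = index[k], index[k + 1]
    y0, y1 = values[k], values[k + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def pct_err(index, x):
    values = [true_delay(p) for p in index]
    ref = true_delay(x)
    return 100.0 * (lookup(index, values, x) - ref) / ref

index_no_zero = [0.1, 0.4, 1.6]         # first index point above 0
index_with_zero = [0.0, 0.1, 0.4, 1.6]  # first index point at 0
query = 0.02                            # a "really small number"

err_extrap = pct_err(index_no_zero, query)
err_interp = pct_err(index_with_zero, query)
print(f"in table without 0 point: {in_table(index_no_zero, query)}")
print(f"extrapolated error: {err_extrap:+.1f}%")
print(f"interpolated error: {err_interp:+.1f}%")
```

On this invented curve the lookup that has to extrapolate below the first index point lands noticeably further from the reference than the one that interpolates from a 0 point. A check like `in_table` is one hypothetical way to flag usage beyond the last index point as well.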
On the business side, it appears that last year's pain and grief of merging
the EPIC group with the PrimeTime group has subsided. Customers are now
interested in the new ideas that this funky hybrid PrimeTime/EPIC group is
generating (such as PrimeTime now having crosstalk analysis capabilities.)