( ESNUG 465 Item 10 ) ------------------------------------------- [06/28/07]
Subject: ( ESNUG 464 #3 ) How Tensilica uses Sequence's PowerTheater tool
> PowerTheater gave us what we needed to identify the architecture changes
> to reduce our power. It did this by making it easy to identify where the
> power was being consumed in the design at the RTL-level and where to make
> the necessary course corrections.
>
> Vega 2 met our goal of 2x processors for less power.
>
> - Jack Choquette
> Azul Systems Mt View, CA
From: Eliot Gerstner <gerstner=user domain=tensilica bot calm>
Hi, John,
We have been using Sequence's PowerTheater tool suite at Tensilica since
version 2000.4.1 (i.e., when it was still Sente's WattWatcher) and we
currently use the latest 2006.3 release. There was rapid development in
those earlier years, with new dot releases appearing every 6 weeks or so,
which usually required a re-write of our tool scripts. However, beginning
with the 2004 release series, the tool reached maturity and has since
followed a stable release schedule.
We do our design trade-offs on each module by running PowerTheater on a
fixed set of diagnostic tests *before* we even have a gate-level sim
functioning. It highlights which register banks are suited for
additional clock gating, as well as suggesting locations for further data
gating. By doing this, we can reach our power budget while we are still
fleshing out our TIE (Tensilica Instruction Extension) language, which we
then compile into synthesizable RTL. We have found that when we later do
gate-level power simulations, we do not have major surprises.
For our experiments, we were using a 5-stage processor with 1 KB of both
data and instruction caches, 2 KB of data RAM, and 16 KB of instruction
RAM. It contains a single load-store unit, full scan support, including
timers and interrupts, and a 128-bit processor interface. This processor
synthesizes to a high-speed gate count of 78,000 gates.
Our power numbers are for a TSMC 130 nm process at typical operating
conditions, and at a sim speed of 100 MHz. They include both dynamic and
static power, and are for the processor only, i.e., they exclude local
memories.
We ran 3 diagnostic tests to determine power consumption:
1) Back-to-back NOP instructions to represent a lightly loaded state,
2) Back-to-back load and store instructions to represent a maximum
utilization state, and
3) power-down and wait for an interrupt to represent the sleep state.
We took a straight average of these numbers to represent the overall power.
Given all of the above, the power at the beginning of our experiment was
13.07 mW, and at the end was 7.42 mW, which was a reduction of 43%!
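The arithmetic above can be checked with a short Python sketch. The
function names are illustrative and not part of any tool. The per-test
numbers are not given in this post, so only the two averaged figures
are used.

```python
def percent_reduction(before_mw: float, after_mw: float) -> float:
    """Percentage drop from before_mw to after_mw."""
    if before_mw <= 0:
        raise ValueError("before_mw must be positive")
    return 100.0 * (before_mw - after_mw) / before_mw


def straight_average(values_mw):
    """Unweighted mean of the per-test power numbers (one per diagnostic)."""
    values = list(values_mw)
    if not values:
        raise ValueError("need at least one power number")
    return sum(values) / len(values)


# Averaged power figures quoted in the post (mW).
START_MW = 13.07
END_MW = 7.42

reduction = percent_reduction(START_MW, END_MW)
print(f"{reduction:.1f}% reduction")  # prints "43.2% reduction"
```

Note that straight_average() weights the NOP, load/store, and sleep
tests equally, so the result says nothing about how much time a real
workload spends in each state.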
For power management, we have inserted two levels of clock gating into
our RTL. The first is "global" clock gating, which is used to power down
entire functional units during sleep states such as WAITI (our opcode for
shut down and "wait for interrupt"). The second is "functional" clock
gating, which is very fine-grained clock gating for each of our individual
modules. We do not rely on any EDA tools to perform clock gating for us.
We use a single voltage domain, and although our processor and CAD flow
offerings allow for multi-Vt optimization, the power numbers quoted above
are for a single (low-Vt) library.
We use diagnostic tests to perform our power analysis, i.e., we use
simulation-based toggle vectors. We simulate only in the functional mode
of operation, i.e., we tie any scan enable pins off.
Basically we have 2 use models for the PowerTheater tool:
1) A regression suite that runs every weekend, primarily for error
checking; and
2) Tuning our RTL.
We tend to look at power consumption when the functionality of the
processor is already pretty firm, and we're looking for additional
register banks to clock gate. We do all clock gating manually in the
RTL. We use PowerTheater to identify registers and their bit widths,
and if we feel we can gate them, we do. We also run some simulations
for architectural tradeoffs, but this is rare.
In our weekly regression, a stable matrix of processor configs and
diagnostic tests is run through PowerTheater at the RTL level, and the
results are plotted over time. This allows us to
correlate any change in power with that week's code changes, allowing
us to perform proper cost/benefit analysis quickly. This also serves
as a valuable error-checking tool, since any power surges from a given
processor module can be tracked down quickly.
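The week-over-week tracking described above can be sketched in Python as
follows. This is only an illustration under stated assumptions: the data
layout, the find_power_jumps() name, the 5% threshold, and the sample
numbers are all hypothetical and do not come from PowerTheater or from
our regression scripts.

```python
from typing import Dict, List, Tuple

# Key: (processor config, diagnostic test).
# Value: weekly power in mW, oldest week first.
History = Dict[Tuple[str, str], List[float]]


def find_power_jumps(history: History, threshold_pct: float = 5.0):
    """Return (config, test, week_index, pct_change) for each week-over-week
    change whose magnitude exceeds threshold_pct."""
    jumps = []
    for (config, test), series in sorted(history.items()):
        for week in range(1, len(series)):
            prev, cur = series[week - 1], series[week]
            if prev <= 0:
                continue  # cannot compute a relative change
            pct = 100.0 * (cur - prev) / prev
            if abs(pct) > threshold_pct:
                jumps.append((config, test, week, pct))
    return jumps


# Placeholder numbers purely for illustration; not measured data.
example = {
    ("cfg_a", "nop"): [10.0, 10.1, 12.0],
    ("cfg_a", "waiti"): [2.0, 2.0, 2.0],
}
for config, test, week, pct in find_power_jumps(example):
    print(f"{config}/{test}: week {week} changed {pct:+.1f}%")
```

Each flagged week can then be compared against that week's code changes.
Decreases are flagged as well as increases, since an unexpected drop can
also indicate a functional error.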
PowerTheater's RTL power reports correlate well to our gate-level
power values, as analyzed by using Synopsys Power Compiler on a
post-route netlist (we support both Cadence SoC Encounter and Synopsys
Astro for routing), using back-annotated capacitance values and a SAIF
toggle file; this is highly dependent on our selecting the correct
wire-load model to feed into PowerTheater.
For example, if we use PowerTheater's "top" wire-load selection mode,
there will be a tipping point where feeding PowerTheater a slightly
more pessimistic wire-load model will cause the tool to go from
selecting all low-drive cells to all high-drive cells from the
library, causing a large jump in estimated power.
Using "enclosed" or "segmented" wire-load selection modes might lessen
this effect, but the current trade-off is that it limits the tool's
accuracy to +/- 15% of gate-level power. We don't use PowerTheater for
gate-level power analysis, although some initial experimentation
showed that PowerTheater gate-level analysis had identical results to
Power Compiler gate-level analysis. The main thrust of the 15%
differential was that RTL PowerTheater numbers could be made to under-
or over-shoot gate-level numbers simply by adjusting the wire-load
model past a certain tipping point that caused all cells to assume a
higher drive strength.
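A correlation check against the +/- 15% band mentioned above could look
like the following Python sketch. The function names and the default
band are assumptions made for illustration. The sketch only compares two
numbers; it does not model wire-load selection or cell drive strength.

```python
def relative_error_pct(rtl_mw: float, gate_mw: float) -> float:
    """Signed error of the RTL estimate relative to the gate-level value.

    Positive means the RTL estimate overshoots, negative means it
    undershoots.
    """
    if gate_mw <= 0:
        raise ValueError("gate_mw must be positive")
    return 100.0 * (rtl_mw - gate_mw) / gate_mw


def within_band(rtl_mw: float, gate_mw: float, band_pct: float = 15.0) -> bool:
    """True if the RTL estimate is within +/- band_pct of gate level."""
    return abs(relative_error_pct(rtl_mw, gate_mw)) <= band_pct
```

For instance, within_band(11.0, 10.0) holds (a 10% overshoot), while
within_band(12.0, 10.0) does not (a 20% overshoot). The sign of
relative_error_pct() shows which side of the tipping point the chosen
wire-load model has landed on.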
- Eliot Gerstner
Tensilica, Inc. Santa Clara, CA