( ESNUG 512 Item 5 ) -------------------------------------------- [10/18/12]
Subject: A user study of Apache PowerArtist RTL power reduction techniques
> We have been using Apache (Ansys) PowerArtist for over 4 years, primarily
> for RTL level power estimation. The advantage we get from doing early RTL
> power estimation is huge.
>
> - from http://www.deepchip.com/items/0505-08.html
From: [ Yosemite Sam ]
Hi, John,
Anon please.
We adopted Apache/Ansys PowerArtist for RTL power analysis, automated
optimization, and guided optimization about 3 years ago. Prior to this we
used Sequence's Power Theatre for power analysis at both RTL and gate-level.
I would like to share some of our results with your readers.
As power has become a limiting constraint for chip design, it is important
to get an early assessment of the power consumption during the RTL design
stage and specifically avoid having power "bugs" make their way into the
gate-level implementation. At the RTL level, PowerArtist allows us to:
- assess the design's power consumption,
- identify high activity periods of the design,
- quickly identify where power is possibly wasted and drive
continuous improvements of our RTL code.
Putting a tool in our RTL designers' hands that helps them visualize
power consumption makes them more power conscious.
PowerArtist lets them quickly check complex design blocks and look for power
issues across block hierarchies. As an example, for a 2 M instance design:
    PowerArtist runtime at RTL           ~2 hours
    PowerArtist runtime at gate-level    ~10 hours
PowerArtist offers both automated and guided power optimization. We use its
automated RTL changes to quickly generate modified code for further testing
or in cases where the change is readily understood by our RTL designer.
POWERARTIST AUTOMATIC OPTIMIZATION
Here's where PowerArtist automatically optimizes power:
1. Sequential clock gating (PRISM). Identifies chains of registers
where a register early in the chain is enabled, and registers later
in the chain are not enabled. The existing enable is used to gate
later registers.
2. Register enables (LNR). We can use PowerArtist to automatically
gate additional registers by creating enables, similar to Synopsys
Power Compiler. However, we run Apache PowerArtist earlier in the
design. Furthermore, PowerArtist estimates savings gained if a
clock enable is generated by detecting changes on the bus.
3. PowerArtist also identifies existing clock enables that waste power
(LEC). Rather than changing the RTL, PowerArtist generates Power
Compiler constraints for ICG implementation.
4. Memory clock gating (GMC). PowerArtist automatically disables
memories that are inactive.
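To make the sequential clock gating idea in item 1 concrete, here is a
minimal Python sketch. It is not PowerArtist output and does not use any
PowerArtist API; the two-register chain, the function name, and the signal
names are all made up for illustration. It assumes both registers reset to
the same value, so the second register only needs a clock edge in the cycle
after the first register was enabled.

```python
def simulate_chain(data_in, enable, gate_stage2):
    """Model a two-register chain: r1 loads data_in when enable is high,
    r2 copies r1 on every clock edge it receives.

    If gate_stage2 is True, r2 is clocked only in the cycle after r1 was
    enabled (the existing enable, delayed by one cycle).
    Returns (trace of r2 values, number of clock edges delivered to r2).
    """
    r1 = r2 = 0          # assumption: both registers reset to the same value
    en_delayed = False   # r1's enable from the previous cycle
    r2_trace, r2_clock_edges = [], 0
    for d, en in zip(data_in, enable):
        # Compute next state from current state so both registers
        # update together, as they would on a shared clock edge.
        clock_r2 = bool(en_delayed) if gate_stage2 else True
        next_r2 = r1 if clock_r2 else r2
        next_r1 = d if en else r1
        r2_clock_edges += int(clock_r2)
        r1, r2, en_delayed = next_r1, next_r2, en
        r2_trace.append(r2)
    return r2_trace, r2_clock_edges


# Hypothetical stimulus: r1 is enabled in only 3 of 8 cycles.
data = [5, 7, 7, 9, 1, 1, 3, 4]
en = [1, 0, 0, 1, 0, 0, 0, 1]
ungated_trace, ungated_edges = simulate_chain(data, en, gate_stage2=False)
gated_trace, gated_edges = simulate_chain(data, en, gate_stage2=True)
```

In this toy model the two traces are identical while the gated version
delivers far fewer clock edges to the second register, which is the kind of
saving the technique is after.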
POWERARTIST GUIDED OPTIMIZATION
Apache also has some guided optimization techniques, which are very design
dependent. Here's where we've seen it tweak our designs:
1. Observability Don't Care (ODC). Checks whether a register output is
not observed downstream based on logic conditions.
2. Clock enable condition check (CEC). PowerArtist looks for clock
enable efficiency and identifies inefficient enable conditions and
provides guidance on aligning the clock enable condition with the
logic activity.
3. Split Memory. Identifies wide memories and analyzes the dynamic
power improvement from splitting the memory. However, the tool does
not take leakage into account. The designer in this case needs to
do some additional analysis to understand the total power savings.
4. Eliminate wasted activity of unselected MUX inputs. You can change
the clock enable conditions so that registers feeding a MUX input do
not toggle while that input is not selected. You do this by making
the select signal part of the enable condition.
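Item 4 can be illustrated with a small Python toy model. This is only a
sketch under stated assumptions, not how PowerArtist implements or reports
the check: the two registers, the 2:1 MUX, and the function name are
hypothetical, and the select value for the consuming cycle is assumed to be
available when the registers capture their data.

```python
def run_mux(a_data, b_data, sel, gate_on_select):
    """Two registers feed a 2:1 MUX (sel == 0 picks reg_a, sel == 1 picks reg_b).

    With gate_on_select=True, each register loads only in cycles where
    its MUX input is selected. Returns (MUX outputs, total register bit toggles).
    """
    reg_a = reg_b = 0
    toggles = 0
    outputs = []
    for a, b, s in zip(a_data, b_data, sel):
        # Enable condition: always load, or load only when selected.
        load_a = (s == 0) or not gate_on_select
        load_b = (s == 1) or not gate_on_select
        new_a = a if load_a else reg_a
        new_b = b if load_b else reg_b
        # Count flipped bits as a rough proxy for switching activity.
        toggles += bin(reg_a ^ new_a).count("1") + bin(reg_b ^ new_b).count("1")
        reg_a, reg_b = new_a, new_b
        outputs.append(reg_a if s == 0 else reg_b)
    return outputs, toggles


# Hypothetical 4-bit stimulus.
a_in = [0b1010, 0b0101, 0b1111, 0b0000]
b_in = [0b0011, 0b1100, 0b0011, 0b1100]
select = [0, 1, 1, 0]
out_plain, toggles_plain = run_mux(a_in, b_in, select, gate_on_select=False)
out_gated, toggles_gated = run_mux(a_in, b_in, select, gate_on_select=True)
```

The MUX output is the same in both runs, but the gated registers toggle
less because the unselected register simply holds its old value.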
We analyzed a number of designs and checked which of the power techniques
were most effective. This is illustrated in the chart below.
Figure 1: Normalized effectiveness of power reduction techniques
used over several designs
We then overlaid the average relative power reduction achieved with these
techniques and obtained the weighted results below.
GMC showed up in a few designs as shown in Figure 1; however, in our case
the average power benefit was small compared to other techniques. This is
illustrated in Figure 2, where the average power benefit from using GMC is
close to 0%. This, however, does not mean that GMC will never be useful
in reducing power. It is more an indication that in our suite of analyzed
designs memory clock gating was already well addressed in the design.
Figure 2: Normalized average power benefit achieved over several
designs
What the two graphs also indicate is that depending on the actual design
one or more techniques can help reduce power.
Some of the techniques PowerArtist uses require vectors, while for others
either vector-based or vector-less analysis is possible. In general, we find the
most accurate analysis uses vectors. Finally, for *some* PowerArtist
changes, we find that we only need to run formal verification (using e.g.
Formality or Conformal); for *others*, we must rerun simulations.
POWERARTIST GUI AND REPORTS
Apache's GUI is fairly powerful and user-friendly. Our designers use the
tool because they quickly understand where power is wasted. It offers:
1. Cross probing between the RTL design, the waveform, and associated
power numbers.
2. Automatic finding and highlighting of problem areas, such as
ranking the top 10 power hogs -- the ones it lists as power hogs
often surprise us. PowerArtist then does a pareto-type analysis,
picking out the top 50 or so suggestions out of 3000 changes that
give us the biggest bang for the buck.
3. You can customize the above reports through TCL.
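For readers who want a feel for what a pareto-type selection does, here is
a rough Python sketch. It is not PowerArtist's algorithm and not its TCL
reporting interface; the function name, the (name, saving) tuple format,
and the 80% coverage default are assumptions chosen for illustration.

```python
def top_suggestions(suggestions, coverage=0.8, limit=50):
    """Pick the highest-saving suggestions until `coverage` of the total
    estimated saving is reached or `limit` items have been selected.

    `suggestions` is a list of (name, estimated_saving) tuples.
    """
    ranked = sorted(suggestions, key=lambda item: item[1], reverse=True)
    total = sum(saving for _, saving in ranked)
    picked, running = [], 0.0
    for name, saving in ranked:
        reached = total > 0 and running >= coverage * total
        if len(picked) >= limit or reached:
            break
        picked.append((name, saving))
        running += saving
    return picked


# Hypothetical suggestions with estimated savings in arbitrary units.
candidates = [("reg_bank_a", 5.0), ("fifo_ctrl", 1.0),
              ("dma_buf", 3.0), ("uart_tx", 1.0)]
shortlist = top_suggestions(candidates, coverage=0.8)
```

The point is only that a short, sorted list can account for most of the
estimated saving, which is why a few dozen suggestions out of thousands are
the ones worth a designer's time.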
Here's a snapshot of the PowerArtist GUI. I got it from Apache, as any
actual snapshot of ours would contain my design data.
POWERARTIST ANALYSIS ACCURACY
The accuracy of power analysis between RTL and gate-level depends in part
on the assumptions made with regard to gate loading. Apache offers PACE
(PowerArtist Calibrator and Estimator), which infers the actual loading of
an equivalent/representative design at the RTL level.
For PACE to work best the representative design has to be in the same
process technology node as the one being analyzed. So far, we have not
evaluated PACE.
However, we have tested PowerArtist's accuracy to silicon at the
gate-level and have found that it's within the 10% accuracy that is
typically expected over multiple vectors (i.e. activities). This gives
us enough confidence to use it for power optimization.
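As a sketch of how such a comparison could be tabulated per vector, here is
a small Python helper. The function name, the milliwatt units, and the
reading of "10%" as a per-vector relative error against the silicon
measurement are my assumptions, and the numbers in the example are
placeholders, not our data.

```python
def check_accuracy(results, tolerance=0.10):
    """Compare estimated and measured power for each vector.

    `results` maps a vector name to (estimated_mw, measured_mw).
    Returns a dict mapping each name to (relative_error, within_tolerance).
    """
    report = {}
    for name, (estimated_mw, measured_mw) in results.items():
        # Relative error with the silicon measurement as the reference.
        error = abs(estimated_mw - measured_mw) / measured_mw
        report[name] = (error, error <= tolerance)
    return report


# Placeholder numbers for illustration only.
report = check_accuracy({"idle": (95.0, 100.0), "peak": (460.0, 400.0)})
```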
Apache recently announced a new capability called RTL Power Model (RPM).
RPM represents the power model of a design at RTL stage and can drive
early decisions on power grid and package substrate design. Using
one of our test cases, the Apache FAEs were able to get ~15% matching
between our power integrity analysis results from RPM and from layout.
- [ Yosemite Sam ]