( ESNUG 524 Item 6 ) -------------------------------------------- [05/16/13]
From: [ George Economakos of NTU ]
Subject: One reader's trip report on Calypto RTL power reduction webinar
Hi John,
I know you like trip reports, so thought I'd share with you what I saw last
month in Calypto's webinar:
"Minimizing RTL Power Through Sequential Analysis"
on reducing RTL power through sequential clock gating. It was ~50 minutes.
First Calypto presented traditional motivations for low power, i.e. mobile
devices, battery life, shrinking geometries, etc.; but they left out one
very important one -- data centers -- digital warehouses use 30 billion
watts, roughly equivalent to the output of 30 nuclear power plants. A
single data center can take more power than a medium-size town.
Anyway, Calypto sells a tool called PowerPro which analyzes and optimizes
power at Verilog/VHDL RTL -- as opposed to tools from Cadence and Synopsys
which only do so at gate-level -- so it was no surprise that the webinar
was about the virtues of reducing RTL power.
WARNING: The webinar didn't actually show any screen shots of PowerPro
nor did it talk much about PowerPro specifically. I would like
to have seen a bit more on the product.
The webinar showed how there is a lot more opportunity to reduce power at
RTL compared to gate-level. One slide showed that the ability to influence
power was up to 80% at RTL compared to 10% at gate. However, Calypto
failed to provide any source or background on how they got these numbers.
While these numbers may be questioned, I do tend to agree it's better to
start at RTL when it comes to trying to reduce power.
Calypto then very quickly went through some common power reducing techniques
such as combinational clock gating, which is the most popular technique
because it is fairly easy to do; data gating; operator reduction; etc.
They showed operator reduction as an interesting technique where an equation
is changed to reduce the number of operators. One of their slides showed
that to reduce the number of operations and thereby to reduce power, the
expression:
X^2 + A*X + B    can be replaced by    X * (X+A) + B
This sounds great, but it would have been nice if Calypto had spent a
little more time showing how to detect these expressions and how to
transform them. I guess they wanted to get to the main topic which was
sequential analysis and sequential clock gating. I would like to see
operator reduction and some other commonly used RTL power optimization
techniques covered in future webinars.
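
As a minimal sketch of the operator reduction example above (not from the
webinar; the function names are made up for illustration), the Python below
checks that the two forms agree and notes the operator counts. It assumes
unbounded integers; real RTL would also have to account for the bit widths
of the intermediate results.

```python
def original(x, a, b):
    # X^2 + A*X + B: two multiplications, two additions
    return x * x + a * x + b


def reduced(x, a, b):
    # X * (X+A) + B: one multiplication, two additions
    return x * (x + a) + b


# Exhaustive equivalence check over a small signed range.
for x in range(-8, 8):
    for a in range(-8, 8):
        for b in range(-8, 8):
            assert original(x, a, b) == reduced(x, a, b)
```

The saving is one multiplier, which is normally the most expensive operator
in the expression.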
I like the way the importance of sequential analysis was explained. One
slide defined power by the equation:
Power = Energy / Time
So to reduce the power, either you can reduce the energy consumed to perform
that operation or you can stretch the operation over a larger period of
time. And to look at operations over a large time period, you need to look
at several clock cycles, and this is what sequential analysis is about.
SEQUENTIAL CLOCK GATING
Basically, as I understand it, sequential analysis involves analyzing the
design over multiple clock cycles to find which signal changes actually
propagate to the final output and which flops hold the same value over
time. These two conditions were defined as observability-based clock
gating and stability-based clock gating respectively.
- Observability-based clock gating applies when a change in a signal
value never reaches a primary output or a downstream flop/latch/memory,
so it cannot affect the primary outputs.
- Stability-based clock gating applies when the value being latched into
a flop/latch is the same as in the previous clock cycle, so the write
has no effect on the primary output.
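
The stability-based condition can be illustrated with a small Python model
of a single register. This is only a sketch under my own assumptions (the
function name and the write sequence are hypothetical, and it says nothing
about how PowerPro derives the condition): it counts how many cycles the
register really needs a clock when redundant writes are skipped.

```python
def count_clock_events(writes, gate_when_stable):
    """Model one register that is written every cycle.

    writes: sequence of D values, one per clock cycle.
    Returns (final register value, number of cycles the flop was clocked).
    """
    q = 0          # register contents, assumed to reset to 0
    clocked = 0
    for d in writes:
        if gate_when_stable and d == q:
            continue   # same value as last cycle: clock can be gated
        q = d
        clocked += 1
    return q, clocked


writes = [5, 5, 5, 7, 7, 5, 5, 5]
ungated = count_clock_events(writes, gate_when_stable=False)
gated = count_clock_events(writes, gate_when_stable=True)
assert ungated[0] == gated[0]   # same final value either way
```

In hardware a direct D-versus-Q comparison costs logic of its own, so in
practice the gating condition would more likely come from upstream control
signals that imply the data is unchanged.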
Calypto then illustrated observability-based clock gating with a simple
circuit that made the power of sequential clock gating stand out over
combinational clock gating. The circuit was a three-stage pipeline
datapath containing 5 flip-flops. Under the combinational clock gating
condition, only the last flip-flop was gated.
Whereas under the sequential scheme, all the flops ended up gated and the
entire pipeline was effectively shut off when the intermediate activity was
either redundant or had no effect on the output. (Flip-flops D-1 and D-2
are clock gated.)
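
The webinar's actual circuit is not reproduced here, so the following
Python sketch uses an assumed pipeline of my own: three data flops (d1, d2,
d3) plus two flops (e1, e2) that delay an enable so that only d3 has an
explicit load enable. Under that assumption, combinational gating can only
gate d3, while sequential gating can also gate d1 and d2 using earlier
copies of the same enable. The enable flops themselves are left ungated and
are not counted.

```python
def simulate(stimulus, sequential_gating):
    """stimulus: list of (data, en) pairs, one per clock cycle.

    Returns (d3 value after each clock edge, total data-flop clock pulses).
    """
    d1 = d2 = d3 = 0
    e1 = e2 = 0
    outputs = []
    pulses = 0
    for data, en in stimulus:
        # Clock enables for the three data flops in this cycle.
        clk_d1 = en if sequential_gating else 1
        clk_d2 = e1 if sequential_gating else 1
        clk_d3 = e2   # existing enable: gated in both schemes
        pulses += clk_d1 + clk_d2 + clk_d3
        # Compute next state from current state; all flops update together.
        next_d1 = data if clk_d1 else d1
        next_d2 = d1 if clk_d2 else d2
        next_d3 = d2 if clk_d3 else d3
        d1, d2, d3, e1, e2 = next_d1, next_d2, next_d3, en, e1
        outputs.append(d3)
    return outputs, pulses


# Hypothetical stimulus: valid data arrives once every four cycles.
stimulus = [((i * 7) % 256, 1 if i % 4 == 0 else 0) for i in range(16)]
out_comb, pulses_comb = simulate(stimulus, sequential_gating=False)
out_seq, pulses_seq = simulate(stimulus, sequential_gating=True)
assert out_comb == out_seq        # the output never changes
assert pulses_seq <= pulses_comb  # but fewer flops get clocked
```

The point of the model is that d1 and d2 only matter in cycles whose data
will later be captured by d3, which is an observability condition spanning
more than one clock cycle.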
The webinar also highlighted the difference between simple sequential
analysis and deep sequential analysis. Simple sequential analysis can be
done with structural analysis. But structural analysis does not work when
the circuit gets complex. They showed a circuit where the flop's
output was being fed to different computations such as multiplication,
addition, or comparison and ultimately several stages down in the pipeline
one of the computations got selected. A simple tool that relies on
structural patterns will not be able to find that the first flop in the
pipeline can be gated based on an observability-based condition.
Similarly if you have multiple valids at the beginning of the pipeline and
they combine through some combinational logic and ultimately are fed to the
downstream flops, a very simple enable forwarding technique will not be able
to generate this gating condition on the downstream flops.
CLAIMED TRUE DEEP SEQUENTIAL ANALYSIS
On the other hand, Calypto claims to have true deep sequential analysis,
proper mathematical and formal reasoning of the design -- where you need to
find out that the writes to a flop are not going to make it to the final
output -- or when you are going to write the same data again and again to
a register. Calypto said analyses of these conditions require formal
mathematical reasoning about the behavior of the design.
Even though all the analyses can be performed vector-less, the webinar
claimed that if the user provides activity vectors through VCD, SAIF or
FSDB, their analyses can be further refined and more optimized results can
be achieved.
Overall, the design flow Calypto proposes is based on what it calls an
Interactive Sequential Clock Gating tool, PowerPro. The main difference
between Calypto's PowerPro and automatic low-power RTL synthesis tools is
that it covers more potential power savings opportunities (after sequential
analysis) and proposes to the user RTL coding or constraint modifications
that can make these opportunities effective. So, the flow goes from RTL to
PowerPro annotated RTL and then (after user approval) to low-power RTL
synthesis for gate level implementation. Some screenshots from the tool
could have better clarified the flow, but they didn't have them.
TEXAS INSTRUMENTS
Finally they showed a slide which had the results from joint work with
Texas Instruments, where TI was able to get 52% power savings on some
designs using Calypto's PowerPro tools.
The webinar helped me understand how sequential clock gating achieves power
savings. I could be wrong, but I get the feeling that not many companies
have adopted deep sequential analysis tools yet.
One thing the speaker did say is that deep sequential analysis is not easy
to do without help from a good deep sequential analysis power tool.
LACKED MEMORIES
Even though the webinar had some technical depth, it only showed data path
examples and no memories at all. I know sequential analysis is equally
applicable to memories especially in reducing leakage power when there is
a sleep option available. Including memories in the webinar would have made
it more complete in my opinion. I looked on their website and it looks like
they have since added a webinar on memory clock gating.
Also, power analysis, a key aspect of RTL power exploration, was not
covered as part of the webinar. It would be interesting to know what issues
sequential clock-gating, etc. pose for power analysis.
The presenter, Abhishek Ranjan, was obviously an expert in low power and was
very articulate, but at times he went too fast. I hope Calypto will archive
this webinar and make it available for future reference.
- George Economakos
National Technical University Athens, Greece