Apache PowerArtist vs. post-Talus layout PrimeTime-PX correlation

( ESNUG 545 Item 2 ) -------------------------------------------- [11/20/14]

Subject: Apache PowerArtist vs. post-Talus layout PrimeTime-PX correlation

> My group has some chips planned where my veteran PNR guys say would be
> trivial to layout -- only if we had some Magma Talus licenses.  They
> say that 95% of our blocks would fly through Talus; with only the
> remaining 5% would need IC Compiler expertise.
>
> Our issue is Synopsys Sales won't sell us any Talus licenses whatsoever.
>
> They gave some BS about the Talus licensing SW is broken.  They claim
> it can now cut keys only for existing users.  They want to sell us
> IC Compiler licenses instead.
>
>     - from http://www.deepchip.com/items/0534-02.html


From: [ Horse With No Name ]

Hi, John,

This email must remain anonymous.

As one of the few still active SNPS Magma Talus users, I was pleased to read
that I'm not alone.

We use a standard mixed flow: PowerArtist, Design Compiler, QuestaSim and VCS
for gate-level simulation, PrimeTime, PT-PX, and Magma Talus PNR.


APACHE POWERARTIST vs. CALYPTO POWERPRO vs. SYNOPSYS PRIMETIME-PX

Our main motivation for using a RTL power analysis/reduction tool is to get
a quick and accurate estimate of power early at RTL-level and then refine
the Verilog RTL for lower power as the RTL is being developed.

Waiting for gate-level netlists and running PrimeTime-PX (PT-PX) is too late
for us.  We wanted to do power analysis from the very start, at the RTL
level.  Due to the complexity of our designs, a full synthesis, gate-level
simulation and PT-PX power analysis flow can take a day or two to run,
which is too long when fast RTL design turnaround is required.

Our RTL power reduction tool had to be significantly faster than a PT-PX
flow -- and reasonably accurate -- within 10-20% of PT-PX in both absolute
and relative terms.  It had to have enough capacity to handle our largest
blocks, and it had to be easy to set up and use so that all the engineers
developing RTL could run it.


WHAT CALYPTO COULDN'T DO

Before Apache PowerArtist, we were using Calypto PowerPro for our RTL power
work.  Calypto did power optimization, but it did not do power estimation at
that time -- it only gave guidance for clock-gating efficiency improvements.
The potential power savings from that guidance were shown as a percentage of
power, not as real power savings in mW.

We wanted to see the actual power estimates and potential real power savings
in mW -- so we could calculate if the total power consumed by our block was
within our power budget.  Furthermore we wanted it to guide us on how to
reduce power and rank the potential power savings of each change, so we
could manually select the optimum set of changes.

We could not do this easily with Calypto PowerPro, but we could with Apache
PowerArtist, so we switched over to PowerArtist instead.
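
The kind of ranking described above can be sketched in a few lines of
Python.  This is only an illustration: the change names, the mW values and
the greedy selection are assumptions made up for the example, not data from
our designs and not how PowerArtist itself works.

```python
from dataclasses import dataclass


@dataclass
class Change:
    name: str          # hypothetical label for a candidate RTL change
    saving_mw: float   # estimated saving in mW (illustrative value)


def rank_changes(changes):
    """Return the candidate changes sorted by estimated saving, largest first."""
    return sorted(changes, key=lambda c: c.saving_mw, reverse=True)


def changes_to_meet_budget(block_power_mw, budget_mw, changes):
    """Greedily take the largest savings until the block fits its budget.

    Returns (selected changes, resulting power in mW).  Assumes savings are
    independent and additive, which real RTL changes often are not.
    """
    selected = []
    power = block_power_mw
    for change in rank_changes(changes):
        if power <= budget_mw:
            break
        selected.append(change)
        power -= change.saving_mw
    return selected, power


# Made-up numbers: a 100 mW block with an 88 mW budget.
candidates = [
    Change("gate idle FIFO clock", 4.0),
    Change("isolate multiplier operands", 9.5),
    Change("gate config registers", 1.5),
]
selected, power = changes_to_meet_budget(100.0, 88.0, candidates)
print([c.name for c in selected], power)
```

Working in mW rather than percentages is what makes the budget check
possible at all.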


OUR METHODOLOGY

Here is our design flow for power analysis and power reduction.  RTL power
analysis is an integral part of it.

       
                RTL Power Analysis and Power Reduction Flow

We start power analysis while our Verilog is being developed and verified.
Even though PowerArtist can do vector-less power estimation, we don't use
it in vector-less mode.  We feel the estimates are more accurate if our
own activity vectors are supplied.

Since our requirement is to get high correlation of the RTL power estimate
with our post-synthesis gate-level netlist, we use activity vectors captured
in FSDB format from our RTL- and gate-level VCS simulations.

Doing RTL-level power analysis enables us to:

  - Find all sources of wasted power early and at the RTL level
  - Sort and prioritize our fixes based on actual power estimates
    from our many different simulation scenarios.
  - Make RTL changes early in the design -- where the biggest
    impact on power reduction can be made.  Gate-level is too late.
  - Correlate estimated RTL and actual final power savings.

With our post-synthesis gate-level netlist and prototype P&R netlist, we
are able to:

  - Correlate PowerArtist RTL power estimates with sign-off
    PT-PX gate-level estimates.
  - Understand both absolute and relative accuracy.
  - Correlate power consumption of individual cell categories
    and improve their accuracy.

This also gives us the data to compare the overall runtimes of our flows
and of the different flow phases.


POWERARTIST vs. PRIMETIME-PX RUNTIMES

As I said earlier, one key requirement for us to use any RTL power analysis
tool was that it had to be significantly faster vs. a PrimeTime-PX flow.

    
                PowerArtist vs. PT-PX Flow Runtimes

For this benchmark, we had 3 test cases.  We used multiple licenses of VCS
and PrimeTime-PX but only one license of PowerArtist.  When we used multiple
VCS and PT-PX licenses, the PowerArtist flow was 3-10X faster than the post-
synthesis PT-PX flow.

However, when you look at an apples-to-apples comparison of running only one
test with one license of each tool, the PowerArtist flow was 31X faster than
the PT-PX-based flow.
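
One way to see why the multi-license number is smaller than the one-license
number is the rough wall-clock model below.  It is a sketch under stated
assumptions: the license count of 3 and the normalized per-test runtimes
(31 units vs. 1 unit) are picked for illustration, since the actual license
counts and runtimes are not given here.

```python
import math


def wall_clock(per_test_time, num_tests, num_licenses):
    """Wall-clock time when tests are spread evenly over the licenses."""
    batches = math.ceil(num_tests / num_licenses)
    return batches * per_test_time


def speedup(slow_time, fast_time, num_tests, slow_licenses, fast_licenses):
    """Ratio of slow-flow to fast-flow wall-clock time."""
    slow = wall_clock(slow_time, num_tests, slow_licenses)
    fast = wall_clock(fast_time, num_tests, fast_licenses)
    return slow / fast


# Assumed normalized runtimes: one PT-PX-flow test = 31, one RTL test = 1.
one_vs_one = speedup(31.0, 1.0, num_tests=1, slow_licenses=1, fast_licenses=1)

# Assumed: 3 tests, 3 licenses for the slow flow, 1 for the fast flow.
three_vs_one = speedup(31.0, 1.0, num_tests=3, slow_licenses=3, fast_licenses=1)

print(one_vs_one, round(three_vs_one, 1))
```

Under these assumptions the parallel gate-level flow finishes in one test's
time while the single-license RTL flow runs its tests back to back, so the
apparent speedup drops by roughly the number of licenses.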

The fast PowerArtist turnaround time allows us to do several quick "what-if"
analyses for different scenarios and different test cases -- and to select
the most power-optimized RTL, based on the early PowerArtist guidance.


POWERARTIST vs. Post-Talus P&R PRIMETIME-PX Correlation

As I said in the beginning, we still use Magma Talus for P&R.  Here's the
data comparing Apache PowerArtist power estimates on early Verilog RTL
against our final post-P&R PrimeTime-PX numbers:

    
                PowerArtist vs. Post-P&R PTPX Results

For the 7 test cases, dynamic power correlation was within 15% and leakage
power correlation was within 5%.  Average correlation was ~6% for dynamic
and ~4% for leakage.  For designs without a lot of Synopsys DesignWare
components, we found the average power correlation to be around 5%.

This data is why we use PowerArtist.  With it we have a flow that's ~30X
faster and is within ~10% of post-layout GDSII.
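
For readers who want to run the same kind of check, here is a minimal
Python sketch of a worst-case and average deviation calculation.  It
assumes "correlation" means the relative difference of the RTL estimate
from the PT-PX number; the values are placeholders, not the measurements
behind the figures above.

```python
def percent_error(estimate_mw, reference_mw):
    """Signed deviation of the RTL estimate from the reference, in percent."""
    return 100.0 * (estimate_mw - reference_mw) / reference_mw


def summarize(pairs):
    """Return (worst-case, average) absolute deviation over all test cases.

    pairs: list of (rtl_estimate_mw, ptpx_reference_mw) tuples.
    """
    errors = [abs(percent_error(est, ref)) for est, ref in pairs]
    return max(errors), sum(errors) / len(errors)


# Placeholder (RTL estimate, PT-PX reference) pairs in mW, one per test case.
dynamic = [(95.0, 100.0), (52.0, 50.0), (18.0, 20.0)]
worst, mean = summarize(dynamic)
print(f"worst {worst:.1f}%, average {mean:.1f}%")
```

Reporting both the worst case and the average matters: a low average can
hide one test case that is well outside the accuracy target.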


CUSTOMIZED REPORTS

We primarily use PowerArtist in a batch mode and rely on its good detailed
reports.  The TCL API into the OpenAccess Database (OADB) is very powerful.
It allows every user to get the exact information they want and in the
format they want.  Our customized PowerArtist reports include:

  - Internal/leakage power consumption for each leaf level
  - Clock power consumption / frequency
  - Clock gating coverage and clock gating efficiency
  - Cell category power consumption breakdowns for each hierarchy level
  - Memory / flops / latches / combinational / clock power breakdown
  - Dynamic and leakage power associated with a flop-type
  - Properties attached to instances, pins and nets stored in database
  - Leakage / dynamic power, clock power, RTL information
  - Duty cycle, average activity, frequency, transition time, capacitance
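
As an illustration of the per-hierarchy, per-category breakdown in the list
above, here is a small Python sketch of the roll-up arithmetic.  It is not
the PowerArtist Tcl API: it assumes the leaf-level data has already been
exported as (hierarchical path, cell category, power in mW) records, and
that record format, the instance names and the numbers are all
hypothetical.

```python
from collections import defaultdict


def rollup(leaf_power):
    """Sum leaf power into every ancestor hierarchy level, per category.

    leaf_power: iterable of (hier_path, category, power_mw) records, with
    hier_path using "/" as the separator (an assumed convention).
    Returns {hierarchy_level: {category: total_mw}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for path, category, mw in leaf_power:
        parts = path.split("/")
        # A leaf contributes to each ancestor level, not to itself.
        for depth in range(1, len(parts)):
            totals["/".join(parts[:depth])][category] += mw
    return {level: dict(cats) for level, cats in totals.items()}


# Made-up leaf records.
leaves = [
    ("top/core/u_reg0", "flop", 0.5),
    ("top/core/u_add", "combinational", 0.25),
    ("top/mem/u_sram", "memory", 2.0),
]
breakdown = rollup(leaves)
for level in sorted(breakdown):
    print(level, breakdown[level])
```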

One big advantage of PowerArtist reports is that (unlike in the PT-PX flows)
PowerArtist preserves hierarchical information and we can easily trace the
leaf cell or signal to the original RTL.  In the PT-PX-based flows, all the
hierarchies are typically "ungrouped" and it is more difficult to trace an
instance name in the PT-PX report to the original RTL.


WHERE APACHE POWERARTIST NEEDS IMPROVEMENT

Overall we are very happy with PowerArtist, but as with all EDA tools there
is always stuff that needs fixing and it is no different with PowerArtist.

  - Manipulating OADB with Tcl is relatively slow compared to PT-PX.

  - Detailed power reduction info is not accessible through the TCL API
    for OADB.  Apart from a simple power reductions report, for detailed
    reduction data we have to load the database into the GUI.

    They need an extension of the TCL API for OADB that provides the
    same reduction data that is available in the GUI.

  - The PowerArtist pseudo code guidance for power reduction with respect
    to enable expression needs to be improved.  Sometimes its pseudo code
    is very hard to understand.

  - We discovered that you can load only one power database (i.e. result
    with one test at a time) in PowerArtist.  Not very useful when we are
    looking for common reduction opportunities across multiple tests.
    There are some workaround utilities available, but we believe this
    can be further improved inside PowerArtist itself.

  - PowerArtist's power reduction algorithm does not handle large busses
    efficiently.  When only a few bits of a large bus are toggling, the
    tool does not report such cases under CEC (Clock Enable Condition).
    PowerArtist should provide better guidance by identifying which bus
    bits are toggling so we could make an alternate design decision like
    maybe splitting the bus.  PowerArtist does report such part selects
    w.r.t. other techniques like SODC, though.

  - Enhanced clock-gating is very slow in PowerArtist.  It took too long
    to identify enhanced clock gating conditions on some designs.  In
    some cases we experienced significant slowdown due to enhanced clock
    gating which should not happen.  Eventually Apache got the runtime
    to within 10-15% of the runtime without enhanced clock gating.

  - Similarly, PowerArtist tests over long windows (say a few msec) can
    also take a long time.  This needs to be improved.

  - PowerArtist has an underestimation problem with designs that contain
    a large number of DesignWare components.  Dynamic power can be
    underestimated by as much as 15%.  This is because PowerArtist
    supports only a subset of DesignWare components; the unsupported ones
    are blackboxed, so PowerArtist cannot estimate their power.  Apache
    should support all the DesignWare components.
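
On the one-database-at-a-time item above: a merge across tests can be done
outside the tool once each test's reduction report has been exported.  The
Python sketch below shows the idea only.  It assumes each report has
already been parsed into a mapping from instance path to estimated saving
in mW; that format, the instance names and the numbers are hypothetical,
and this is not one of the workaround utilities mentioned above.

```python
from collections import defaultdict


def common_opportunities(reports, min_tests=None):
    """Find reduction opportunities that show up in several tests.

    reports: {test_name: {instance_path: saving_mw}}.
    min_tests: minimum number of tests an opportunity must appear in
    (defaults to all of them).
    Returns [(instance_path, average_saving_mw)], largest average first.
    """
    if min_tests is None:
        min_tests = len(reports)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for savings in reports.values():
        for inst, mw in savings.items():
            totals[inst] += mw
            counts[inst] += 1
    common = {i: totals[i] / counts[i] for i in totals if counts[i] >= min_tests}
    return sorted(common.items(), key=lambda kv: kv[1], reverse=True)


# Made-up per-test reports.
reports = {
    "test_a": {"top/dma/fifo": 2.0, "top/cpu/alu": 1.0},
    "test_b": {"top/dma/fifo": 4.0, "top/uart/tx": 0.5},
}
print(common_opportunities(reports))
```

Lowering min_tests relaxes the filter from "seen in every test" to "seen
in at least N tests".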

But this all being said, we still like PowerArtist because it lets us do
early RTL-level power optimization 30X faster than a PrimeTime-PX/VCS flow,
with results within 10% of final post-layout GDSII.  It allows us to do rapid
"what-if" analysis at RTL and evaluate various FSDB files and micro-
architectural scenarios. 

At the early RTL level, PowerArtist finds wasted power and potential power
savings by pointing to the exact source file and line number.  No need to
search through many files and hierarchies to find where a certain flop was
instantiated or inferred.  Your RTL hierarchy is fully preserved and we
don't have to worry about "ungrouping". 

Also, our engineers fundamentally believe that the really significant power
tradeoffs must happen at the early RTL and architectural level -- where
PowerArtist works -- instead of at gate-level where PrimeTime-PX works.

Gate-level is simply too late to be doing the big power-cutting changes.

    - [ Horse With No Name ]

P.S. It's good to see some of us Magma users are still active, too!

        ----    ----    ----    ----    ----    ----    ----

Related Articles

    User seeks clever (yet legal) way to get Magma Talus licenses
    User study of Apache PowerArtist RTL power reduction techniques
    We switched from Atrenta Spyglass Power to Apache PowerArtist

Copyright 1991-2025 John Cooley. All Rights Reserved.