( ESNUG 505 Item 8 ) -------------------------------------------- [05/24/12]

Subject: Two users on Apache PowerArtist because PrimeTime-PX is not enough

> We used PrimeTime PX as our reference power estimation in "average" mode.
> We only had a few vectors which were apples-to-apples RTL vs gate
> full-dump and useful for doing correlation studies.
>
> When we did correlation studies, both tools were within about 20% of
> the post-layout gate-level netlist estimate.  I'm confident with further
> block-specific tuning with Apache we could get the error bar under 10%;
> I'm not sure about Spyglass.
>
>     - from http://www.deepchip.com/items/0502-02.html


From: [ Hello Kitty ]

Hi, John,

We have been using Apache (Ansys) PowerArtist for over 4 years, primarily
for RTL-level power estimation.  The advantage we get from doing early RTL
power estimation is huge.

It makes no sense for us to wait to do power estimation and reduction until
the gate level -- if we need to make a change there, it is already too late.

PRIMETIME-PX IS TOO LATE:

PrimeTime-PX cannot be started until the gate-level netlist, gate-level VCD
and SPEF files are available, which is weeks-to-months after the RTL design
phase.  If the results call for a change, we then need to modify and
re-simulate the Verilog RTL in VCS, resynthesize the netlist in DC, and
then check whether the changes caused any timing problems.

For a ~2.3 M-flop IP core design block:

         Apache PowerArtist:    8 hours runtime/elapsed time

               DC+FE+PT-PX+
           SPEF extraction+
               ModelSim/VCS:    weeks-to-months for
                                data prep/runtime/elapsed time
                              
POWER ESTIMATION ACCURACY

We need to make power-related RTL design decisions early and reliably.
PowerArtist works for this.  It typically correlates within 20% of our
Synopsys PrimeTime-PX gate level power analysis.

Signal Activity

RTL power estimation accuracy depends largely on input quality, in
particular on how well the signal activities at the RTL level match those
at the gate level.  (RTL simulation vectors with signal activities are
readily available for early power analysis, while gate-level vectors are
usually not available until much later.)  So we rely on an RTL-to-gate
signal mapping flow to apply the RTL sim vectors to the gate level --
basically mapping the RTL activities of flop and memory outputs onto the
corresponding gate-level signals.

PrimeTime-PX uses statistical activity propagation or zero-delay simulation
to cover the gate-level signals that are left without annotated activities
after mapping.  The annotation rate must be high to achieve high power
estimation accuracy, so the RTL-to-gate signal mapping flow plays a big
role.  PowerArtist supports VCD and FSDB inputs, which are the common
formats used to dump simulation vectors.
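To make the mapping and annotation-rate idea concrete, here is a minimal
sketch.  The net names, toggle counts and the "a.b[3]" -> "a/b_reg_3_/Q"
naming rule are all hypothetical assumptions for illustration; real flows
depend on the synthesis tool's naming settings and on the actual dump.

```python
import re

# Hypothetical RTL activity: toggle counts per RTL register bit,
# as might be extracted from an RTL VCD/FSDB dump.
rtl_toggles = {
    "u_core.count[0]": 1200,
    "u_core.count[1]": 600,
    "u_core.state[0]": 45,
}

# Hypothetical gate-level nets after synthesis.
gate_nets = [
    "u_core/count_reg_0_/Q",
    "u_core/count_reg_1_/Q",
    "u_core/state_reg_0_/Q",
    "u_core/U123/Z",  # combinational net with no RTL counterpart
    "u_core/U124/Z",
]

def rtl_to_gate_name(rtl_name: str) -> str:
    """Map an RTL register bit to an ASSUMED gate-level flop output name."""
    hier, _, leaf = rtl_name.rpartition(".")
    m = re.fullmatch(r"(\w+)\[(\d+)\]", leaf)
    base = f"{m.group(1)}_reg_{m.group(2)}_" if m else f"{leaf}_reg"
    prefix = hier.replace(".", "/") + "/" if hier else ""
    return f"{prefix}{base}/Q"

def annotate(rtl_toggles, gate_nets):
    """Copy RTL toggle counts onto matching gate nets; return (annotation, rate)."""
    gate_set = set(gate_nets)
    annotation = {}
    for rtl_name, toggles in rtl_toggles.items():
        gate_name = rtl_to_gate_name(rtl_name)
        if gate_name in gate_set:
            annotation[gate_name] = toggles
    rate = len(annotation) / len(gate_nets)
    return annotation, rate

annotation, rate = annotate(rtl_toggles, gate_nets)
print(f"annotated {len(annotation)} of {len(gate_nets)} nets ({rate:.0%})")
```

The two unannotated combinational nets are the kind of signals a tool like
PrimeTime-PX would then cover by propagation or zero-delay simulation.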

PACE

Signal parasitics can cause inaccuracies between RTL and gate-level power
estimates.

Since we don't have physical layout information during RTL design, we
typically start with rough signal loading estimates based on wire load
models generated from a previous design.  When our design uses a different
process node, we try to scale the wire load models, which adds even more
inaccuracy.
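Why loading estimates matter: in the textbook switching-power relation
P = 0.5 * a * C * Vdd^2 * f, power is linear in the net capacitance C, so a
relative error in the wire-load-model capacitance carries straight into the
switching component of the estimate (internal and leakage power are not
covered here).  A minimal sketch with made-up net values:

```python
def switching_power_w(toggle_rate, cap_f, vdd_v, freq_hz):
    """Textbook switching power of one net: P = 0.5 * a * C * Vdd^2 * f.

    toggle_rate (a) is the average number of transitions per clock cycle;
    each transition dissipates 0.5 * C * Vdd^2 of energy.
    """
    return 0.5 * toggle_rate * cap_f * vdd_v ** 2 * freq_hz

# Hypothetical net: 0.2 toggles/cycle, 10 fF load, 0.9 V, 1 GHz clock.
p_ref = switching_power_w(0.2, 10e-15, 0.9, 1e9)

# Same net, but a scaled wire load model overestimates the load by 20%.
p_wlm = switching_power_w(0.2, 12e-15, 0.9, 1e9)

print(f"reference: {p_ref * 1e6:.3f} uW, WLM-based: {p_wlm * 1e6:.3f} uW")
print(f"relative error: {(p_wlm - p_ref) / p_ref:.0%}")  # 20%
```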

Apache has a useful fix here: PACE, a statistical signal capacitance
estimation model derived from a design's post-P&R gate-level netlist and
its associated signal parasitics (SPEF).  PowerArtist applies the PACE
models to the RTL or to the post-synthesis (pre-P&R) gate-level netlist.

A PACE model is portable to other designs that use the same process node
and share similar characteristics.  Below are the steps we took to create
PACE models for our 28 nm design.

     1. Synthesize our netlist with Synopsys Design Compiler.
     2. Place and route the netlist in Cadence First Encounter.
     3. Extract the SPEF from the layout using Star-RCXT.
     4. Feed both the SPEF and the post-layout netlist
        into PowerArtist as inputs.
     5. PowerArtist spits out a PACE model.  Done!
     6. We then use the PACE model as our signal capacitance
        model, instead of the less accurate wire load models.
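The post doesn't say how PACE builds its statistical model internally, so
the following is only a toy sketch of the general idea behind the steps
above: learn net capacitance as a function of a net property (fanout here,
an assumption) from post-layout extracted data, then reuse it where no
layout exists yet.  It is essentially a custom wire load table fitted to
the design's own SPEF; all data values are made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples from a post-P&R netlist + SPEF:
# (fanout of the net, total extracted net capacitance in fF).
extracted = [
    (1, 1.8), (1, 2.2), (2, 3.1), (2, 2.9),
    (3, 4.4), (4, 5.9), (4, 6.1), (8, 11.0),
]

def fit_fanout_model(samples):
    """Average extracted capacitance per fanout value (toy statistical model)."""
    buckets = defaultdict(list)
    for fanout, cap in samples:
        buckets[fanout].append(cap)
    return {fo: mean(caps) for fo, caps in sorted(buckets.items())}

def estimate_cap(model, fanout):
    """Estimate net capacitance for a fanout from the fitted table."""
    if fanout in model:
        return model[fanout]
    keys = sorted(model)
    if fanout < keys[0]:
        return model[keys[0]]
    if fanout > keys[-1]:
        # Beyond the table: scale the largest bucket linearly per fanout.
        return model[keys[-1]] * fanout / keys[-1]
    # Between table entries: linear interpolation.
    lo = max(k for k in keys if k < fanout)
    hi = min(k for k in keys if k > fanout)
    t = (fanout - lo) / (hi - lo)
    return model[lo] + t * (model[hi] - model[lo])

model = fit_fanout_model(extracted)
print(f"{estimate_cap(model, 6):.2f} fF")  # 8.50 fF
```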

We ran some benchmarks for PowerArtist + PACE vs Synopsys PrimeTime-PX.
There was OK correlation, within 25%, between the RTL PACE estimates and
gate-level PT-PX with SPEF.  Below is a benchmark using a 28 nm block with
80 K flops.

                  Synopsys PT-PX + SPEF:   225 mW
      Apache PowerArtist w/ PACE models:   170 mW (-24.4%)
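The -24.4% figure is the signed error of the PACE estimate relative to the
PT-PX + SPEF reference.  A small sketch of that arithmetic, also checking it
against the 25% band reported here and the 10% target requested below:

```python
def percent_error(estimate, reference):
    """Signed error of an estimate relative to a reference value, in percent."""
    return (estimate - reference) / reference * 100.0

def within(err_pct, tolerance_pct):
    """True if the magnitude of the error is inside the tolerance band."""
    return abs(err_pct) <= tolerance_pct

err = percent_error(170.0, 225.0)  # PowerArtist w/ PACE vs PT-PX + SPEF, mW
print(f"{err:.1f}%")                          # -24.4%
print(within(err, 25.0), within(err, 10.0))   # True False
```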

PACE could use the following improvements (more accuracy is always good):

   1. A 25% correlation between RTL PACE and gate-level PT-PX with SPEF
      is OK -- but it should be tightened up.  10% would be better.

   2. Further accuracy/consistency improvement for capacitance models.
      We can do coarse clock tree estimation during RTL design, but
      since clock trees are not inserted until clock tree synthesis
      (CTS) during place and route, PACE accuracy still varies with
      clock tree power, depending on the design methodology.

      We are currently working with Apache on this.

   3. We want PACE to support/scale to different corners:  worst case,
      nominal case, and best case.  Right now we base our power
      estimation on the nominal case only -- one corner to correlate
      and track.

RTL POWER REDUCTION

PowerArtist helps us identify which sections of our design consume a lot
of power.  It then points us to RTL design opportunities to reduce power,
and gives us the option of:

  1. "Automated" optimization, i.e. it automatically inserts the clock
      gating logic for you.  We don't rely on the automated optimization,
      because we can't tell whether it is changing the logic incorrectly,
      or whether the RTL change could cause timing failures.

  2. "Guided" optimization.  We use this guided/interactive power debug
      approach and make the design changes manually.  We then use
      Conformal to formally verify any RTL changes we make.

Below are a few of the PowerArtist options we use.

  - Clock gating.  It identifies registers whose input data is often
    inactive, and suggests enable signals to clock-gate those registers.

  - Memory clock gating inside memory elements.

  - Clock gating efficiency reports.  PowerArtist analyzes the design as
    a whole and reports where clock gating has positive efficiency.  (This
    is necessary because if you are not careful, clock gating doesn't
    always give you a good result and can even consume more power.)

  - A Tcl shell for issuing commands to the OpenAccess database to query
    it and create custom reports.  Apache supports the usual formats we use
    for mixed HDL design (Verilog, SystemVerilog, VHDL), simulation (VCD,
    FSDB), library (Liberty), and power constraints (UPF, CPF).

  - A GUI that highlights a section of our design with opportunities for
    power reduction and lets us cross-trace between RTL and schematics.
    Basically it lets us do:

       1) "automated" optimization - where it automatically inserts
          the clock-gating logic for you and shows you the new RTL,

    or 2) a "guided" optimization where we do it ourselves.

    Depending on the design, our typical PowerArtist "guided" power
    reduction is ~10%.
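The clock gating and clock gating efficiency items in the list above come
down to a trade-off: gating saves clock-pin power only in cycles where a
register would reload the value it already holds, while the gating cell
itself costs power.  A toy sketch of that trade-off, assuming made-up
per-cycle data traces, a fixed per-register clock-pin power, and a fixed
ICG (integrated clock gating) overhead; it is not PowerArtist's algorithm.

```python
def hold_fraction(d_trace, q_init=0):
    """Fraction of cycles in which the register would load the value it holds."""
    if not d_trace:
        raise ValueError("trace must not be empty")
    q = q_init
    holds = 0
    for d in d_trace:
        if d == q:
            holds += 1  # clock could be gated this cycle without changing Q
        q = d  # the register captures D on every clock edge
    return holds / len(d_trace)

def net_gating_saving_uw(hold_frac, n_regs, clk_pin_uw_per_reg, icg_overhead_uw):
    """Toy estimate: clock-pin power saved in hold cycles minus ICG overhead.

    Assumes gating removes the registers' clock-pin power in every hold
    cycle and lumps the ICG cell plus enable logic into one fixed cost.
    """
    saved = hold_frac * n_regs * clk_pin_uw_per_reg
    return saved - icg_overhead_uw

idle = hold_frac_idle = hold_fraction([0, 0, 0, 0, 5, 5, 5, 5])
busy = hold_fraction([1, 0, 1, 0, 1, 0, 1, 0])
print(net_gating_saving_uw(idle, 32, 0.5, 2.0))  # 12.0
print(net_gating_saving_uw(busy, 32, 0.5, 2.0))  # -2.0
```

The negative result for the busy 32-bit register is the "can consume more
power" case: gating a register that almost never holds its value only adds
overhead.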

Apache also has tight integration between PowerArtist and RedHawk.  We
can simulate the entire design and use PowerArtist to identify the right
time window within the long-duration vectors for dynamic voltage drop
analysis with RedHawk.  We don't feed RedHawk the entire simulation vector,
just the portions with the highest activity, to study the dynamic voltage
drop/power profile of the power rail.
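The post doesn't say how PowerArtist picks that window.  One simple way to
find the highest-activity portion of a long vector set is a sliding-window
maximum over per-cycle toggle counts; this sketch assumes such counts are
already available, and the trace below is made up.

```python
def peak_activity_window(toggles_per_cycle, window):
    """Return (start_cycle, total_toggles) of the window with the most toggles.

    Keeps a running sum so the scan is O(n) instead of O(n * window).
    """
    if not 0 < window <= len(toggles_per_cycle):
        raise ValueError("window must be between 1 and the trace length")
    current = sum(toggles_per_cycle[:window])
    best_start, best_total = 0, current
    for start in range(1, len(toggles_per_cycle) - window + 1):
        # Slide by one cycle: add the new cycle, drop the oldest one.
        current += toggles_per_cycle[start + window - 1] - toggles_per_cycle[start - 1]
        if current > best_total:
            best_start, best_total = start, current
    return best_start, best_total

trace = [3, 5, 2, 40, 38, 41, 6, 4, 39, 2]
print(peak_activity_window(trace, 3))  # (3, 119)
```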

    - [ Hello Kitty ]

         ----    ----    ----    ----    ----    ----    ----

From: [ The Invisible Man ]

Hi, John,

We bought Apache PowerArtist about a year ago.  We use it for RTL power
analysis, "automated" optimization, and "guided" optimization.

PowerArtist's "guided" optimization presents the analysis data and power
reduction choices via its GUI, for example clock gating, memory gating,
and register enables.  Our process is to:

   1. Run PowerArtist's power reduction
   2. Choose the item with the largest power savings
   3. Fix our RTL
   4. Re-run PowerArtist power reduction to check the result

Typical power reduction with "automated" optimization is from 3% to 6%.
Our average power reduction (pre- vs. post-PowerArtist) is about 5%.
We also use Synopsys Power Compiler in our flow.

Here is representative runtime benchmark data.

                       Block size:     600 K gates
       Apache PowerArtist runtime:     20 minutes
     Synopsys DC + Synopsys PT-PX:     2 hours

It takes us about 30 min to set up PowerArtist for new designs or blocks.

As for accuracy, PowerArtist is within ~10% of gate-level PrimeTime-PX.
This is sufficient for us to make the right RTL design power decisions.

PowerArtist's biggest strength is that it helps us find power bugs and
gives us quick power estimates.

GOTCHAS:

  - Equivalence checking between pre- and post-optimization RTL doesn't
    work so well when PowerArtist tweaks the RTL in "automated" mode.  We
    have both the Conformal and Formality formal tools, but neither can
    handle pre- vs. post-PowerArtist netlists if the sequential logic
    changes due to PowerArtist's power optimization.

  - So far we've heard that only Calypto SLEC (Sequential LEC) can
    handle this, but we haven't looked into the details.

Normally it might take us 2 weeks to meet our power goals for a particular
project by doing the reductions only at the gate level.  With Apache, we
cut that time to only about 4 days.

So using PowerArtist for RTL power reduction saves us about 60% of the
time we would normally spend on doing power reductions by hand.
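The "about 60%" follows if "2 weeks" is read as 10 working days; that
reading is an assumption, and 14 calendar days would give roughly 71%
instead.  A quick check of the arithmetic:

```python
def percent_time_saved(before_days, after_days):
    """Share of the original effort that is no longer needed, in percent."""
    return 100.0 * (before_days - after_days) / before_days

# Assumption: "2 weeks" means 10 working days.
print(percent_time_saved(10, 4))            # 60.0
# Alternative reading: 14 calendar days.
print(round(percent_time_saved(14, 4), 1))  # 71.4
```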

    - [ The Invisible Man ]
Copyright 1991-2024 John Cooley.  All Rights Reserved.