( ESNUG 447 Item 2 ) -------------------------------------------- [09/26/05]

From: Dan Freitas <dfreitas=user domain=azulsystems spot calm>
Subject: One Designer's Evaluation of Apache RedHawk vs. Sequence CoolTime

Hi, John,

A few months back we evaluated the Apache RedHawk power analysis tool. 
We compared RedHawk against Sequence's CoolTime (which we used on a chip 
we built last year).  The following is a summary of the evaluation:

The chip used for the eval is a fairly large (~19mm on a side, ~4 million 
placeable instances) flip-chip with power bumps distributed across the 
entire core.  The chip is placed and routed hierarchically.  We did static
and dynamic power analysis at both the block level (using inserted 
power-points) and full-chip level (with power applied to the bumps). 


Different approaches

One of the hard parts in dynamic power analysis is toggling a representative 
set of instances to uncover all the VDrop hot spots.  Apache and Sequence
take different approaches to the problem.  Both give you the option of using
VCD files or a "vector-less" approach.  For vector-less analysis, Sequence
has you set the percentage of instances and primary inputs to be switched,
and to what state (0->1 or 1->0).  They then propagate the outputs of the
flops/PIs through the logic cones to see which gates will toggle.  Their
"secret sauce" is an algorithm that tries to figure out which flops to
toggle in order to maximize coverage.  
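
The cone-propagation idea can be sketched roughly as follows.  This is my
own illustration of the general technique, not Sequence's actual (proprietary)
implementation; the netlist structure and names are hypothetical:

```python
# Sketch of vector-less toggle propagation through logic cones.
# Pessimistic-but-simple model: a gate's output may toggle if any
# net in its fanin toggles.  'fanin' maps each gate output net to
# the nets driving that gate; the tiny netlist below is illustrative.

from collections import deque

def propagate_toggles(fanin, toggled_sources):
    """Return the set of nets that may toggle, starting from the
    flop outputs / primary inputs chosen to switch."""
    fanout = {}                          # invert the fanin map
    for out, ins in fanin.items():
        for i in ins:
            fanout.setdefault(i, []).append(out)

    active = set(toggled_sources)
    work = deque(toggled_sources)
    while work:                          # BFS down the logic cones
        net = work.popleft()
        for downstream in fanout.get(net, ()):
            if downstream not in active:
                active.add(downstream)   # this gate output may toggle
                work.append(downstream)
    return active

# Two flops feeding an AND (n1) which feeds an OR (n2) with a third flop.
fanin = {"n1": ["ff_a", "ff_b"], "n2": ["n1", "ff_c"]}
print(propagate_toggles(fanin, {"ff_a"}))   # ff_a's cone: n1 and n2 toggle
```

The real problem, of course, is the part this sketch skips: picking *which*
flops to switch so the union of their cones covers the grid.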

Apache gives you a giant knob called "PAR" (Peak-to-Average Ratio).  They
measure the average power for all the instances, then use the PAR to figure
out how many instances to toggle in the dynamic analysis.  Their
"secret sauce" is the algorithm for prioritizing which gates will be chosen
to toggle.  They keep adding gates until they reach the (average * PAR)
value.  Make PAR big enough and you'll toggle all the gates.  Clearly 
overkill.  Make it very small and you can report to your boss that you 
"see no problem with the power grid"... :-)  The problem is that what to
realistically set it to is pretty much a guess.  Also, since they aren't 
strictly following logic cones, you can end up with combinations of gates 
toggling that could never occur during the actual operation of the chip.
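
A minimal sketch of the PAR selection idea (the largest-consumers-first
ordering here is my stand-in for Apache's proprietary prioritization, and
the instance names are made up):

```python
# Sketch of PAR-based instance selection for dynamic analysis.
# 'avg_power' maps instance name -> average power (mW).  Instances
# are added, in priority order, until their summed power reaches
# the (total average * PAR) budget.

def select_toggling(avg_power, par):
    target = sum(avg_power.values()) * par   # the (average * PAR) value
    chosen, running = [], 0.0
    # Assumed priority: biggest average consumers first.
    for inst, p in sorted(avg_power.items(), key=lambda kv: -kv[1]):
        if running >= target:
            break
        chosen.append(inst)
        running += p
    return chosen

avg_power = {"u1": 4.0, "u2": 3.0, "u3": 2.0, "u4": 1.0}
print(select_toggling(avg_power, 0.5))   # half the total: ['u1', 'u2']
```

Note that nothing here respects logic cones, which is exactly the objection
above: the selected set can include gate combinations that could never
switch together on the real chip.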

For static analysis, both will get the job done, but Apache does what you
want by default; with CoolTime you have to do a bit more work to get 
what you want.  RedHawk sums all the instances by default and then applies
a toggle rate.  This lets you scale the average power as aggressively as
you dare.  Since CoolTime uses a flop-activity/logic-propagation model, you
can never get 100% of the instances toggling.  You have to use a back-door 
approach where you specify the current for all the instances in a currents 
file.  Alternatively, you can specify the total block power and CoolTime 
will divvy up the currents among all the instances in the block.  Either
way, you are essentially skipping the logic propagation part of the flow,
then having CoolTime do the grid analysis with 100% of the instances
considered.  
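
The divvy-up step amounts to something like the sketch below.  The
proportional-to-average-power weighting is my assumption of how such a
currents file could be derived, not CoolTime's documented algorithm:

```python
# Sketch: turn a total block power number into per-instance currents,
# splitting proportionally to each instance's relative power weight.
# The weighting scheme is an illustrative assumption, not CoolTime's
# documented algorithm.

def block_power_to_currents(total_power_mw, weights, vdd=1.0):
    """weights: instance -> relative power weight.
    Returns instance -> current in mA (I = P / V)."""
    wsum = sum(weights.values())
    return {inst: (total_power_mw * w / wsum) / vdd
            for inst, w in weights.items()}

weights = {"u1": 2.0, "u2": 1.0, "u3": 1.0}
currents = block_power_to_currents(400.0, weights, vdd=1.0)
print(currents)   # u1 gets half: {'u1': 200.0, 'u2': 100.0, 'u3': 100.0}
```

Either way you get 100% of the instances drawing current, at the cost of
skipping the logic-propagation realism.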

The bottom line on the approaches:

Using either approach, during dynamic VDrop analysis, you can still end up 
with gates (or even macros, if you aren't careful) sitting idle in a weak 
power-grid area, thus masking a potential problem.  In our tests, both
RedHawk and CoolTime found the gross violations (broken power grid sections,
huge EM violations in under-via'ed grid intersections, etc.).  During
dynamic simulations, however, we did occasionally see different gates
flagged by each tool.


Setup

If you already have a PrimeTime flow in place, then setting up RedHawk is
very straightforward.  RedHawk uses PrimeTime to dump out timing window
information via a script supplied by Apache.  It also has a built-in 
extraction engine that does a simpler flat extract and requires only a 
process layer-profile techfile, and you're good to go.  With CoolTime you 
will have to set up their Columbus extraction/tmkr environment and their
ShowTime timing environment. 

The things that make RedHawk easy to set up ultimately limit its capability.
Since RedHawk doesn't have a built-in timing engine, it can't do timing-
aware VDO (Voltage Drop Optimization) operations such as instance spreading
to eliminate hot spots.  RedHawk has just added the ability to insert dcaps
but relies on your ECO placer to resolve cases where there is not enough
open space around the hot spot.  That could result in nasty
ECO-place displacements of instances in critical timing paths if you let
RedHawk be too aggressive in placing dcaps.  CoolTime, on the other hand,
has the ShowTime engine built in and so, in theory (not tested by us yet), 
it should be able to add dcaps, and move instances, without hosing your 
critical paths.  
 
Another benefit you get with the Sequence tool is that it uses the full-
featured Columbus extractor.  This extraction engine handles hierarchical
blocks and can be run n-way MP.  On our original chip we generated
block-level SPEFs using 4 CPUs.  Columbus would then stitch the block-
level SPEFs into a full-chip SPEF for subsequent power analysis.  

RedHawk extracts the chip flat with a single CPU.  The saving grace is that 
since the RedHawk extractor is optimized for power-grid extraction, it runs
VERY fast.  More on that later.

The bottom line on setup:

You will be up and running with RedHawk very quickly (assuming you already
have a PT timing flow in place) but will ultimately run into limitations for 
more aggressive VDO operations.   Setting up CoolTime is more involved, but 
gives you a more sophisticated optimization environment and more flexibility 
in how the tools are run (flat/hierarchical/uniprocessor/mp).


Run Time, Memory

Comparing the run times was difficult as, at the time, we could only run
CoolTime/Columbus on our Sun Fire 880 (8 CPUs, 96 GB physical memory).  
RedHawk ran on our new Opteron for the blocks (~3x the throughput of our Sun)
but we still needed to use the Sun for our full-chip runs due to the memory 
requirements.

At the block level (~120K instances), CoolTime (Sun) took ~33 minutes to run.
The component parts were: Extraction (9 minutes), STA (2 minutes), dynamic 
VDrop (22 minutes).  

Apache (Opteron) took ~12 minutes to run.  The component parts were:
Extraction (35 seconds), STA (PT on the Sun, 6 minutes), reading the results 
of PT into RedHawk (1.5 minutes), the baseline power calc (40 seconds), and
dynamic VDrop (3 minutes).

So, RedHawk's extract was blazing fast, but the overall time was slowed by
dumping out the PT timing window information.  PT now runs on the Opteron
so that portion should speed up dramatically.  On the other hand, Columbus
also now runs on the Opteron...  If we're a little loose and say that, in
general, our Opteron jobs finish ~3x faster than they did on our old Sun,
then I'd expect we'd see an 11-minute CoolTime result and a 7-8 minute
RedHawk result, with most of the difference coming from the faster RedHawk
extraction.

I should note one nagging RedHawk usability issue.  Apache forces you to do
one extract for the static analysis and another extract for the dynamic
analysis.  With CoolTime you extract only once and then use the same
grid SPEFs for both runs.  I have no idea why Apache takes this approach.
At the full-chip level this is expensive.

Chip-level runs are universally painful.  Both CoolTime and RedHawk had to 
be run on our Sun as it had the most physical memory.  CoolTime 
took about 3.5 days to complete the extraction, STA, and two sets of 
static/dynamic analysis (we did state 1 and state 0 for each).  RedHawk took 
about 3.0 days, 48 hours of which was just dumping out the PrimeTime 
timing-window information.  The Apache PT script has recently been
streamlined so this part of the flow should be measurably improved, but we
haven't benchmarked it yet.  I estimate our RedHawk job would take about
30 hours on a large-memory Opteron.  We could have reduced the overall time
for CoolTime by increasing the number of CPUs that Columbus used, but we
would have been run out of the building by the rest of the team...

In the CoolTime flow you run Columbus, then call CoolTime.  The job never 
used more than about 20 GB of memory (depending, of course, on how many ways 
parallel we went during the extract).  RedHawk, on the other hand, was a 
total memory hog.  Our full chip job required about 70 GB.  That was broken
up between two processes, a 55 GB RedHawk/extraction process (that idles
during the VDrop computation) and a 15 GB process for computing the VDrop
results.  Apache has recently implemented "disk-caching" in RedHawk.  This
is supposed to be more efficient than letting the OS page.  We'll be
benchmarking that soon.  

The bottom line on runtime/memory:

RedHawk is faster at the block level (where Columbus's parallel CPU
capability is not a factor).  Both take roughly the same amount of time
on the top level.

CoolTime understands and maintains hierarchy.  RedHawk flattens everything.

Plan on buying the biggest, baddest Opteron you can get if you want to run 
RedHawk on a large chip.  


Debugging/Finding Problems

Sequence formats their reports into HTML web pages that guide you through
the various results/error files.  This works fairly well, but rendering
HTML is slow with huge output files.  For the full-chip runs, we wound up
just diving into the subdirectories directly to get at the raw reports and
side files.  The CoolTime GUI was slow, rather clumsy, and seemed to be
an afterthought.  RedHawk, on the other hand, seemed to be written around
the GUI.  It is nicely organized and presents a good set of view options
to help you get at the data you need to see.

In general, we don't tend to be GUI driven since most of the flow is 
done in a non-graphical batch mode.  VDrop analysis, however, usually 
requires staring at the layout to get a handle on the root cause of the 
problem being reported.  It's one thing to get a report saying you have a
300 mV drop problem on an instance.  It's quite another to know *why* you 
have the drop and what to do about it.

Debugging and getting to the root of the problem was easiest to do in
RedHawk. The GUI is fairly fast, though Apache really needs to invest some 
more time to speed up the displaying of full-chip data. With a button click
you can jump between layout, EM, power, and VDrop views.  You have a virtual
DVM and can probe any layout wires to measure absolute voltages or to see 
current densities.  RedHawk also has a "what-if" layout tool that lets you
add additional metal/vias then re-analyze within the session to see if 
proposed fixes are worthwhile.  The "what-if" tool was a bit unstable, 
though, and often crashed RedHawk when we ran it...

The bottom line on debugging:

CoolTime will tell you where the problem is.  RedHawk will tell you where 
the problem is and give you more help figuring out the root cause.

Unfortunately, Apache doesn't license a viewer-only option for RedHawk.  This 
will limit our use of it simply because we can't justify paying six figures
for additional RedHawk licenses just to use them as RVE (Mentor's DRC/LVS
results viewer) equivalents.  


Memories

We weren't able to do more than the basic lib/lef memory modeling for the 
eval but will be trying the more advanced modeling capability in the near 
future.  I'll report on the results when we have some experience with it.  

In their most basic mode, both tools take the memory .lib current and 
LEF pins, and distribute the current evenly over the intersections of the 
power grid and the memory power pins.  
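
That even split amounts to the trivial sketch below (the coordinate list
standing in for the pin/grid intersections is hypothetical, as are the
function and variable names):

```python
# Sketch: distribute a memory's total .lib current evenly over the
# points where the top-level power grid straps cross the memory's
# LEF power pins.  The geometry is an illustrative assumption.

def distribute_memory_current(total_current_ma, pin_grid_points):
    """pin_grid_points: list of (x, y) intersections of grid straps
    with the memory's power pins.  Returns {(x, y): current_mA}."""
    per_point = total_current_ma / len(pin_grid_points)  # even split
    return {pt: per_point for pt in pin_grid_points}

points = [(0, 0), (0, 50), (100, 0), (100, 50)]
sinks = distribute_memory_current(80.0, points)
print(sinks[(0, 0)])   # 20.0 mA at each of the 4 intersections
```

The accuracy limitation is obvious: the real current draw inside a memory
is anything but uniform, which is what the fancier modes below address.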

The next level of accuracy, supported by both tools, is to extract the power 
grid from the memory GDS in order to model the feed-through contribution of 
the memory power grid superimposed on your top-level power grid.  Current 
sinks are still placed at the LEF pin/power grid intersections.  

At the time of the eval, CoolTime had only one other memory mode.  That was 
to recognize sections of the memory, mainly by working with the memory
vendors to understand the instance naming, and to weight the .lib currents
over the different sections.  This is a lot better than distributing the
currents evenly, and would probably suffice for our flip-chip, bump-powered
grid. 

If you really want accurate memory modeling, though, Apache wins hands down.  
Given their Nspice/circuit roots, they provide the capability to do an all-
out extraction and simulation of the memory to generate SPICE-accurate
current waveforms and apply them to the detailed memory power grid.  

The bottom line on memories:

CoolTime will get you by but isn't as sophisticated as RedHawk when it comes
to detailed memory modeling.  RedHawk could also be used to QA memory IP
used in our design.


Stdcell modeling

Both will let you generate detailed current profiles for the stdcells.
Apache calls this "APL" and Sequence calls it "AcuWave".  Both like to make
a big deal about it and boast about having "SPICE accurate" stdcell power 
models.  I find that amusing given that you have this giant inaccuracy knob
called (take your pick) "Toggle-rate", "PAR", or "switch-factor" that can
change your results by orders of magnitude... 

The bottom line on stdcells:

Both CoolTime and RedHawk can accurately model them.


Overall Summary (IMHO)

Both CoolTime and RedHawk caught our structural grid errors.  

Both CoolTime and RedHawk identified legitimate dynamic hot spots, though 
some areas showed up in one tool but not in the other, and vice versa...  

RedHawk is a nicer tool from a "go in and find what's causing the problem"
point of view. 

CoolTime has more potential for automatic fixing of dynamic VDrop issues 
(through dcap insertion and instance spreading) and for subsequent timing 
optimization in the presence of VDrop given its tight integration with 
ShowTime.

CoolTime's memory modeling is more rudimentary than RedHawk's.

RedHawk has a blazing fast extractor, but is a memory hog.  Sequence can
easily handle huge, hierarchical designs with multi-way MP capable Columbus 
but generally runs slower than RedHawk.

Which tool is better?   Depends.  They're both good, and bad, in different 
ways.  We are planning to use both in our current design.  CoolTime is well 
suited for our batch oriented block analysis and timing aware VDO operations.
RedHawk will be a good top level cross-check and debugging tool used
primarily through its GUI and will give us the ability to model/simulate
memories and complicated I/O structures.

One thing's for sure:  After seeing the DRC/LVS-legal, self-inflicted 
power-grid problems generated by our (and our vendors') various 
TCL/Perl/DEF hacks and tools, taping out without running any power analysis 
tool would be insane.

    - Dan Freitas
      Azul Systems                               Mountain View, CA
