( ESNUG 447 Item 2 ) -------------------------------------------- [09/26/05]
From: Dan Freitas <dfreitas=user domain=azulsystems spot calm>
Subject: One Designer's Evaluation of Apache RedHawk vs. Sequence CoolTime
Hi, John,
A few months back we evaluated the Apache RedHawk power analysis tool.
We compared RedHawk against Sequence's CoolTime (which we used on a chip
we built last year). The following is a summary of the evaluation:
The chip used for the eval is a fairly large (~19mm on a side, ~4 million
placeable instances) flip-chip with power bumps distributed across the
entire core. The chip is placed and routed hierarchically. We did static
and dynamic power analysis at both the block level (using inserted
power-points) and full-chip level (with power applied to the bumps).
Different approaches
One of the hard parts in dynamic power analysis is toggling a representative
set of instances to uncover all the VDrop hot-spots. Apache and Sequence
take different approaches to the problem. Sequence and Apache both give you
the option of using VCD files or using a "vector-less" approach. For
vector-less analysis, Sequence has you set the percent of instances, and
primary-inputs, to be switched and to what state (0->1 or 1->0). They then
propagate the output of the flops/PI's through the logic cones to see which
gates will toggle. Their "secret-sauce" is an algorithm that tries to
figure out which flops to toggle in order to maximize coverage.
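The cone-propagation part of the Sequence flow is conceptually just a fanout walk from the forced flops/PIs. The sketch below uses my own names; Sequence's flop-selection heuristic is the part they keep secret, and real propagation also checks each gate's logic function:

```python
from collections import deque

def propagate_toggles(fanout, forced):
    """Mark every instance reachable through the logic cones driven by
    the forced flops/primary inputs.  fanout maps an instance to the
    instances it drives; 'forced' is the set the tool chose to switch.
    NOTE: pure reachability is an upper bound.  The real tool also
    evaluates each gate's logic function to decide whether the toggle
    actually propagates through it.
    """
    toggled, work = set(forced), deque(forced)
    while work:
        for sink in fanout.get(work.popleft(), ()):
            if sink not in toggled:
                toggled.add(sink)
                work.append(sink)
    return toggled
```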
Apache gives you a giant knob called "PAR" (Peak-to-Average Ratio). They
measure the average power for all the instances, then use the PAR to figure
out how many instances they will toggle in the dynamic analysis. Their
"secret-sauce" is the algorithm for prioritizing which gates will be chosen
to toggle. They keep adding gates until they reach the (average * PAR)
value. Make PAR big enough and you'll toggle all the gates. Clearly
overkill. Make it very small and you can report to your boss that you
"see no problem with the power grid"... :-) The problem is that picking a
realistic value is pretty much a guess. Also, since they aren't
strictly following logic cones, you can end up with combinations of gates
toggling that could never do so during the actual operation of the chip.
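As I understand the PAR scheme, the selection loop amounts to something like the sketch below. The names and the priority rule are mine, not Apache's; their actual prioritization algorithm is the secret sauce:

```python
def select_toggling_gates(switch_power, avg_power, par):
    """Sketch of PAR-style gate selection.  Keep adding gates, here
    biggest switchers first as a stand-in priority, until the summed
    switching power reaches avg_power * PAR.  A huge PAR sweeps in
    every gate; a tiny PAR toggles almost nothing.
    """
    target = avg_power * par
    chosen, total = [], 0.0
    for gate, p in sorted(switch_power.items(), key=lambda kv: -kv[1]):
        if total >= target:
            break
        chosen.append(gate)
        total += p
    return chosen
```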
For static analysis, both will get the job done, but Apache does what you
want by default. With CoolTime you have to do a bit more work to get
what you want. RedHawk sums all the instances by default and then applies
a toggle rate. This lets you scale the average power as aggressively as
you dare. Since CoolTime uses a flop activity/logic propagation model, you
can never get 100% of the instances toggling. You have to use a back-door
approach where you specify the current for all the instances in a currents
file. Alternatively, you can specify the total block power and CoolTime
will divvy up the currents among all the instances in the block. Either
way, you are essentially skipping the logic propagation part of the flow,
then having CoolTime do the grid analysis with 100% of the instances
considered.
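That total-block-power back door is essentially a uniform current assignment. A sketch, with my own naming, assuming an even split and a single supply voltage (the tool may well weight by cell size):

```python
def divvy_block_power(total_power_w, vdd, instances):
    """Stand-in for CoolTime's total-block-power mode: split a
    user-supplied block power into per-instance current sinks,
    I = P / (V * N).  The even split and the single supply voltage
    are simplifying assumptions of this sketch.
    """
    i_each = total_power_w / (vdd * len(instances))
    return {inst: i_each for inst in instances}
```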
The bottom line on the approaches:
With either approach, dynamic VDrop analysis can still leave gates (or even
macros if you aren't careful) sitting idle in a weak power grid area,
masking a potential problem. In our tests, both
RedHawk and CoolTime found the gross violations (broken power grid sections,
huge EM violations in under-via'ed grid intersections, etc.). During
dynamic simulations, however, we did occasionally see different gates
flagged by each tool.
Setup
If you already have a PrimeTime flow in place, then setting up RedHawk is
very straightforward. RedHawk uses PrimeTime to dump out timing window
information via a script supplied by Apache. It also has a built-in
extraction engine that does a simpler flat-extract and requires only a
process layer-profile techfile and you're good to go. With CoolTime you
will have to set up their Columbus extraction/tmkr environment and ShowTime
timing env.
The things that make RedHawk easy to set up ultimately limit its capability.
Since RedHawk doesn't have a built in timing engine, it can't do timing
aware VDO (VoltageDropOptimization) operations such as instance spreading
to eliminate hot spots. RedHawk has just added the ability to insert dcap
but relies on your eco-placer to resolve cases where there is not enough
open space available around the hot spot. That could result in nasty
eco-place displacements of instances in critical timing paths if you let
RedHawk be too aggressive in placing dcap. CoolTime, on the other hand,
has the ShowTime engine built in and so, in theory (not tested by us yet),
it should be able to add dcap, and move instances, without hosing your
critical paths.
Another benefit you get with the Sequence tool is that it uses the full-
featured Columbus extractor. This extraction engine handles hierarchical
blocks and can be run n-way MP. In our original chip we were generating
block level spefs using 4 CPUs. Columbus would then stitch the block
level spefs into a full chip spef for subsequent power analysis.
RedHawk extracts the chip flat with a single CPU. The saving grace is that
since the RedHawk extractor is optimized for power-grid extraction, it runs
VERY fast. More on that later.
The bottom line on setup:
You will be up and running with RedHawk very quickly (assuming you already
have a PT timing flow in place) but will ultimately run into limitations for
more aggressive VDO operations. Setting up CoolTime is more involved, but
gives you a more sophisticated optimization environment and more flexibility
in how the tools are run (flat/hierarchical/uniprocessor/mp).
Run Time, Memory
Comparing the run time between these was difficult as, at the time, we
could only run CoolTime/Columbus on our Sun-Fire-880 (8 CPU, 96 GB physical memory).
RedHawk ran on our new Opteron for the blocks (~3x the throughput of our Sun)
but we still needed to use the Sun for our full chip runs due to the memory
requirements.
At the block level (~120K instances), CoolTime (Sun) took ~33 minutes to run.
The component parts were: Extraction (9 minutes), STA (2 minutes), dynamic
VDrop (22 minutes).
Apache (Opteron) took ~12 minutes to run. The component parts were:
Extraction (35 seconds), STA (PT on the Sun 6 minutes), reading the results
of PT into RedHawk (1.5 minutes), 40 seconds to do the baseline power calc,
dynamic VDrop (3 minutes).
So, RedHawk's extract was blazing fast, but the overall time was slowed by
dumping out the PT timing window information. PT now runs on the Opteron
so that portion should speed up dramatically. On the other hand, Columbus
also now runs on the Opteron... If we're a little loose and say that, in
general, our Opteron jobs are finishing ~3x faster than our old Sun, then
I'd expect we'd see an 11 minute CoolTime result and a 7-8 minute RedHawk
result, with most of the difference coming from the faster RedHawk extraction.
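For what it's worth, the 7-8 minute guess is just the component times above re-totaled with the Sun-hosted PT step scaled by the rough ~3x factor:

```python
# Back-of-the-envelope for the RedHawk block run (times in minutes,
# taken from the component breakdown above).
extract  = 35 / 60   # RedHawk extraction, already on the Opteron
pt_sta   = 6.0       # PrimeTime STA, currently on the Sun
read_pt  = 1.5       # reading the PT results into RedHawk
baseline = 40 / 60   # baseline power calc
dynamic  = 3.0       # dynamic VDrop

# Move PT to the Opteron at the rough ~3x speedup and re-total.
estimate = extract + pt_sta / 3 + read_pt + baseline + dynamic
print(round(estimate, 2))  # lands between 7 and 8 minutes
```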
I should note one nagging RedHawk use issue. Apache forces you to do an
extract for the static analysis, and another extract for the dynamic
analysis. With CoolTime you only extract once and then use the same
grid-spefs for both runs. I have no idea why Apache takes this approach.
At the full chip level this is expensive.
Chip level runs are universally painful. Both CoolTime and RedHawk had to
be run on our Sun as it had the most physical memory. CoolTime
took about 3.5 days to complete the extraction, STA and two sets of
static/dynamic analysis (we did state 1 and state 0 for each). RedHawk took
about 3.0 days. 48 hours of that time was just to dump out the PrimeTime
timing-window information. The Apache PT script has recently been
streamlined so this part of the flow should be measurably improved, but we
haven't benchmarked it yet. I estimate our RedHawk job would take about
30 hours on a large-memory Opteron. We could have reduced the overall time
for CoolTime by increasing the number of CPUs that Columbus used, but would
have been run out of the building by the rest of the team...
In the CoolTime flow you run Columbus, then call CoolTime. The job never
used more than about 20 GB of memory (depending, of course, on how many
ways parallel we went during the extract). RedHawk, on the other hand, was a
total memory hog. Our full chip job required about 70 GB. That was broken
up between two processes, a 55 GB RedHawk/extraction process (that idles
during the VDrop computation) and a 15 GB process for computing the VDrop
results. Apache has recently implemented "disk-caching" in RedHawk. This
is supposed to be more efficient than letting the OS page. We'll be
benchmarking that soon.
The bottom line on runtime/memory:
RedHawk is faster at the block level (where Columbus's parallel CPU
capability is not a factor). Both take roughly the same amount of time
on the top level.
CoolTime understands and maintains hierarchy. RedHawk flattens everything.
Plan on buying the biggest, baddest Opteron you can get if you want to run
RedHawk on a large chip.
Debugging/Finding Problems
Sequence formats their reports into HTML web pages that guide you through
the various results/error files. This works fairly well, but rendering
HTML is slow with huge output files. For the full chip runs, we wound up
just diving into the subdirectories directly to get at the raw reports and
side files. The CoolTime GUI was slow, rather clumsy, and seemed to be
an afterthought. RedHawk, on the other hand, seemed to be written around
the GUI. It is nicely organized and presents a good set of view options
to help you get at the data you need to see.
In general, we don't tend to be GUI driven since most of the flow is
done in a non-graphical batch mode. VDrop analysis, however, usually
requires staring at layout to get a handle on what is the root cause of the
problem being reported. It's one thing to get a report saying you have
a 300mV drop problem on an instance. It's quite another to know *why* you
have the drop and what to do about it.
Debugging and getting to the root of the problem was easiest to do in
RedHawk. The GUI is fairly fast, though Apache really needs to invest some
more time to speed up the displaying of full-chip data. With a button click
you can jump between layout, EM, power, and VDrop views. You have a virtual
DVM and can probe any layout wires to measure absolute voltages or to see
current densities. RedHawk also has a "what-if" layout tool that lets you
add metal/vias, then re-analyze within the session to see if
proposed fixes are worthwhile. The "what-if" tool was a bit unstable though
and it often crashed RedHawk while running it...
The bottom line on debugging:
CoolTime will tell you where the problem is. RedHawk will tell you where
the problem is and give you more help figuring out the root cause.
Unfortunately, Apache doesn't license a viewer option for RedHawk. This
will limit our use of it simply because we can't justify paying six figures
for additional RedHawk licenses to be used as RVE (Mentor's DRC/LVS viewer)
equivalents.
Memories
We weren't able to do more than the basic lib/lef memory modeling for the
eval but will be trying the more advanced modeling capability in the near
future. I'll report on the results when we have some experience with it.
In their most basic mode, both tools take the memory .lib current, and
LEF pins, and distribute the current evenly over the intersections of the
power grid and the memory pins.
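In that basic mode the arithmetic is trivial. A sketch (my naming) of spreading the .lib current over the pin/grid intersections:

```python
def memory_current_sinks(lib_current_a, intersections):
    """Distribute a memory's .lib current evenly over the points where
    its LEF power pins cross the top-level grid (the basic mode in
    both tools).  intersections: list of (x, y) grid locations;
    returns one current sink per location, in amps.
    """
    i_each = lib_current_a / len(intersections)
    return {xy: i_each for xy in intersections}
```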
The next level of accuracy, supported by both tools, is to extract the power
grid of the memory gds in order to model feed-through contribution of the
memory power grid super-imposed on your top level power grid. Current sinks
are still placed at the LEF pin/power grid intersections.
At the time of the eval, CoolTime had only one other memory mode. That was
to recognize sections of the memory, mainly by working with the memory
vendors to understand the instance naming, and weight the .lib currents over
the different sections. This is a lot better than distributing the currents
evenly, and would probably suffice for our flip-chip bump-powered grid.
If you really want accurate memory modeling, though, Apache wins hands down.
Given their Nspice/circuit roots, they provide the capability to do an all
out extraction and simulation of the memory to generate spice accurate
current waveforms and apply them to the detailed memory power grid.
The bottom line on memories:
CoolTime will get you by but isn't as sophisticated as RedHawk when it comes
to detailed memory modeling. RedHawk could also be used to QA memory IP
used in our design.
Stdcell modeling
Both will let you generate detailed current profiles for the stdcells.
Apache calls this "APL" and Sequence calls it "AcuWave". Both like to make
a big deal about it and boast about "having spice accurate" stdcell power
models. I think it's amusing given the fact that you have this giant
inaccuracy knob called (take your pick): "Toggle-rate", or "PAR", or
"switch-factor", that can change your results by orders of magnitude...
The bottom line on stdcells:
Both CoolTime and RedHawk can accurately model them.
Overall Summary (IMHO)
Both CoolTime and RedHawk caught our structural grid errors.
Both CoolTime and RedHawk identified legitimate dynamic hot spots; however,
there were some areas that showed up in one tool but not the other, and vice
versa...
RedHawk is a nicer tool from a "go in and find what's causing the problem"
point of view.
CoolTime has more potential for automatic fixing of dynamic VDrop issues
(through dcap insertion and instance spreading) and for subsequent timing
optimization in the presence of VDrop given its tight integration with
ShowTime.
CoolTime's memory modeling is more rudimentary than RedHawk's.
RedHawk has a blazing fast extractor, but is a memory hog. Sequence can
easily handle huge, hierarchical designs with multi-way MP capable Columbus
but generally runs slower than RedHawk.
Which tool is better? Depends. They're both good, and bad, in different
ways. We are planning to use both in our current design. CoolTime is well
suited for our batch oriented block analysis and timing aware VDO operations.
RedHawk will be a good top level cross-check and debugging tool used
primarily through its GUI and will give us the ability to model/simulate
memories and complicated I/O structures.
One thing's for sure: After seeing the DRC/LVS legal, self-inflicted,
power-grid problems that were generated by our (and our vendors') various
TCL/Perl/DEF hacks and tools, taping out without running any power analysis
tool would be insane.
- Dan Freitas
Azul Systems Mountain View, CA