( ESNUG 461 Item 14 ) ------------------------------------------- [01/31/07]
Subject: ( ESNUG 454 #18 ) EVE ZeBu-XL is 10,000X faster than VCS/NC-Sim
> Loading an existing model into the Zebu box takes 5 minutes. Building a
> 32 FPGA model takes about 10 hours, 4 hours for synthesis, 4 hours for
> FPGA place and route, and 2 hours for the Zebu tools (logic insertion,
> partition and final merge). A 16 FPGA design might take 6 hours.
>
> - Mike Dickman
> Palo Alto Semiconductor Santa Clara, CA
From: [ The Aflac Duck ]
Hi, John,
We used EVE's ZeBu-XL emulator on an image processing design of
approximately 8 million gates. We used a transaction-based testbench and
achieved a ~10,000X speed improvement over simulation, or ~1/200 of
real-time speed. For example, running 1 second's worth of HD video
(60 frames) would take about 3 1/2 minutes, while VCS/NC-Sim would take
about a month.
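The figures above can be cross-checked with a little arithmetic. The sketch below is only a back-of-envelope check: the inputs are the rounded numbers quoted in this post, and the function and constant names are made up for illustration.

```python
# Back-of-envelope check of the speed figures quoted above.
# All inputs are the rounded numbers from the post; names are illustrative.

REAL_TIME_S = 1.0            # 1 second of HD video (60 frames)
EMULATION_S = 3.5 * 60       # about 3 1/2 minutes on the emulator
SPEEDUP_VS_SIM = 10_000      # ~10,000X faster than simulation
SECONDS_PER_DAY = 86_400


def slowdown_vs_real_time(emulation_s: float, real_time_s: float) -> float:
    """How many times slower than real time the emulator runs."""
    return emulation_s / real_time_s


def simulation_days(emulation_s: float, speedup: float) -> float:
    """Estimated simulator wall-clock time, in days, for the same workload."""
    return emulation_s * speedup / SECONDS_PER_DAY


if __name__ == "__main__":
    slowdown = slowdown_vs_real_time(EMULATION_S, REAL_TIME_S)
    days = simulation_days(EMULATION_S, SPEEDUP_VS_SIM)
    print(f"emulator:  ~1/{slowdown:.0f} of real time")
    print(f"simulator: ~{days:.1f} days")
```

This works out to roughly 1/210 of real time and a little over 24 days of simulation, which is consistent with the rounded "~1/200" and "about a month" above.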
We found ZeBu's C/C++ co-emulation mode worked very well. It allowed
us to create a "light" testbench (with minimal processing involved)
to get the maximum speed from the emulator.
ZeBu-XL was an excellent platform for hardware/software integration.
The software could run fast enough to provide usable, realistic results
in acceptable time. This speed made an enormous difference in letting
us close our software debugging loop (debug, recompile, rerun).
With NC-Sim/VCS, we found the typical cycle would be effectively 1 day for
simulation, another day to re-run the simulation with the appropriate
signals dumped, a couple of hours to isolate the problem and correct it
(software or hardware) and then try it again. So our loop was about 3 days.
With ZeBu, the initial run would be maybe half an hour. The debug would
likely require a couple of tries, as the signal dumping has to be done more
carefully so as not to affect run time too much. If the problem was in
hardware, we'd have to resynthesize the chip for ZeBu. All told, our loop
dropped to about 3-6 hours (depending on whether the problem was in
hardware or software) instead of 3 days.
Drawbacks:
ZeBu does not allow for a large number of signals to be traced for any
appreciable amount of time without affecting the run speed, and these
signals (static probes) need to be specified at compile time. If more
signals are needed, or signals that were not specified at compile time,
then dynamic probes have to be used. These slow the emulator down to
simulator speeds. However, we found that if we used the static probes on
common timing signals, we could trigger at very specific times and then
switch to using dynamic probing for the details. Because of the speed of
ZeBu, we could re-run more frequently to narrow our search quickly.
EVE's compilation flow for ZeBu-XL is still developing. The automatic
flow works, but we found we could easily get 2X run speed improvement if
we manually intervened in such things as partitioning. As the tools
continue to develop, we expect they will get to the point where manual
intervention is only required in extreme cases. We have not tried
ZeBu's new RTL front end, which allows RTL to be mapped directly into
the emulator, without having to synthesize as a separate step.
Typically, the complete (non-incremental) synthesis and compile took
about 3-4 hours. The incremental compile was unreliable, though we have
been told this has been corrected in the newer tools. I have no reason
to doubt this.
The original ZeBu API did not support all of the functionality of the
command line or GUI interfaces, so it was difficult to automate some tasks.
However, the new API has addressed this.
The EVE support for multiple clocks is limited and we had to work around
this. I should note we had to do the same work for the Cadence and Mentor
emulators we evaluated - though not necessarily for the same reasons.
All emulators simulate event ordering on the primary clocks. Basically,
there is an internal clock on which all events occur. In order to
guarantee event ordering, only one clock edge of one clock will be active
on an edge of the internal clock. This becomes a problem when your design
has several high-frequency clocks of similar frequency.
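To see why similar-frequency clocks are costly under this rule, here is a toy model in Python. It is not how any vendor's scheduler is actually implemented; it only assumes, as described above, that every edge of every primary clock consumes its own internal clock tick. The clock periods and the time horizon are hypothetical values in arbitrary units.

```python
from fractions import Fraction


def edge_times(period: Fraction, horizon: Fraction) -> list[Fraction]:
    """Times of every edge (rising and falling) of a 50% duty-cycle clock
    in the interval (0, horizon]."""
    half = period / 2
    count = int(horizon / half)  # number of whole half-periods that fit
    return [half * k for k in range(1, count + 1)]


def internal_ticks(periods: list[Fraction], horizon: Fraction) -> int:
    """Internal clock ticks needed when each edge of each clock gets a tick
    to itself (toy model of the one-edge-per-tick rule)."""
    return sum(len(edge_times(p, horizon)) for p in periods)


if __name__ == "__main__":
    horizon = Fraction(1000)
    one_clock = [Fraction(10)]
    four_clocks = [Fraction(10), Fraction(11), Fraction(12), Fraction(13)]
    print("ticks, one clock:  ", internal_ticks(one_clock, horizon))
    print("ticks, four clocks:", internal_ticks(four_clocks, horizon))
```

In this toy model the four-clock case needs 700 internal ticks to cover the same span of emulated time that a single clock covers in 200, so the cost grows roughly with the number of similar-frequency clocks.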
In our case, this slowed the ZeBu system down by a factor of about 4.
Because the clocks we were trying to emulate were effectively asynchronous,
we were not so concerned about the exact ordering of the edges, but more
about the frequency relationships between the clocks. So, we used only a
single primary clock and used accumulator-based dividers to generate
internal clocks. In addition, ZeBu has a hard limit of 8 on the number of
primary clocks and we needed more, so we had to come up with this solution
regardless.
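For readers unfamiliar with accumulator-based dividers, the following Python model shows the generic phase-accumulator technique. It is a sketch, not the divider used in this design: the accumulator width and the increment values are hypothetical. On every primary clock cycle the accumulator adds a fixed increment and wraps; its top bit serves as the derived clock, giving an average frequency of f_primary * increment / 2**width.

```python
def accumulator_divider(increment: int, width: int, primary_cycles: int) -> list[int]:
    """Return the derived clock level (0 or 1) after each primary clock cycle.

    The accumulator adds `increment` every primary cycle and wraps at
    2**width; its most significant bit is used as the derived clock.
    """
    mask = (1 << width) - 1
    acc = 0
    levels = []
    for _ in range(primary_cycles):
        acc = (acc + increment) & mask
        levels.append(acc >> (width - 1))
    return levels


def rising_edges(levels: list[int]) -> int:
    """Count 0 -> 1 transitions, assuming the clock starts at level 0."""
    previous = 0
    edges = 0
    for level in levels:
        if level == 1 and previous == 0:
            edges += 1
        previous = level
    return edges


if __name__ == "__main__":
    WIDTH = 16                 # hypothetical accumulator width
    CYCLES = 1 << WIDTH        # one full wrap period of the primary clock
    for inc in (3000, 3100):   # hypothetical increments for two derived clocks
        cycles_out = rising_edges(accumulator_divider(inc, WIDTH, CYCLES))
        print(f"increment {inc}: {cycles_out} derived cycles per {CYCLES} primary cycles")
```

Two dividers with increments of 3000 and 3100 produce derived clocks in a 3000:3100 frequency ratio from one primary clock. The individual edges jitter by up to one primary cycle, which is acceptable when, as here, only the frequency relationship matters and not the exact edge ordering.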
I wish I could remember why we needed to do this with Palladium, but I can't
find where we documented that. Sorry.
Upsides:
ZeBu-XL's biggest strengths are, to paraphrase real estate agents, speed,
speed, speed! For hardware/software integration, this is absolutely
critical. While Cadence and Mentor may provide better debug functionality,
there is still the need to anticipate where and when bugs will occur. If
you guess wrong, you have to re-run the tests, and the speed of the ZeBu
makes it much, much faster to re-run and get to the point of interest.
We found the Palladium to be primarily geared towards in-circuit emulation,
so the transaction-based support was not as efficient or easy as the other
two. As such, the Palladium run speed for our evaluation was slower than
the others.
The Mentor vStation was a good tool with much more powerful debug support
than ZeBu, but was consistently slower, which made it less appropriate for
software/hardware co-verification.
Also, EVE's post-sales support has been very good; we are more than happy
with our decision to purchase ZeBu.
- [ The Aflac Duck ]