( ESNUG 482 Item 7 ) -------------------------------------------- [06/30/09]
From: Andy Vagg <andy.vagg=user domain=nxp not balm>
Subject: Mentor Veloce cranks video data 3X faster than on a VStation
Hi, John,
We evaluated the Mentor Veloce emulator in January 2007 and purchased it in
April 2007. Prior to using Veloce, we used Mentor VStation for 4 years.
We use Veloce for our digital TV (SD and HD) and digital set-top box chips.
Our design sizes vary from 5 M gates up to 20 M ASIC gates.
Since we moved to Veloce emulation, we have reduced our typical overall system
development time by 17% to 25%:
Completely new SoC:
  VStation: 24 months
  Veloce:   18 months
Derivative designs:
  VStation: 18 months
  Veloce:   15 months
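The 17% to 25% range follows directly from these month figures. A quick check in Python (the names are illustrative only, not from any Mentor or NXP tool):

```python
# Schedule figures in months, as (VStation, Veloce), from the list above.
schedules = {
    "new SoC": (24, 18),
    "derivative": (18, 15),
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction in development time."""
    return 100.0 * (before - after) / before


for name, (vstation, veloce) in schedules.items():
    print(f"{name}: {reduction_pct(vstation, veloce):.1f}% shorter")
# new SoC: 25.0% shorter
# derivative: 16.7% shorter
```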
Our eval criteria and Mentor Veloce's results were as follows:
1. Minimum system size of 20 M ASIC gates.
- Exceeded criteria. Veloce Quattro (16 boards) supports 100 M gates.
2. Support for a minimum of 10 asynchronous clock domains.
- Exceeded criteria. Veloce handles 15 asynchronous clock domains.
3. Minimum ICE speed of 750 kHz.
- Exceeded criteria. Veloce could handle 1 MHz.
4. Support for DDR and DDR2 SDRAM system memories.
- Met criteria.
5. In-Circuit Emulation (ICE) support, with multiple external targets
for use with the emulation when we want a real hardware interface
rather than a software model.
- Met criteria.
6. Support for external targets such as PCI, multimedia (HDMI), DDR, and SDRAM.
- Met criteria. Veloce supports HDMI, PCI-64 and DDR and DDR2-SDRAM
memories. DDR memory has a high-speed "back-door" access capability
where we can upload or download gigabytes of data in minutes without
needing to run emulation cycles.
7. Backwards compatibility with existing ICE targets such as our I2C,
which we were already running with VStation.
- Met criteria.
8. Mixed-language Verilog and VHDL RTL support.
- Exceeded criteria, plus strong SystemVerilog support.
9. Support for gate-level netlist import.
- Met criteria.
10. Customer support, both on-site and off-site.
- Met criteria. Mentor has strong support.
Veloce vs. VStation benchmarks
Our eval design (PNX8335-M0) was a 6 M gate digital TV chip.
Setting up Veloce:
It took us about 2-3 weeks to set up the basic functionality, which includes
I2C, PCI, DDR, and UART interfaces. We needed extra time to set up our
environment as new RTL became available for additional interfaces on the
System-on-Chip; each interface takes from 1-10 days depending upon complexity
and how much additional modeling is required.
Capacity:
VStation - Our 6 M gate design consumed 76% of VStation's total
capacity, across 7 logic boards.
Veloce Quattro - We used only 69% of the capacity on 2 advanced
verification boards for the same 6 M gate design.
Compilation times:
  PNX8335-M0              VStation      Veloce   speed-up
  CPUs used for
    compilation                 40          20
  RTLC step            precompiled      11 min          -
  Vsyn/Velsyn               29 min      28 min         1x
  P&R                       90 min       4 min      22.5x
  TOTAL time               119 min      41 min         3x
The Veloce compiler was about 3x faster overall than VStation's. It also
compiled on half as many CPUs (20 vs. 40), which significantly reduced our
required compute resources.
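Each speed-up in the compilation table is the VStation time divided by the Veloce time. A short Python check of that arithmetic follows. Note that the Veloce step times (11 + 28 + 4 min) add up to 43 min rather than the 41 min shown as the total; either figure gives roughly 3x. Using `None` for VStation's precompiled RTLC step is just a modeling choice for this sketch.

```python
# Compile-step times in minutes, as (VStation, Veloce), from the table above.
# None marks VStation's RTLC step, which was precompiled and not timed.
steps = {
    "RTLC": (None, 11),
    "Vsyn/Velsyn": (29, 28),
    "P&R": (90, 4),
}


def total_minutes(times):
    """Sum step times, skipping steps that were not timed (None)."""
    return sum(t for t in times if t is not None)


vstation_total = total_minutes(v for v, _ in steps.values())
veloce_total = total_minutes(w for _, w in steps.values())

print(vstation_total, veloce_total)             # 119 43
print(round(vstation_total / veloce_total, 2))  # 2.77 (from the step sum)
print(round(vstation_total / 41, 2))            # 2.9 (from the reported total)
print(steps["P&R"][0] / steps["P&R"][1])        # 22.5
```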
Runtime Performance:
  PNX8335-M0                            VStation       Veloce  speed-up
  UART                                   677 kHz    1,449 kHz      2.1x
  CPU                                    250 kHz      813 kHz      3.2x
  MEM1X                                  142 kHz      408 kHz      2.8x
  MSVD                                   250 kHz      813 kHz      3.2x
  Design loading
    (incl. mem pre-load)              1 min 33 s         38 s      2.5x
  Memory loading                             9 s        3.5 s      2.5x
    (incl. 16 MB Linux boot code)           18 s        5.5 s      3.0x
  Register Reset                      6 min 40 s         25 s       16x
  Register R/W                       25 min 27 s   1 min 30 s       17x
  Raw TS                             13 min 55 s   7 min 22 s      1.9x
  Linux video test code copy
    (flash to DDR)                   11 min 56 s   3 min 40 s      3.3x
  Linux boot                      failed to boot   8 min 40 s         -
Overall, Veloce's design loading was roughly 2.5X faster than VStation's.
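The runtime speed-up column is Veloce/VStation for clock rates (higher is better) and VStation/Veloce for elapsed times (lower is better). The Python sketch below recomputes a few rows; `to_seconds` is a hypothetical helper for parsing the table's "X min Y s" entries.

```python
import re


def to_seconds(text: str) -> float:
    """Convert table entries like '1 min 33 s', '38 s' or '4 min' to seconds."""
    total = 0.0
    minutes = re.search(r"([\d.]+)\s*min", text)
    seconds = re.search(r"([\d.]+)\s*s\b", text)
    if minutes:
        total += float(minutes.group(1)) * 60
    if seconds:
        total += float(seconds.group(1))
    return total


# Clock rates in kHz (higher is better): speed-up = Veloce / VStation.
clock_rows = {"UART": (677, 1449), "CPU": (250, 813), "MEM1X": (142, 408)}

# Elapsed times (lower is better): speed-up = VStation / Veloce.
time_rows = {
    "Design loading": ("1 min 33 s", "38 s"),
    "Register Reset": ("6 min 40 s", "25 s"),
    "Register R/W": ("25 min 27 s", "1 min 30 s"),
    "Raw TS": ("13 min 55 s", "7 min 22 s"),
}

for name, (vstation, veloce) in clock_rows.items():
    print(f"{name}: {veloce / vstation:.2f}x")
for name, (vstation, veloce) in time_rows.items():
    print(f"{name}: {to_seconds(vstation) / to_seconds(veloce):.2f}x")
```

Printed to two decimals these come out as 2.14, 3.25, 2.87, 2.45, 16.00, 16.97, and 1.89, which agree with the table's one-decimal speed-ups to within 0.1x.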
Multi-user:
Veloce has a multi-user capability that was not available in VStation. We
currently use 6 chassis of Veloce in a 16-user configuration, 24/7 across
the world, from Europe, India, and the US. Our worldwide development centers use
Veloce locally and remotely for IP sub-system verification within the SoC as
well as verification of the entire SoC, along with the creation of silicon
validation scripts and software development.
Debug:
Veloce's debug environment was similar to debugging with a simulator such as
Cadence NC-Sim.
We can annotate values from the waveform viewer to the path browser, which
really helps with the debug phase. For waveform capture we use triggers and
then do a post-process debug. This is fast: for example, 42 seconds to upload a full
visibility database, then a 4 min replay task, and then just 57 seconds to
view a set of selected signals (including scalars and vectors) over a full
64 K cycles of captured data.
Waveform view performance:
Time to visibility, meaning the time to upload waveforms and view some
signals, was roughly 10X faster on Veloce than on VStation (9.3X in the
table below). The trace depth was 64 K clock cycles on Veloce versus 40 K
on VStation.
                      VStation      Veloce  speed-up
  User clocks             40 K        64 K
  Upload            1 min 24 s        42 s        2x
  State Replay     49 min 45 s       4 min     12.5x
  Display           1 min 25 s        57 s      1.5x
  Total            52 min 34 s  5 min 39 s      9.3x
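The Total row is the sum of the upload, replay, and display phases, and the 9.3x figure (the "10X" quoted above, rounded) is the ratio of the two totals. A small Python check:

```python
def mmss(seconds: int) -> str:
    """Format a duration in whole seconds as 'M min S s'."""
    return f"{seconds // 60} min {seconds % 60} s"


# Per-phase times in seconds: upload, state replay, display.
vstation = [1 * 60 + 24, 49 * 60 + 45, 1 * 60 + 25]
veloce = [42, 4 * 60, 57]

vs_total, ve_total = sum(vstation), sum(veloce)
print(mmss(vs_total), "|", mmss(ve_total))  # 52 min 34 s | 5 min 39 s
print(f"{vs_total / ve_total:.1f}x")        # 9.3x
```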
Veloce had predictable, repeatable compiles and good graphical interfaces.
By adapting the speed of live data down to emulator speeds (GHz-to-MHz speed
adaptation), we could do our testing with actual data, giving an accurate
representation of the real test data we would use against silicon.
Some of Veloce's video specific abilities:
- Can generate data in specific video formats, such as RGB, CCIR656 (YUV),
and HDMI, from files (e.g., .avi files) and use it to exercise the DUT.
- Veloce has graphical back end analysis tools we use to analyze and debug
pixel-level videos. Each format Mentor supports (RGB, YUV, HDMI, DVI,
and DisplayPort) has specific data associated with it that we can view
or extract and analyze. Examples are pixel data values, picture
resolutions, frame sizes, horizontal and vertical scanning positions,
front and back porch information, and sync signal widths. All these
operations are possible on-the-fly using the tools, so we waste no time
post-processing data.
- iSolve Multimedia spectrum analysis tools for audio provide us with
information on formats used, frame numbers, audio clock frequencies, and
amplitudes, plus specific data associated with these audio formats
including I2S and S/PDIF.
- The latest HDMI spec (v1.3) is supported for both input and output,
including deep color. This is lacking in Cadence Palladium and EVE.
- Very good Ethernet support, especially for use with a live network
connection to an emulated design. Mentor supports all required Ethernet
standards, including 10M/100M, 1G, and 10G Ethernet. Additionally, Veloce
has a good Ethernet packet analyzer to detect errors in the packets
to/from the design in the emulator. It is simple to plug into an
Ethernet socket (RJ45) and start generating Ethernet packets and stream
them into the design.
- For RAM, Veloce has graphical and command-line interfaces for configuring
memory type and size, and it can perform high-speed download/upload of data
from/to screen or file. A 128 MB download took 8 seconds and a 128 MB upload
took 13 seconds.
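In throughput terms, the 128 MB figures work out as below. The 1 GB estimate assumes the average transfer rate stays constant at larger sizes, which the report doesn't state; it is simply consistent with the "GB of data in minutes" back-door claim earlier.

```python
SIZE_MB = 128  # transfer size quoted above


def throughput_mb_s(size_mb: float, seconds: float) -> float:
    """Average throughput over the whole transfer, in MB/s."""
    return size_mb / seconds


download = throughput_mb_s(SIZE_MB, 8)   # 16.0 MB/s
upload = throughput_mb_s(SIZE_MB, 13)    # about 9.8 MB/s

# Assumption: linear scaling to 1 GB (1024 MB) at the same average rate.
print(f"download: {download:.1f} MB/s, 1 GB in ~{1024 / download:.0f} s")
print(f"upload:   {upload:.1f} MB/s, 1 GB in ~{1024 / upload:.0f} s")
# download: 16.0 MB/s, 1 GB in ~64 s
# upload:   9.8 MB/s, 1 GB in ~104 s
```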
Additional non-video capabilities:
- Multiple clock domain support. This lets us catch bugs such as clock
domain crossing issues, which is especially useful where we have bridges
between buses that use different clock domains. The EVE and Palladium
emulators don't support multiple clock domains.
- Clock model accuracy. We created a synthesizable PLL model that includes
fractional wrappers and spread spectrum capabilities. The wrappers allow
us to vary the frequencies to match what we use on silicon. We have
clock monitors that allow us to check these frequencies, and they are
accurate to 0.1%. The clock model accuracy allows us to make accurate
bandwidth and latency measurements on our SoCs.
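The report doesn't describe how the clock monitors are built. One common approach, assumed here purely for illustration, counts edges of the monitored clock during a window timed by a reference clock of known frequency, then compares the result against the target with a 0.1% tolerance. The function names and the 100 MHz / 27 MHz example values are hypothetical.

```python
def measured_frequency_hz(ref_hz: float, ref_cycles: int, counted_edges: int) -> float:
    """Estimate a clock's frequency from rising edges counted while a
    reference clock of known frequency runs for ref_cycles cycles."""
    window_s = ref_cycles / ref_hz
    return counted_edges / window_s


def within_tolerance(measured_hz: float, target_hz: float, tol: float = 0.001) -> bool:
    """True if the measurement is within tol (0.001 = 0.1%) of the target."""
    return abs(measured_hz - target_hz) <= tol * target_hz


# Hypothetical example: 100 MHz reference, 1 ms window, 27 MHz target clock.
f = measured_frequency_hz(ref_hz=100e6, ref_cycles=100_000, counted_edges=27_010)
print(f"{f / 1e6:.3f} MHz", within_tolerance(f, 27e6))  # 27.010 MHz True
```

With a 1 ms window, a counting error of one edge on a 27 MHz clock is about 0.004%, comfortably finer than the 0.1% accuracy quoted above.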
Overall
Our software developers were able to develop and test drivers and stacks for
a real flash device on a target board connected to Veloce. Also, using a
Mentor speed adapter for Ethernet, we used a live network connection with an
emulated design running in Veloce. We had already thoroughly tested the
interfaces on the emulator.
We also used RTL-based monitors in conjunction with software code to measure
system bandwidth and latency where software applications were stress-testing
the design. This allowed us to do 5 rounds of software optimizations before
going to silicon.
Our software developers in India used Allegro compliance test suites for
both standard and high-definition H.264 decode testing. With Veloce, we ran
over 3000 frames over a weekend for our customer test streams.
Finally, our HDTV "Natural Motion" software developers were able to take a
24 frames/sec stream with vertical and horizontal panning, including a
moving object, and analyze their algorithms' behavior to make sure they
sharpened the background images and correctly interpolated the moving
object (by inserting frames), giving smooth motion free of jitter and halos.
We have caught several bugs using Veloce on multiple projects. On the last
project, we caught 10 critical bugs -- all of which required engineering
changes and could have led to respins or software changes. These related to
a video compression block and a bus bridge block. We regularly use Veloce
to catch clock-domain crossing and race condition issues.
As for Veloce's minuses, I would like to see Mentor add an analog waveform
tool so we don't need to buy Debussy. Increased waveform depth on the box
would also be nice, since video data is very large.
Veloce lets our software development teams verify at an earlier stage, so we
can develop software and hardware concurrently. We have fewer design
re-spins and shorter product design cycles with Veloce than with VStation.
Because of Veloce, on our last 3 projects we delivered samples to our lead
customer within 2 weeks of receiving silicon.
- Andrew Vagg
NXP Semiconductors Southampton, UK