( ESNUG 482 Item 7 ) -------------------------------------------- [06/30/09]

From: Andy Vagg <andy.vagg=user domain=nxp not balm>
Subject: Mentor Veloce cranks video data 3X faster than on a VStation

Hi, John,

We evaluated the Mentor Veloce emulator in January 2007 and purchased it in
April 2007.  Prior to using Veloce we used Mentor VStation for 4 years.

We use Veloce for our digital TV (SD and HD) and digital set-top box chips.
Our design sizes vary from 5 M gates up to 20 M ASIC gates.

Since we moved to Veloce emulation, we have reduced our typical overall
system development time by 17% to 25%.

Completely new SOC using

                      VStation:   24 months
                        Veloce:   18 months

For derivative designs

                      VStation:   18 months
                        Veloce:   15 months
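
The 17% to 25% range can be checked from the schedule figures above.  A
minimal Python sketch (the function name is illustrative, not something from
our flow):

```python
def reduction_pct(before_months, after_months):
    """Percentage schedule reduction going from 'before' to 'after'."""
    return 100.0 * (before_months - after_months) / before_months

new_soc = reduction_pct(24, 18)      # completely new SoC
derivative = reduction_pct(18, 15)   # derivative design

print(f"new SoC: {new_soc:.1f}%, derivative: {derivative:.1f}%")
# prints "new SoC: 25.0%, derivative: 16.7%"
```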

Our eval criteria and Mentor Veloce's results were as follows:

  1. Minimum ASIC gate count of system 20 M gates.
     - Exceeded criteria. Veloce Quattro (16 boards) supports 100 M gates.

  2. Support for a minimum of 10 asynchronous clock domains.
     - Exceeded criteria.  Veloce handles 15 asynchronous clock domains.

  3. ICE speeds minimum 750 kHz.
     - Exceeded criteria.  Veloce could handle 1 MHz.

  4. Support for DDR and DDR2 SDRAM system memories.
     - Met criteria.

  5. In-Circuit Emulation (ICE) support, with multiple external targets
     for use with the emulation when we want a real hardware interface
     rather than a software model.
     - Met criteria.

  6. Support external targets like PCI, Multimedia (HDMI), DDR, and SDRAM.
     - Met criteria.  Veloce supports HDMI, PCI-64 and DDR and DDR2-SDRAM
       memories.  DDR memory has a high-speed "back-door" access capability
       where we can upload or download GB of data in minutes without
       needing to run emulation cycles.

  7. Backwards compatibility with existing ICE targets such as our I2C,
     which we were already running with VStation.
     - Met criteria.

  8. Verilog and VHDL mixed language RTL support.
     - Exceeded criteria, plus strong System Verilog support.

  9. Support for gate-level netlist import.
     - Met criteria.

 10. Customer Support both on-site and off-site.
     - Met criteria.  Mentor has strong support.


Veloce vs. VStation benchmarks

Our eval design (PNX8335-M0) was a 6 M gate digital TV chip.

Setting up Veloce:

It took us about 2-3 weeks to set up the basic functionality, which includes
I2C, PCI, DDR, and UART interfaces.  We needed extra time to set up our
environment as new RTL became available for additional interfaces on the
System-on-Chip; each interface takes from 1-10 days depending upon complexity
and how much additional modeling is required.

Capacity:

  VStation - Our 6 Million gate design consumed 76% of VStation's total
  capacity, across 7 logic boards.

  Veloce Quattro -  We used only 69% of the capacity on 2 advanced
  verification boards for the same 6 M gate design.

Compilation times:

  PNX8335-M0              VStation       Veloce     speed-up

  CPUs used
  for compilation         40              20

  RTLC Step               Precompiled     11 min     - 
  Vsyn/Velsyn              29 min         28 min     1x
  P&R                      90 min          4 min     22.5x
  TOTAL time              119 min         41 min     3x

The Veloce compiler was about 3x faster overall than VStation's.  It also
compiled on half as many CPUs, so it reduced our required resources
significantly.

Runtime Performance:

  PNX8335-M0              VStation       Veloce     speed-up

  UART                    677 kHz        1,449 kHz   2.1x
  CPU                     250 kHz          813 kHz   3.2x
  MEM1X                   142 kHz          408 kHz   2.8x
  MSVD                    250 kHz          813 kHz   3.2x

  Design loading
  (incl. mem pre-load)   1 min 33 s        38 s      2.5x 

  Memory loading            9 s           3.5 s      2.5x

  Memory loading
  (incl. Linux boot
  code 16MB)               18 s           5.5 s      3.0x

  Register Reset         6 min 40 s        25 s       16x

  Register R/W          25 min 27 s      1 min 30 s   17x

  Raw TS                13 min 55 s      7 min 22 s  1.9x

  Linux video test
  code copy flash
  to DDR                11 min 56 s      3 min 40 s  3.3x

  Linux boot           failed to boot    8 min 40 s

Overall, Veloce's design loading was roughly 2.5X faster than VStation's.
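
The speed-up column for the timed tests is the VStation time divided by the
Veloce time.  A small Python sketch that recomputes a few of the rows; the
helper name and the shortened row labels are illustrative only:

```python
import re

def to_seconds(text):
    """Convert a duration such as '6 min 40 s' or '3.5 s' to seconds."""
    m = re.fullmatch(r"\s*(?:(\d+)\s*min)?\s*(?:([\d.]+)\s*s)?\s*", text)
    if not m or not (m.group(1) or m.group(2)):
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(m.group(1) or 0)
    seconds = float(m.group(2) or 0)
    return 60 * minutes + seconds

# (VStation, Veloce) times copied from the table above.
runs = {
    "Design loading": ("1 min 33 s", "38 s"),
    "Register Reset": ("6 min 40 s", "25 s"),
    "Register R/W":   ("25 min 27 s", "1 min 30 s"),
    "Raw TS":         ("13 min 55 s", "7 min 22 s"),
    "Flash to DDR":   ("11 min 56 s", "3 min 40 s"),
}

speedups = {name: to_seconds(v) / to_seconds(q) for name, (v, q) in runs.items()}
for name, ratio in speedups.items():
    print(f"{name:15s} {ratio:5.1f}x")
```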


Multi-user:

Veloce has a multi-user capability that was not available in VStation.  We
currently use 6 chassis of Veloce in a 16-user configuration, 24/7 across
the world, from Europe, India, and the US.  Our worldwide development centers use
Veloce locally and remotely for IP sub-system verification within the SoC as
well as verification of the entire SoC, along with the creation of silicon
validation scripts and software development.

Debug:

Veloce's debug environment was similar to debugging with a simulator such as
Cadence NC-Sim.

We can annotate values from the waveform viewer to the path browser, which
really helps with the debug phase.  For waveform capture we use triggers and
then do a post-process debug.  This is fast, e.g. 42 seconds to upload a full
visibility database, then a 4 min replay task, and then just 57 seconds to
view a set of selected signals (including scalars and vectors) over a full
64 K cycles of captured data.

Waveform view performance:

Time to visibility, meaning to upload waveforms and view some signals, was
roughly 9X faster on Veloce than VStation.  The trace depth was 64 K clock
cycles on Veloce (40 K on VStation).

                          VStation       Veloce     speed-up

  User clocks               40 K           64 K
  Upload                 1 min 24 s        42 s        2x
  State Replay          49 min 45 s       4 min     12.5x
  Display                1 min 25 s        57 s      1.5x
  Total                 52 min 34 s     5 min 39 s   9.3x
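
The Total row is the sum of the three phases, and the overall speed-up is
the ratio of the two totals.  A short Python check, with the times entered
by hand as (minutes, seconds) pairs:

```python
# (VStation, Veloce) time per phase as (minutes, seconds), from the table above.
phases = {
    "Upload":       ((1, 24), (0, 42)),
    "State replay": ((49, 45), (4, 0)),
    "Display":      ((1, 25), (0, 57)),
}

def secs(min_sec):
    """Convert a (minutes, seconds) pair to seconds."""
    minutes, seconds = min_sec
    return 60 * minutes + seconds

vstation_total = sum(secs(v) for v, _ in phases.values())
veloce_total = sum(secs(q) for _, q in phases.values())

print(divmod(vstation_total, 60), divmod(veloce_total, 60))
# prints "(52, 34) (5, 39)"
print(f"overall speed-up: {vstation_total / veloce_total:.1f}x")
# prints "overall speed-up: 9.3x"
```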

Veloce had predictable, repeatable compiles and good graphical interfaces.
By adapting the speed of live data down to emulator speeds (GHz down to MHz
speed adaptation), we could do our testing with actual data for an accurate
representation of the real test data we would use against silicon.


Some of Veloce's video specific abilities:

  - Can generate specific video formats from files such as RGB, CCIR656
    [YUV], and HDMI data (e.g. .avi files) and exercise the DUT.

  - Veloce has graphical back end analysis tools we use to analyze and debug
    pixel-level videos.  Each format Mentor supports (RGB, YUV, HDMI, DVI,
    and DisplayPort) has specific data associated with it that we can view
    or extract and analyze.  Examples are pixel data values, picture
    resolutions, frame sizes, horizontal and vertical scanning positions,
    front and back porch information, and sync signal widths.  All these
    operations are possible on-the-fly using the tools, so we waste no time
    post-processing data.

  - iSolve Multimedia spectrum analysis tools for audio provide us with
    information on formats used, frame numbers, audio clock frequencies, and
    amplitudes, plus specific data associated with these audio formats
    including I2S and S/PDIF.

  - Latest HDMI specs (v1.3) in and out are supported, including deep color
    support.  This is lacking in Cadence Palladium and EVE.

  - Very good Ethernet support, especially for use with a live network
    connection to an emulated design.  Mentor supports all required Ethernet
    standards, including 10M/100M, 1G, 10G Ethernet.  Additionally, Veloce
    has a good Ethernet packet analyzer to detect errors in the packets
    to/from the design in the emulator.  It is simple to plug into an
    Ethernet socket (RJ45) and start generating Ethernet packets and stream
    them into the design.

  - For RAM, Veloce has graphical and command-line interfaces for configuring
    memory type and size, and can perform high-speed download/upload of data
    from/to the screen or a file.  A 128 MB download took 8 seconds and a
    128 MB upload took 13 seconds.
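
For reference, the 128 MB memory transfer times above work out to the
following effective throughput.  This is plain arithmetic on the quoted
figures; the function name is illustrative:

```python
SIZE_MB = 128  # transfer size quoted above

def throughput_mb_per_s(size_mb, seconds):
    """Effective transfer rate in MB/s."""
    return size_mb / seconds

download = throughput_mb_per_s(SIZE_MB, 8)    # 128 MB in 8 s
upload = throughput_mb_per_s(SIZE_MB, 13)     # 128 MB in 13 s

print(f"download: {download:.1f} MB/s, upload: {upload:.1f} MB/s")
# prints "download: 16.0 MB/s, upload: 9.8 MB/s"
```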

Additional non-video capabilities:

  - Multiple clock domain support.  This lets us catch bugs such as clock
    domain crossing issues, which is especially useful where we have bridges
    between buses that use different clock domains.

    EVE, Palladium emulators don't support several clock domains.

  - Clock model accuracy. We created a synthesizable PLL model that includes
    fractional wrappers and spread spectrum capabilities.  The wrappers allow
    us to vary the frequencies to match what we use on silicon.  We have
    clock monitors that allow us to check these frequencies, and they are
    accurate to 0.1%.  The clock model accuracy allows us to make accurate
    bandwidth and latency measurements on our SoCs.
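
As a rough illustration of what a 0.1% clock check means in practice, here
is a minimal Python sketch.  It is not our monitor code (the monitors
themselves are RTL); the function name and the 27 MHz target are assumptions
chosen only for the example:

```python
def within_tolerance(measured_hz, target_hz, tolerance=0.001):
    """True if the measured frequency is within the fractional tolerance
    (0.001 = 0.1%) of the target frequency."""
    return abs(measured_hz - target_hz) <= tolerance * target_hz

TARGET_HZ = 27_000_000  # illustrative value only, not from our design

# 0.1% of 27 MHz is a 27 kHz window on either side of the target.
print(within_tolerance(27_020_000, TARGET_HZ))  # 20 kHz off: True
print(within_tolerance(27_030_000, TARGET_HZ))  # 30 kHz off: False
```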


Overall

Our software developers were able to develop and test drivers and stacks for
a real flash device on a target board connected to Veloce. Also, using a
Mentor speed adapter for Ethernet, we used a live network connection with an
emulated design running in Veloce.  We had already thoroughly tested the
interfaces on the emulator.

We also used RTL-based monitors in conjunction with software code to measure
system bandwidth and latency where software applications were stress-testing
the design. This allowed us to do 5 rounds of software optimizations before
going to silicon.

Our software developers in India used Allegro compliance test suites for
both standard and high-definition H.264 decode testing.  With Veloce, we ran
over 3000 frames over a weekend for our customer test streams.

Finally, our HDTV "Natural Motion" software developers were able to take a
24 frames/sec stream with vertical and horizontal panning, which included
a moving object, and analyze their algorithms' behavior to make sure they
sharpened the background images and interpolated (by inserting frames) the
moving object correctly, to give smooth, jitter-free and halo-free motion.

We have caught several bugs using Veloce on multiple projects.  On the last
project, we caught 10 critical bugs -- all of which required engineering
changes and could have led to respins or software changes.  These related to
a video compression block, and a bus bridge block.  We regularly use Veloce
to catch clock-domain crossing and race condition issues.


For Veloce's minuses, I would like to see Mentor add a built-in analog waveform
tool so we don't need to buy Debussy.  Also, increased waveform depths on the
box would be nice, since data for video is very large.

Veloce lets our software development teams verify at an earlier stage, so we
can develop and run software and hardware concurrently.  We have fewer design
re-spins and are seeing shortened product design cycles with Veloce (vs.
VStation).  Because of Veloce, on the last 3 projects, we provided samples
to our lead customer within 2 weeks of receiving silicon.

    - Andrew Vagg
      NXP Semiconductors                         Southampton, UK