( ESNUG 458 Item 6 ) -------------------------------------------- [11/16/06]

Subject: ( ESNUG 454 #17 ) Our bug hunt using the Verisity/Axis emulator

> Depending on the type of design and the verification environment, Cadence
> Xtreme's performance ranges from 10 kHz to 140 kHz for a transactional
> SCEMI environment and around 200 kHz for In Circuit.  We consider Xtreme
> more of an accelerator (close to a simulator) than a pure emulator...
> Today, we are able to map a design onto Xtreme as soon as the first
> simulation is working on a simple simulator (e.g. NC-Sim/ModelSim/VCS).
> In general, this takes from one day to a few days.
>
>     - Tran Nguyen
>       STmicroelectronics                         Grenoble, France


From: Wai-chee Wong <wai-chee.wong=user domain=freescale got calm>

Hi, John,

Three years ago, we decided to use Axis' (now Cadence) Xtreme emulator for
pre-silicon verification.  Since then, we have been using it to verify more
than 10 PowerQUICC microprocessors.  Although our server is physically
located in Austin, Texas, it's used by Freescale engineers on 4 continents
(North America, Asia, Europe and Australia) and in 3 different time zones
(UTC-6, UTC+2, and UTC+9:30).


Our Debugging Headache:

Debugging at the chip level is one of our biggest headaches.  For a chip in
our PowerQUICC family, we have one or more CPU cores, sometimes a couple of
DSP cores, and a handful of memory controllers for DDR, Flash, NAND-Flash
and SDRAM.  We also have a mix of peripherals thrown in: high speed
communications peripherals like Parallel or Serial RapidIO, PCI, PCI
Express, HyperTransport, ATM, USB and Gigabit Ethernet; low speed
communications peripherals such as I2C, UART, FIRI and 10/100 Ethernet;
display devices (like an LCD controller); and timers, counters, DMA, and
baud rate generators.

Our first problem is simulation run time.  Our typical design is huge and it
takes a long time to simulate a test case.  This situation is even worse if
we have a gate level design.  To illustrate my point, here are two examples
I encountered in the past.  Both were run in simulation mode without
waveform generation:

   - verifying the Power-On-Reset (POR) sequence of a 10 M gate design at
     gate level takes 55 minutes.

   - verifying the link training state machines of Serial RapidIO (SRIO)
     takes about 20 minutes to train the SRIO devices up to x4 mode, and
     more than 60 minutes to x1 mode.

With waveform generation, a normal 32-bit 4 GB Linux box would complain it's
out of memory, so I am forced to use a 64-bit 16 GB Linux box, and I'll
still need to wait for a few hours before waveform generation completes.  It
is painful in the later stages of debug when I find the waveform is too
short, or it doesn't contain the signals I am interested in.

This is where a compute farm can NOT beat emulation.  You can buy 50 Linux
boxes for the price of one emulator, but you can't get the same productivity
when it comes to running and debugging lengthy operations like: MBIST or
ABIST testing, link training for SRIO and PCI-Express, graphics card display
at a resolution of 1024x1280 at a rate of 25 frames per second, AES
encryption with a 256-bit key, etc.  We have run into all of these scenarios
in the course of PowerQUICC family verification.  Without emulation, we
could run these sequences at system level only once or twice before tape
out.  By porting to emulation, we get a 5000-10,000x speedup over SW
simulation, and are able to include these sequences as part of our random
test suites.


Xtreme Waveform Generation:

Xtreme does waveform generation through its VCD-on-Demand (VoD) mechanism.
When we run emulation for the first time, we turn on recording and it
periodically captures the states of the Xtreme emulator.  Checkpoint files,
or record files, are obtained at the end of the simulation.  The record
files are small, so their impact on disk space is of little concern.
Later on we replay the record files to generate the waveform in a viewable
format.  Four different modes of VoD are available: "hwrpd", "swrpd",
"regrpd" and "iorpd".  The choice depends on the type of signals we want
to see.

    1. hwrpd mode -- Normally people use hwrpd mode to generate waveforms
       because all signals are available.  It is the most accurate form of
       waveform generation because all signals come from the emulator.
       However, it is also the slowest one.

    2. swrpd mode -- In hwrpd, all the signals in the waveform come from
       the emulator.  In swrpd, some of the signal values (for example,
       wires) are actually calculated by the host machine in simulation.
       It is faster than hwrpd because there is less data transfer between
       the host and the emulator.

    3. regrpd mode -- If we are only interested in registers, primary inputs
       and primary outputs, we choose the "regrpd" mode.  Wires are not
       available in the waveform, but it is much faster than hwrpd and swrpd.

    4. iorpd mode -- The fastest mode of VoD is iorpd, and it can be used to
       generate waveforms at emulation speed.  However, our use of this mode
       is not very common because it only contains primary inputs, primary
       outputs and signals exported via the $export_read() system task.

The first step in waveform generation is to enable VoD when you run a
simulation.  It is only a one-liner when you invoke the emulator.  Once the
record files are created, you can replay the files to generate a waveform
using one of the 4 modes mentioned above.  Similar to video recording, you
can also fast forward and rewind in the record files during waveform
generation.

To get a waveform from the record files, a command similar to $dumpvars() is
used.  The simplest form of this command is:

    $dumpvars; // that means to dump everything

Unless the test case changes, it is not necessary to rerun the same test
again.  The use of a record file reduces the number of uncontrollable,
unrepeatable variants in your verification environment, such as the engineer
who is sending stimulus to the emulator over an in-circuit interface.  We
don't waste time reproducing the same test scenario.  From the record files,
you can reconstruct the complete picture of how a test fails.


When Our Bug Hunting Process Went Haywire:

Some of our engineers are afraid of emulation because it is too fast.  I had
a similar experience when I used Xtreme for the first time.   We were trying
to run Das U-boot from GNU in the emulator.  After 2 hours of emulation, we
did not see ANY message from the UART terminal!?  Huh?!  Two hours of
emulation means billions of cycles and millions of instructions!  After we
confirmed that we should have seen the U-boot banner over the UART terminal,
we tried to get a waveform in "hwrpd" mode.  However, it was way too slow.
Also, we were not sure if we wanted to browse through millions of
instructions, so we went back to redesign our testbench.

First, we decided to put in synthesizable HDL monitors.  It is very simple
to write a monitor for Xtreme because it supports standard system tasks like
$display() and $stop().  For example, we wanted to know if the core branches
to a different instruction stream, and where it branches to.

    `define core testbench.chip.core
    module my_monitor;
       always @(posedge `core.clk)
        if (`core.is_branch_instruction == 1'b1)
            begin // axis tbcall_region
                $display ("%t: Next PC = %h\n", $time, `core.next_pc);
            end
    endmodule

This monitor spits out the program counter of the next instruction whenever
the core is about to execute a branch instruction.  Xtreme supports cross
module references, so the only thing I needed to do was to rebuild the model
with the +rtlxmr option:

    % xsim +rtlxmr my_monitor.v ...

We also put in a monitor so we would know if the core starts executing code
from the DDR memory, and another monitor so we would know if a given memory
mapped register is modified.
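
The post does not show these two monitors.  A minimal sketch of what they
might look like follows.  All port names, the DDR address window and the
watched register address are hypothetical and are not taken from the
PowerQUICC design.  The monitors use ports rather than cross module
references so that they compile on their own.

```verilog
// Sketch only: port names and address values are hypothetical.

// Reports the first instruction fetch that falls inside the DDR window.
module ddr_fetch_monitor #(
    parameter [31:0] DDR_BASE = 32'h1000_0000,  // assumed window start
    parameter [31:0] DDR_TOP  = 32'h1FFF_FFFF   // assumed window end
) (
    input wire        clk,
    input wire        rst,          // synchronous, active high
    input wire        fetch_valid,  // assumed: core issues a fetch this cycle
    input wire [31:0] fetch_addr
);
    reg seen;  // set once the first DDR fetch has been reported

    always @(posedge clk)
        if (rst)
            seen <= 1'b0;
        else if (fetch_valid && !seen &&
                 fetch_addr >= DDR_BASE && fetch_addr <= DDR_TOP)
            begin
                seen <= 1'b1;
                $display("%t: first fetch from DDR at %h", $time, fetch_addr);
            end
endmodule

// Reports every write to one memory mapped register.
module reg_write_monitor #(
    parameter [31:0] WATCH_ADDR = 32'hFF70_0000  // assumed register address
) (
    input wire        clk,
    input wire        wr_en,
    input wire [31:0] wr_addr,
    input wire [31:0] wr_data
);
    always @(posedge clk)
        if (wr_en && wr_addr == WATCH_ADDR)
            $display("%t: register %h written with %h",
                     $time, wr_addr, wr_data);
endmodule
```

In a testbench these would be instantiated with their ports tied to
hierarchical paths under testbench.chip, in the same way the branch monitor
above reaches into the core.  The "// axis tbcall_region" comment from the
branch monitor is left out here; whether it is required is tool-specific.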

Once the monitors were in place, we were able to track when the execution of
U-boot went haywire.  This happened after we ran emulation for about 10
minutes.  The core then executed an illegal opcode.  We narrowed down the
scope from 2 hours to 10 minutes, but it was still too large to generate a
waveform.
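
One way to mark such a failure point automatically is a monitor that halts
the run with $stop, which the post says Xtreme supports.  The sketch below
is an illustration only and is not the monitor we used; the
"illegal_opcode" and "pc" ports are hypothetical names.

```verilog
// Sketch only: port names are hypothetical.
module illegal_opcode_trap (
    input wire        clk,
    input wire        illegal_opcode,  // assumed: high when decode rejects an opcode
    input wire [31:0] pc               // assumed: address of that instruction
);
    always @(posedge clk)
        if (illegal_opcode == 1'b1)
            begin
                $display("%t: illegal opcode at PC = %h", $time, pc);
                $stop;  // halt the run at the point of failure
            end
endmodule
```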

We discussed the issue with Cadence's Xtreme AE, and discovered a neat
Xtreme feature that would solve our problems.  We went through the design
and short-listed thousands of signals that were critical to our debug
effort.  These signals are visibility points.  Examples are the program
counter and general purpose registers in the core, bus signals of the
backbone interconnects, clock and reset signals, pins, interrupts and all
major memory interfaces.

We believed that signs of the bug's existence would eventually be seen in
one of our visibility points.  We exported these critical signals with
$export_read().  The result was a unique way of generating waveforms at
emulation speed (iorpd mode) with our predefined visibility points.

With iorpd, we were able to use the waveform to further narrow down the area
in question.  We spotted a bus error on the core bus and began to suspect that
U-boot may not have a correct configuration for the platform.  With this
extra piece of information, we generated waveforms in "hwrpd" mode, our
final weapon to locate the source of error.  We specified the window of time
to be around the bus error, and we specified the scope, or the hierarchical
paths, to be around the suspected platform sub-system.  It may take a few
trial and error runs to get the correct scope and window of time, but we
didn't need to rerun the test.  The record file had already captured the
states of the emulator when we ran the test the first time.
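
The post does not list the exact replay commands.  In standard Verilog
(IEEE 1364), narrowing a dump to one scope and one window of time looks
roughly like the sketch below.  The scope "testbench.chip.platform", the
file name and the delay values are hypothetical, and it is an assumption
that Xtreme's replay accepts these tasks in this form.

```verilog
// Sketch only: standard Verilog dump tasks; scope, file name and times are
// hypothetical, and Xtreme's replay commands may differ.
module dump_window;
    initial begin
        $dumpfile("bus_error.vcd");
        // Depth 2: the named scope plus one level of hierarchy below it.
        $dumpvars(2, testbench.chip.platform);
        $dumpoff;            // record nothing until the window opens
        #1000000 $dumpon;    // assumed start of the window around the bus error
        #50000   $dumpoff;   // assumed end of the window
    end
endmodule
```

Widening the depth argument or moving the two delays is all it takes to try
a different scope or window against the same recorded run.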

We finally reached the source of error.  In U-boot there is a routine called
"udelay", which uses the watchdog timer in the core to mimic the passage of a
microsecond.  This routine was not working because the testbench did not
provide an accurate real time clock to the device.  The bus error resulted
when U-boot tried to access a memory location before the prior configuration
completed.  This surprised us -- after all, it is legal to use a watchdog
timer with an inaccurate real time clock!  But once we set the real time
clock, U-boot worked.

A good Xtreme debug analogy would be that the synthesizable HDL monitors
serve as our radar, iorpd is our telescope, while hwrpd is our microscope.
There are 4 parts that make this work.

 1. It is extremely easy to develop monitors in Xtreme.  It supports cross
    module references and system tasks, with a user interface that is very
    simple and straightforward.  A monitor can be written in only 8 lines.

 2. You can build a simulation model with Xtreme from the same database.  A
    simulation model takes only 5 minutes to build, as compared to 1-2 hours
    for an emulation model.  We are able to verify the monitors in
    simulation before we kick off the emulation build.

 3. The ability to generate a waveform for a set of signals at emulation
    speed.  While the HDL monitors are a light in the dark, the iorpd
    waveform with our pre-selected critical signals works like a map with
    only landmarks.  It may not be good enough for travelers to locate
    their camps, but it definitely helps users to reach their destinations.

 4. hwrpd lets us find the fine final details of the bug.
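
As an illustration of point 2, a monitor can be checked in plain simulation
against a small stub before any emulation build.  The sketch below assumes
the my_monitor module from earlier in this post is compiled alongside it;
the stub modules are hypothetical stand-ins for the real chip and core.

```verilog
// Sketch only: stub hierarchy that gives my_monitor the path
// testbench.chip.core without the real design.
module core_stub;
    reg        clk = 1'b0;
    reg        is_branch_instruction = 1'b0;
    reg [31:0] next_pc = 32'h0;
    always #5 clk = ~clk;  // free-running clock
endmodule

module chip_stub;
    core_stub core();
endmodule

module testbench;
    chip_stub  chip();
    my_monitor mon();  // the branch monitor from this post

    initial begin
        // Assert a branch for exactly one rising clock edge.
        @(negedge chip.core.clk);
        chip.core.next_pc = 32'hFFF0_0100;
        chip.core.is_branch_instruction = 1'b1;
        @(negedge chip.core.clk);
        chip.core.is_branch_instruction = 1'b0;
        #20 $finish;
    end
endmodule
```

The branch flag is high across a single rising edge, so the monitor should
print one "Next PC" line and then stay silent until $finish.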

This process has now been adopted in our organization and is still being
used today.

    - Wai-Chee Wong
      Freescale Semiconductor                    Hong Kong