( ESNUG 459 Item 4 ) -------------------------------------------- [12/14/06]

Subject: ( ESNUG 458 #6 ) ye olde Verisity Axis emulator is alive & well!

> Three years ago, we decided to use Axis' (now Cadence) Xtreme emulator
> for pre-silicon verification.  Since then, we have been using it to
> verify more than 10 PowerQUICC microprocessors.  Although our server is
> physically located in Austin, Texas, it's used by Freescale engineers
> on 4 continents (North America, Asia, Europe and Australia) and in 3
> different time zones (UTC-6, UTC+2, and UTC+9:30).
>
>     - Wai-Chee Wong
>       Freescale Semiconductor                    Hong Kong


From: Thorsten Wermke <thorsten.wermke=user domain=philips spot mom>

Hi, John,

I started using Axis' acceleration (Xtreme Server) in 2003 when it was still
Axis and hadn't yet been acquired by Verisity or Cadence.  We were using
their Xcite 1000 before 2004, but had outgrown it and needed a new system
and went with Xtreme Server.  We built our own homebrew FPGA prototyping and
had a greater need for sim acceleration than In-Circuit-Emulation (ICE),
which was Palladium's strength.  Also, interestingly enough, Xtreme Server
was more compatible with NC-Sim simulator back then than Palladium was.

Axis developed the whole concept of simulation acceleration and the RCC
(reconfigurable computing) technique.  The synthesizable parts of our mixed
Verilog/VHDL code are automatically mapped onto Xtreme Server's FPGAs,
forming mini-compute units that represent the logic.  The non-synthesizable
modules are automatically recognized and compiled to run on their simulator
(Xsim) on the Linux host.

We use Xtreme to verify our ICs for video processing for TV or PCs and MPEG
encoding.  Our chip sizes range from 1 to 10 million gates.  A typical
Xtreme Server compile time is 1.5 hr for 3 million gates.

Our time to set up Xtreme Server varies.  On average it takes us 2 weeks,
but, for example, it only takes 2 days for HDL-based simulation acceleration
with no PLI and no SystemC.  We first determine what is behavioral and can
stay on our simulator and what is synthesizable and can go on the Xtreme
server hardware.  So the real question is NOT how long it takes to get it
working, but how long it takes to get it working at a given speed; e.g., if
10x acceleration is sufficient, then setup takes very little time.

If you want to run it in autorun mode (the fastest possible with clock
generation by special hardware in the box), all non-synthesizable modules
have to be replaced by synthesizable ones (including data I/O, which can be
realized nicely using the axis_tbcall primitives).  Using this mode, the
maximum frequency of the fastest clock is probably 300 kHz.  The fastest we
realized was 100 kHz with Axis.

Typically, we see a 10x acceleration over NC-sim for a 50% filled 3 million
gate Xtreme Server with just replacing memories by models using the Xtreme
memory primitives.  It becomes more interesting if you want to accelerate
gate-level netlists.  Gate-level netlists run about the same speed as RTL on
Xtreme, whereas the difference is 10x in a NC simulation.  There is, however,
an impact on the capacity needed.  Gate-level netlists take 30%-100% more
capacity than RTL when mapped on Xtreme Server hardware.
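
As a rough sketch of the arithmetic behind these figures: only the ratios (10x
RTL acceleration, 10x gate-level slowdown in NC-Sim, 30%-100% extra capacity)
come from the numbers above.  The function names are hypothetical, and the
1.5 million gate input is simply an assumed reading of "50% filled 3 million
gate" box.

```python
# Illustrative arithmetic only. Names are hypothetical; ratios are from the text.

RTL_ACCEL_OVER_NCSIM = 10     # typical RTL speed-up on Xtreme vs. NC-Sim
GATE_SLOWDOWN_IN_NCSIM = 10   # gate-level runs ~10x slower than RTL in NC-Sim
GATE_SLOWDOWN_ON_XTREME = 1   # gate-level runs about as fast as RTL on Xtreme


def gate_level_accel():
    """Speed-up of gate-level on Xtreme relative to gate-level in NC-Sim."""
    return RTL_ACCEL_OVER_NCSIM * GATE_SLOWDOWN_IN_NCSIM // GATE_SLOWDOWN_ON_XTREME


def gate_level_capacity(rtl_gates):
    """(low, high) capacity needed for the netlist: 30% to 100% more than RTL."""
    return (rtl_gates * 130 // 100, rtl_gates * 200 // 100)


if __name__ == "__main__":
    print("gate-level acceleration: about %dx" % gate_level_accel())
    low, high = gate_level_capacity(1_500_000)
    print("capacity needed: %d to %d gates" % (low, high))
```

Under these assumptions, a gate-level run would see roughly 100x acceleration,
but a design that half-fills a 3 million gate box as RTL could need most or all
of that box as a netlist.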

Xtreme also has a great debugging function called "VCD-on-Demand".  In a
normal simulation with millions of interconnects, if we wanted to record
all the states, we would get gigabytes of data and couldn't possibly trace
it all.  But VCD-on-Demand gives us easy, full access to any RTL signal for
the whole simulation, even for long simulations.  Here is the two-step
process:

  1.) We run the entire accelerated sim on Xtreme Server, but only record
      snapshots of the simulation.  This runs at full acceleration speed.

  2.) Once we know roughly at which simulation time a bug occurs, we start
      a 2nd simulation around the time when the bug occurred.  In this one
      we turn on full recording of (the RTL signals of) any scope of the
      design.  This is much slower, but it only has to be done for a short
      simulation time, so it's OK.
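
The two steps above can be sketched with a toy model.  This is not the Xtreme
interface; every name here is hypothetical, and a 16-bit LFSR stands in for the
design.  The point is only that sparse snapshots plus a deterministic re-run
recover the full trace for any window.

```python
# Toy model of the two-step flow. Not the Xtreme API; all names are hypothetical.

def step(state):
    """One clock cycle of a stand-in deterministic design (a 16-bit LFSR)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def fast_run(initial_state, cycles, snapshot_every):
    """Step 1: run the whole sim, keeping only sparse state snapshots."""
    snapshots = {0: initial_state}
    state = initial_state
    for cycle in range(1, cycles + 1):
        state = step(state)
        if cycle % snapshot_every == 0:
            snapshots[cycle] = state
    return snapshots


def replay_window(snapshots, start, end):
    """Step 2: restore the latest snapshot at or before `start`, then
    re-simulate and record every cycle in start..end."""
    base = max(c for c in snapshots if c <= start)
    state = snapshots[base]
    trace = {}
    cycle = base
    while cycle <= end:
        if cycle >= start:
            trace[cycle] = state      # full recording, only inside the window
        state = step(state)
        cycle += 1
    return trace


if __name__ == "__main__":
    snaps = fast_run(0xACE1, 10_000, 1_000)
    window = replay_window(snaps, 4_200, 4_210)
    print(len(snaps), "snapshots;", len(window), "cycles recorded")
```

The slow, fully recorded replay covers at most one snapshot interval plus the
window of interest, which is why the second run stays short.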

Using this approach we have access to any signal at any simulation time.  We
have used Xtreme Server successfully in the following areas:

 - We identify problems in our homemade FPGA systems but -- due to limited
   traceability -- we cannot tell what the reason is.  Then we rerun the
   same thing on Xtreme Server using VCD-on-Demand.

 - We integrate Xtreme Server in our simulation regressions.  Those can
   consist of 800 simulations.  Most of them are run on a Linux-based
   compute farm with NC-sim simulations.  The longest simulations, however,
   would still run a few days.  We accelerate those on Xtreme Server,
   enabling us to run the whole regression overnight.

Where Xtreme Server started to stumble:

Meanwhile, Cadence has improved their NC-Sim a lot, and especially on Linux
boxes, their simulation has become quite fast.  It looks like Cadence hasn't
put much into improving Xsim, the old Axis simulator.  Because of this,
we saw the Axis acceleration ratios decrease, making it less attractive.

Also, we failed to use Xtreme Server for software development (mapping the
RTL of our processor onto it and using a JTAG interface).  Compared to our
homebrew FPGA systems, Xtreme Server is still 1000x slower, making it hard
to use for software development, where a lot of iterations are required.  To
be attractive for software development, emulation systems should be able to
run the fastest clock at 10 MHz.

Finally, we just recently started using SystemC, and Cadence should improve
Xtreme Server's compatibility with it.  I think the SystemC support is in
beta, and the code still needs to be written in a certain style.

Compared to the good old Verisity or Axis times, the field apps engineers who
are responsible for us now are less experienced with the Xtreme Server.  The
Cadence/Verisity/Axis R&D department, however, is still staffed with a lot
of excellent engineers who know Xtreme Server very well.

Concerning the future, I expect Cadence to have a 'combined' system within
3 years.  And I expect that one to be much more Palladium-like
than Xtreme Server-like.

    - Thorsten Wermke
      Philips Semiconductors GmbH                Hamburg, Germany

         ----    ----    ----    ----    ----    ----   ----

From: Jai Kumar <jai.kumar=user domain=sun spot mom>

Hi, John,

I've worked with Cadence's Xtreme Server acceleration/emulation since its
inception at Axis in early 2001.  We have used it to verify a few ASICs and
our UltraSPARC T1 (with 32 simultaneous threads), which is shipping now.
(BTW, the entire UltraSPARC T1 design is open source, and the entire RTL is
available on the OpenSPARC website for anyone to download.)

For our class of processors that we develop at Sun, it takes about 1 to 2
months to set up and port a design, and use Xtreme Server on it.  Our chips
are heavily based on VCS.  We use simulation acceleration in the project's
early phases, and shift to targetless emulation as the design matures to
obtain the highest speed-up, which is mandatory for really long sims.

The best speed we have been able to get with Xtreme Server is 100 kHz.  That
is 1000x the speed of VCS.  For our UltraSPARC T1 product, we ran really
long directed and randomly generated self-checking tests.  We also booted
firmware like OBP and Solaris OS.  For example, the Solaris boot takes
3-7 days (for various configurations).
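
A back-of-the-envelope sketch of what these numbers imply.  It assumes,
purely for illustration, that the boot runs at the best-case 100 kHz for its
whole duration and that the 1000x ratio holds throughout; neither is stated
above.  The function names are hypothetical.

```python
# Illustrative arithmetic only. Assumes the best-case 100 kHz and the 1000x
# ratio apply to the entire boot, which the text does not state.

SECONDS_PER_DAY = 86_400
XTREME_HZ = 100_000        # 100 kHz, best speed reported
SPEEDUP_OVER_VCS = 1_000   # reported ratio versus VCS


def implied_vcs_hz():
    """Simulated clock rate in VCS implied by the two figures above."""
    return XTREME_HZ // SPEEDUP_OVER_VCS


def cycles_for(days_on_xtreme):
    """Design clock cycles covered by a run of the given length."""
    return days_on_xtreme * SECONDS_PER_DAY * XTREME_HZ


def vcs_years(days_on_xtreme):
    """Wall-clock years the same run would take at the implied VCS rate."""
    return days_on_xtreme * SPEEDUP_OVER_VCS / 365


if __name__ == "__main__":
    for days in (3, 7):
        print(days, "days:", cycles_for(days), "cycles,",
              round(vcs_years(days), 1), "years in VCS")
```

Under those assumptions a 3-day boot is about 26 billion cycles and would take
roughly 8 years in VCS; a 7-day boot would take roughly 19 years.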

Pros:

  1. Xtreme Server has a closely coupled software simulator, so we can bring
     up the design quickly in software without needing to massage the data.
     Xtreme does have some nice features for debug.  For example, they have
     VCD on Demand so that we don't have to rerun the simulation.  The sim
     is restored from a checkpoint to help you generate the signal waveforms
     needed for debug.

  2. Cadence has a command to automatically do a 'hot swap'.  We can:

       - Initialize our design using (non-synthesizable) PLI.
       - Swap the simulation state to hardware.
       - Run Xtreme Server at full speed.

  3. 'Suspend and Resume' helps us maximize our use of Xtreme Server to keep
     our overall costs down.  With 'Suspend and Resume', we can:

       - Run a long job like Solaris boot.
       - Suspend the Solaris boot, interject a high priority short debug run.
       - After the short jobs, go back to running Solaris.

     This lets us run the smaller job right away rather than hold it up for
     days while we wait for the long job to be completed.
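
     The benefit can be sketched with a toy timing model.  This is not a
     Cadence command or API; the function and its parameters are
     hypothetical, and suspend/resume overhead is assumed to be zero.

```python
# Toy timing model for one emulator shared by a long job and a short job.
# Hypothetical names; suspend/resume overhead is assumed to be zero.

def finish_times(long_hours, short_hours, short_arrival, preempt):
    """Return (long_finish, short_finish) in hours from t=0.

    The long job starts at t=0; the short job arrives at `short_arrival`.
    With preempt=True the long job is suspended while the short job runs.
    """
    if short_arrival >= long_hours:
        # The box is already free when the short job arrives.
        return long_hours, short_arrival + short_hours
    if preempt:
        return long_hours + short_hours, short_arrival + short_hours
    return long_hours, long_hours + short_hours


if __name__ == "__main__":
    # A 4-day boot with a 1-hour debug run arriving 10 hours in.
    print(finish_times(96, 1, 10, preempt=True))
    print(finish_times(96, 1, 10, preempt=False))
```

     In this example the debug run finishes at hour 11 instead of hour 97,
     at the cost of delaying the long job by one hour.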

Cons:

  1. Xtreme Server can't be expanded beyond a 35 million gate capacity.
     This 35 M gate maximum covers everything you want to have in the
     emulator HW.  This includes RTL design, TB, all assertions, debug
     monitors, etc.  Because our designs are growing faster than this, we
     cut some corners in what we verify...  e.g. we eliminate some cores.

  2. Model compilation on Xtreme Server is slow for really big designs.  It
     takes almost a day to turn a big model for our 35 million gate designs
     due to the number of FPGAs that need to be launched.

We can boot our Solaris OS in 3-5 days on Xtreme HW, while it would take us
years to run in VCS.  Because we can validate both our HW and SW before
tape-out, we are able to cut our product development time.

    - Jai Kumar
      Sun Microsystems                           Sunnyvale, CA