CDNS says Hogan missed 5 metrics/gotchas for picking an emulator

( ESNUG 532 Item 4 ) -------------------------------------------- [09/06/13]

From: [ Frank Schirrmeister of Cadence ]
Subject: CDNS says Hogan missed 5 metrics/gotchas for picking an emulator

Hi, John,

In ESNUG 522 #3, Jim described 14 metrics/gotchas that he felt a design team
must consider when picking what emulator to use on their project (if any).

After I had lunch with Jim, we were up to 19.  Here they are, sorted along
my three marketing axes of productivity, predictability and
versatility:

Productivity

  1. Initialization and Dedicated Support -- available for emulation,
     less so for FPGA-based prototyping
  2. Capacity -- Palladium leads as per above
  3. Speed Range -- Palladium leads, especially for big designs on
     which FPGA-based emulation slows down
  4. Number of Users -- Palladium supports 4x more users than Veloce
  5. Primary Target Designs (i.e. granularity) -- Palladium is 4x
     finer-grained than Veloce
  6. Visibility -- Palladium has 2.5x better trace depth and great
     upload speed
  7. Debug -- Palladium has 2.5x better trace depth for full visibility

Predictability

  8. Compile Time -- Predictable in Palladium vs. power-hungry
     server farms for Veloce and Zebu
  9. Time of Availability -- Fast compile makes processor-based
     emulation applicable earlier
 10. Partitioning -- Predictable in Palladium vs. messy in FPGA-based
     emulation
 11. Memory Capacity -- Predictable in Palladium vs. re-mapping
     in FPGA

Versatility

 12. Virtual Platform API -- See the hybrid examples above; users
     like AMD use it
 13. Transactor Availability -- to connect simulation (TLM and
     RTL) to emulation; Palladium has a large portfolio
 14. System Connections -- how well one can connect to real
     Ethernet, USB or PCI interfaces.  Palladium has a large
     portfolio of rate adapters for that (SpeedBridges)
 15. Verification Language and Native Support
 16. Replication Cost -- for software developers and regressions
     there is a "pain limit" on price because of the larger number
     of users.  This is a main differentiator between FPGA-based
     prototypes and emulation, and that's why Cadence has both.
     See my previous post in ESNUG 517 #6.
 17. Low Power -- the ability to analyze power early, as with
     Palladium Dynamic Low Power analysis
 18. Gate-level Acceleration -- something FPGA-based emulators
     cannot support because of the explosion in complexity when
     re-mapping the target technology to FPGA gates

Cost (Jim added this!)

 19. Price per gate -- 2-5 cents for Palladium, 0.25 to 2 cents for FPGA
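As a rough illustration of what item 19 means in dollars, here is a small
sketch that multiplies the quoted price-per-gate ranges by a design size.
The 100 M-gate design and the helper name hardware_cost_range are
hypothetical; the cents-per-gate figures are the ballpark ranges quoted
above, not vendor list prices.

```python
# Price-per-gate ranges quoted in item 19, in US cents per gate.
# Ballpark figures per Jim Hogan, not vendor list prices.
PRICE_CENTS_PER_GATE = {
    "Palladium": (2.0, 5.0),
    "FPGA": (0.25, 2.0),
}


def hardware_cost_range(platform: str, gates: int) -> tuple[float, float]:
    """Return a (low, high) cost estimate in US dollars for `gates` of capacity.

    Hypothetical helper: cost = gates * (cents per gate) / 100.
    """
    low_cents, high_cents = PRICE_CENTS_PER_GATE[platform]
    return gates * low_cents / 100.0, gates * high_cents / 100.0


if __name__ == "__main__":
    design_gates = 100_000_000  # hypothetical 100 M-gate SoC
    for name in PRICE_CENTS_PER_GATE:
        low, high = hardware_cost_range(name, design_gates)
        print(f"{name}: ${low:,.0f} to ${high:,.0f}")
```

For a 100 M-gate design this prints $2,000,000 to $5,000,000 for Palladium
and $250,000 to $2,000,000 for FPGA-based hardware.  The estimate is per
copy of the hardware, so replicating it for many software developers
scales the total accordingly.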

         ----    ----    ----    ----    ----    ----   ----

THE FOUR ENGINES

The metrics above really compare four basic engines: Emulation, Virtual
Prototyping, RTL Simulation and FPGA-Based Prototyping.  Which engine
applies depends on the scope of what is verified (just hardware, hardware
with software, sub-systems, SoCs, etc.) as well as on the point in the
project flow.

Here are the sweet spots of the different engines, overlaid on the main
user tasks:

    Fig 5: The Four Basic Engine Sweet Spots and Scopes

Depending on whether models are available, virtual prototyping can enable
software development as early as a couple of weeks after the spec is
available.  It is fast, offers good software debug insight and execution
control, and is typically the quickest way to bring up software on a new
design.

By itself, it does not allow detailed hardware debug, which is the initial
strength of RTL simulation.  Used initially for RTL development, IP
integration and design verification, RTL simulation can extend to the
complexity of sub-systems and is certainly a sign-off criterion, extending
to gate-level simulation for timing sign-off.  It offers the fastest
turnaround time for new RTL and excellent hardware debug, but is
typically too slow to execute meaningful amounts of software.

To better extend to sub-systems and the full SoC, verification acceleration
moves the design under test (DUT) into hardware and can provide enough
speedup for bare-metal software development.  With its in-circuit
capabilities, emulation extends verification to the full-chip and
chip-in-system level by enabling connections to real system environments
like PCI, USB and Ethernet.

As discussed, the main advantage of processor-based emulation is fast
turnaround time for bring-up, which makes it ideal for the project phase
in which RTL is not yet quite mature.  In addition, it allows multi-user
access and excellent hardware debug insight in the context of real software
executing at MHz speeds, resulting in very efficient HW/SW debug
cycles.  Standard software debuggers can be attached using JTAG adapters
or virtual connections.

         ----    ----    ----    ----    ----    ----   ----

HOW FPGA-BASED DIFFERS

In contrast, FPGA-based emulators are typically weaker with respect to debug
efficiency and turnaround time.  That makes them less responsive to RTL
changes and, like FPGA-based prototypes, more applicable to later project
stages in which RTL has become more mature.

FPGA-based prototyping reaches speeds in the tens of MHz and often offers
the best cost per gate per MHz for software development and hardware
regressions in the project phase when RTL has become stable enough that
fast turnaround time and hardware debug matter less.
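The letter does not define "cost per gate per MHz" precisely.  One
plausible reading, assumed here, is price per gate divided by achieved
execution speed in MHz, so lower is better.  The sketch below applies that
assumed formula to the best-case price and speed figures quoted in Table 3
(lowest price paired with highest speed); the function name
cents_per_gate_per_mhz is hypothetical.

```python
def cents_per_gate_per_mhz(cents_per_gate: float, cycles_per_sec: float) -> float:
    """Assumed figure of merit: price per gate (cents) divided by speed (MHz).

    Lower values mean more execution speed per unit of hardware cost.
    """
    if cycles_per_sec <= 0:
        raise ValueError("execution speed must be positive")
    return cents_per_gate / (cycles_per_sec / 1_000_000)


if __name__ == "__main__":
    # Best-case figures from Table 3: lowest price with highest speed.
    palladium = cents_per_gate_per_mhz(2.0, 2_000_000)     # 2 cents, 2 MHz
    fpga_proto = cents_per_gate_per_mhz(0.25, 20_000_000)  # 0.25 cent, 20 MHz
    print(f"Palladium: {palladium:.4f} cents/gate/MHz")
    print(f"FPGA prototype: {fpga_proto:.4f} cents/gate/MHz")
```

With these inputs the FPGA prototype comes out at 0.0125 versus 1.0 for
Palladium, an 80x gap on this one metric.  Pairing best-case numbers
flatters every platform, so figures measured on the actual design are the
better input.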

The downsides of standard FPGA-based prototyping are its capacity
limitations and its longer bring-up, due to the changes that have to be
made to map the RTL to FPGAs.

So only the efficient combination of the four engines provides a complete
solution, and emulation is a key part of it.  Don't believe me?  Here's a
screenshot from sTec's DAC presentation, slide 5 [Ref 10]:

    Fig 6: Example of how multiple platforms are used in conjunction

         ----    ----    ----    ----    ----    ----   ----

JIM'S "CORRECTED" TABLE

After discussing the 19 metrics/gotchas with Jim -- who had summarized
everything in his table in ESNUG 522 #4 -- I've added my changes:

Columns compared:

    Palladium    = Cadence Palladium
    Veloce 2     = Mentor Veloce 2
    Zebu         = Synopsys EVE Zebu
    FPGA protos  = Synopsys HAPS, Cadence RPP, DINI, Aldec, S2C, HOENS,
                   Hitech Global, ProDesign

  Emulator Architecture
    Palladium:   custom silicon, processor-based architecture, custom board,
                 custom box; scalable memory architecture eliminates the
                 need for backplanes or cross bars
    Veloce 2:    custom silicon, FPGA-based architecture, custom board,
                 custom box, switching backplane and virtual wires
    Zebu:        off-the-shelf FPGA, custom board, custom box, cross bar
                 and TDM
    FPGA protos: off-the-shelf FPGA, off-the-shelf board, off-the-shelf box
                 (2 M - 100+ M for Virtex-7 based systems).  HAPS and RPP
                 have automated software flows.

  Granularity
    Palladium:   4 M to 2 B
    Veloce 2:    16 M to 2 B
    Zebu:        25 M - 200 M (per Jim Hogan)
    FPGA protos: 4 M to 100 M+

  Price/gate (all per Jim Hogan)
    Palladium:   2-5 cents
    Veloce 2:    2-5 cents
    Zebu:        0.5-2 cents
    FPGA protos: 0.25-1 cent

  Dedicated Support
    Palladium:   yes
    Veloce 2:    yes
    Zebu:        mixed
    FPGA protos: no

  Design Capacity
    Palladium:   Claims up to 2 billion.  Typical usage 100 M to 1 B gates.
    Veloce 2:    Claims up to 2 billion.  Typical usage 100 M to 1 B gates.
    Zebu:        Claims up to 2 billion.  Typical usage 100 M to 1 B gates.
    FPGA protos: Claims up to 100+ million.  Typical usage 2 M to 50 M
                 gates.

  Typical Utilization
    Palladium:   90% to 100%
    Veloce 2:    60% to 75%
    Zebu:        60% to 75%
    FPGA protos: depends on partitioning

  Primary Target Designs
    Palladium:   SoCs 100 M to 1 B gates.  Large CPUs, GPUs, multi-chip
                 systems, application processors.  Due to its granularity,
                 Palladium extends into IP and subsystems just fine.
    Veloce 2:    SoCs 100 M to 1 B gates.  Large CPUs, GPUs, multi-chip
                 systems, application processors.
    Zebu:        SoCs from 25 M to 200 M gates
    FPGA protos: IP blocks, sub-systems, and SoCs from 2 M to 100 M gates

  Speed Range (cycles/sec)
    Palladium:   100 K to 2 M, scaling well with design size
    Veloce 2:    100 K to 1.5 M, degrading with design size
    Zebu:        500 K to 5 M, degrading with design size and probes
    FPGA protos: 2 M to 20 M, degrading with design size and probes

  Compile Time
    Palladium:   35 M gates/hour on a single workstation.  Includes
                 automated partitioning time; no need to parallelize.
    Veloce 2:    40 M gates/hour with a PC farm.  Includes automated
                 partitioning time.  Parallelizable: yes.
    Zebu:        25 M - 100 M gates/hour with a PC farm.  Proprietary
                 software for fast FPGA partitioning, synthesis and P&R.
                 Parallelizable: yes.
    FPGA protos: 1 M - 15 M gates/hour for roll-your-own flows and a PC
                 farm; RPP and HAPS have automated flows at 25 M - 100 M
                 gates/hour.  Custom flows are constrained by FPGA vendor
                 synthesis and P&R times.  Doesn't include partitioning
                 time.  Parallelizable: yes.

  Partitioning
    Palladium:   automated
    Veloce 2:    automated, but see Rent's Rule
    Zebu:        automated, but see Rent's Rule
    FPGA protos: semi-automated for most; RPP and HAPS have automated
                 flows.  Partitioning depends on # of FPGAs.  Time range
                 30 min to 4 hours.

  Visibility
    Palladium:   full visibility; at-speed probe capture; buffer for 1 M
                 cycles
    Veloce 2:    full visibility; at-speed probe capture; causes slowdown
                 in execution speed; smaller buffer
    Zebu:        static and dynamic probes; at-speed probe capture; causes
                 slowdown in execution speed
    FPGA protos: static and dynamic probes (vendor dependent); at-speed
                 probe capture (vendor dependent); causes slowdown in
                 execution speed

  Debug
    Palladium:   breakpoints, assertions, unique simulation hot-swap, SW
                 debug.  Clear debug advantages for processor-based here,
                 as also outlined by Jim Hogan.
    Veloce 2:    breakpoints, some assertions, SW debug
    Zebu:        breakpoints, some assertions, SW debug
    FPGA protos: breakpoints, few assertions, SW debug

  Virtual Platform API
    Palladium:   yes
    Veloce 2:    yes
    Zebu:        yes
    FPGA protos: varies by vendor

  Transactor Availability
    Palladium:   standard/off-the-shelf: good; custom: developed ad hoc
    Veloce 2:    standard/off-the-shelf: good; custom: developed ad hoc
    Zebu:        standard/off-the-shelf: good; custom: developed ad hoc
    FPGA protos: standard/off-the-shelf: mixed; custom: developed ad hoc

  Verification Language - Native Support
    Palladium:   C++, SystemC, Specman e, SystemVerilog, OVM, SVA, PSL, OVL
    Veloce 2:    C++, SystemC, Specman e, SystemVerilog, OVM, SVA, PSL, OVL
    Zebu:        synthesizable Verilog, VHDL, SystemVerilog
    FPGA protos: synthesizable Verilog, VHDL, SystemVerilog

  Memory
    Palladium:   up to 1 TB
    Veloce 2:    ???
    Zebu:        up to 200 GB
    FPGA protos: up to 32 GB

  Users
    Palladium:   1 to 512
    Veloce 2:    1 to 128
    Zebu:        1 to 49
    FPGA protos: 1

    Table 3: Jim Hogan's table on the metrics re-drawn (after we
             had lunch and a few drinks over it.  ;)

To summarize what changed:

  - Veloce 2 really belongs in the FPGA-based emulator category,
    i.e. the middle column, split from Palladium.

  - Visibility and debug are not the same between FPGA-based and
    processor-based offerings.

  - There is no "up to 1 TB memory" in Veloce (memories are mapped
    to the FPGA). 

  - The number of users is limited to 128 for Veloce 2 BG vs. 512 for
    Palladium.

And the FPGA-based prototyping column was missing Synopsys HAPS, DINI, etc.
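The compile-time row in Table 3 is easier to compare once it is turned into
wall-clock hours for a given design.  This sketch assumes compile time
scales linearly with gate count at the quoted throughput, which is a
simplification.  The dictionary keys and the compile_hours helper are
hypothetical, and the roll-your-own FPGA figure excludes the 30-minute to
4-hour partitioning step noted in the table.

```python
# Quoted compile throughput from Table 3, as (slowest, fastest) gates/hour.
COMPILE_GATES_PER_HOUR = {
    "Palladium": (35e6, 35e6),                      # single workstation
    "Veloce 2": (40e6, 40e6),                       # with a PC farm
    "Zebu": (25e6, 100e6),                          # with a PC farm
    "FPGA prototype (roll your own)": (1e6, 15e6),  # excludes partitioning
}


def compile_hours(platform: str, gates: float) -> tuple[float, float]:
    """Return (best, worst) compile time in hours, assuming linear scaling."""
    slowest, fastest = COMPILE_GATES_PER_HOUR[platform]
    return gates / fastest, gates / slowest


if __name__ == "__main__":
    design_gates = 50e6  # hypothetical 50 M-gate design
    for name in COMPILE_GATES_PER_HOUR:
        best, worst = compile_hours(name, design_gates)
        print(f"{name}: {best:.2f} to {worst:.2f} hours")
```

At 50 M gates this works out to roughly 1.4 hours for Palladium, 1.25 hours
for Veloce 2, 0.5 to 2 hours for Zebu, and about 3.3 to 50 hours for a
roll-your-own FPGA flow before partitioning.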

    - Frank Schirrmeister
      Cadence Design Systems, Inc.               San Jose, CA

         ----    ----    ----    ----    ----    ----   ----



"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley. All Rights Reserved.

