( ESNUG 532 Item 4 ) -------------------------------------------- [09/06/13]
From: [ Frank Schirrmeister of Cadence ]
Subject: CDNS says Hogan missed 5 metrics/gotchas for picking an emulator
Hi, John,
In ESNUG 522 #3, Jim described 14 metrics/gotchas that he felt a design team
must consider when picking what emulator to use on their project (if any).
After I had lunch with Jim, we were up to 19. Here they are, sorted along
my three marketing axes of productivity, predictability and
versatility:
Productivity
   1. Initialization and Dedicated Support -- available for emulation,
      less for FPGA-based prototyping
   2. Capacity -- Palladium leads as per above
   3. Speed Range -- Palladium leads, especially for big designs, for
      which FPGA-based emulation slows down
   4. Number of Users -- Palladium 4x better than Veloce
   5. Primary Target Designs (i.e. granularity) -- Palladium 4x finer
      than Veloce
   6. Visibility -- Palladium 2.5x better trace depth and great
      upload speed
   7. Debug -- Palladium 2.5x better trace depth for full vision
Predictability
   8. Compile Time -- Predictable in Palladium vs. power-hungry
      server farms for Veloce and Zebu
   9. Time of Availability -- Fast compile makes processor-based
      emulation applicable earlier
  10. Partitioning -- Predictable in Palladium vs. messy in FPGA-based
      emulation
  11. Memory Capacity -- Predictable in Palladium vs. re-mapping
      in FPGA
Versatility
  12. Virtual Platform API -- See hybrid examples above, users
      like AMD use it
  13. Transactor Availability -- to connect simulation (TLM and
      RTL) to emulation; Palladium has a large portfolio
  14. System Connections -- available system connections,
      i.e. how well one can connect to Ethernet/USB/PCI.
      Palladium has a large portfolio of rate adaptors for
      that (Speedbridges)
  15. Verification Language and Native Support
  16. Replication Cost -- for software developers and regressions
      there is a "pain limit" of price due to the larger number
      of users. This is a main differentiation between FPGA-based
      prototypes and emulation, and that's why Cadence has both.
      See my previous post in ESNUG 517 #6.
  17. Low Power -- the ability to determine low power early -- like
      with Palladium Dynamic Low Power analysis.
  18. Gate-level Acceleration -- something FPGA-based emulators
      cannot support due to the explosion in complexity in re-mapping
      the target technology to FPGA gates
Cost (Jim added this!)
  19. Price per Gate -- 2-5 cents for Palladium, 0.25 to 2 cents
      for FPGA
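To put the price-per-gate ranges of item 19 into dollars, here is a minimal back-of-the-envelope sketch. The cents-per-gate ranges are the ones quoted in the list; the 100 M gate design size and the helper name `cost_range_usd` are assumptions made purely for illustration, not figures from any vendor.

```python
# Price-per-gate ranges in cents, as quoted in item 19 of the list.
PRICE_CENTS_PER_GATE = {
    "Palladium": (2.0, 5.0),
    "FPGA": (0.25, 2.0),
}


def cost_range_usd(gates, cents_low, cents_high):
    """Return the (low, high) cost in US dollars for a given gate count.

    Hypothetical helper: multiplies the gate count by the price per gate
    and converts cents to dollars.
    """
    return gates * cents_low / 100.0, gates * cents_high / 100.0


if __name__ == "__main__":
    design_gates = 100_000_000  # assumed 100 M gate design, illustration only
    for engine, (lo, hi) in PRICE_CENTS_PER_GATE.items():
        low_usd, high_usd = cost_range_usd(design_gates, lo, hi)
        print(f"{engine}: ${low_usd:,.0f} to ${high_usd:,.0f}")
```

For the assumed 100 M gate design, this works out to $2M to $5M at 2-5 cents per gate and $250K to $2M at 0.25-2 cents per gate.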
---- ---- ---- ---- ---- ---- ----
THE FOUR ENGINES
The metrics above really compare four basic engines: Emulation, Virtual
Prototyping, RTL Simulation and FPGA-Based Prototyping. Which engine
applies depends on the scope of what is verified (just hardware, hardware
with software, sub-systems, SoCs, etc.) as well as the point in the
project flow.
Here are the engine sweet spots, overlaid on the main user tasks:
Fig 5: The Four Basic Engine Sweet Spots and Scopes
Depending on whether models are available, virtual prototyping can enable
software development as early as a couple of weeks after the spec is
available. It is fast, allows good software debug insight and execution
control and is typically the quickest way to bring up SW on a new design.
By itself, it does not allow detailed hardware debug, which is the initial
strength of RTL simulation. Used initially for RTL development, IP
integration and design verification, RTL simulation can extend to the
complexity of sub-systems and certainly is a sign-off criterion for
gate-level simulation and timing sign-off. It allows the fastest
turnaround time for new RTL, offers excellent hardware debug but is
typically too slow to execute meaningful amounts of software.
To better extend to sub-systems and the full SoC, verification acceleration
moves the DUT into hardware and can allow enough speedup for bare-metal
software development. With its in-circuit capabilities, emulation extends
the verification to the full chip and chip-in-system level by enabling
connections to real system environments like PCI, USB and Ethernet.
As discussed, the main advantage of processor-based emulation is fast
turnaround time for bring-up, which makes it ideal for the project phase
in which RTL is not yet quite mature. In addition, it allows multi-user
access and excellent hardware debug insight in the context of real software
that can be executed at MHz speeds, resulting in very efficient HW/SW debug
cycles. Standard software debuggers can be attached using JTAG adaptors
or virtual connections.
---- ---- ---- ---- ---- ---- ----
HOW FPGA-BASED DIFFERS
In contrast, FPGA-based emulators are typically weaker with respect to debug
efficiency and turnaround time, making them less reactive and really -- like
FPGA-based prototypes -- more applicable for later project stages in which
RTL has become more mature.
FPGA-based prototyping reaches speeds in the tens of MHz and often offers
the best cost-per-gate per MHz for SW development and hardware regressions
in the project phase when RTL has become stable enough that fast
turnaround time and hardware debug matter less.
The downside to standard FPGA-based prototyping is capacity limitations as
well as longer bring-up due to the changes that have to be made to map the
RTL to FPGAs.
So only the efficient combination of the four engines provides a complete
solution. Emulation is a key part of it. Don't believe me? Here's a
screenshot from sTec's DAC presentation, slide 5 [Ref 10]:
Fig 6: Example of how multiple platforms are used in conjunction
---- ---- ---- ---- ---- ---- ----
JIM'S "CORRECTED" TABLE
After discussing the 19 metrics/gotchas with Jim -- who had summarized
everything in his table in ESNUG 522 #4 -- I've added my changes:
Columns compared:
   Palladium = Cadence Palladium
   Veloce 2 = Mentor Veloce 2
   Zebu = Synopsys EVE Zebu
   FPGA-based prototyping = Synopsys HAPS, Cadence RPP, DINI, Aldec,
      S2C, HOENS, Hitech Global, ProDesign
Emulator Architecture
   Palladium: custom silicon, processor based architecture, custom
      board, custom box, scalable memory architecture eliminates need
      for backplanes or cross bars
   Veloce 2: custom silicon, FPGA based architecture, custom board,
      custom box, switching backplane and virtual-wires
   Zebu: off-the-shelf FPGA, custom board, custom box, cross bar
      and TDM
   FPGA-based prototyping: off-the-shelf FPGA, off-the-shelf board,
      off-the-shelf box (2 M - 100+ M for Virtex-7 based systems).
      HAPS and RPP have automated software flows.
Granularity
   Palladium: 4 M to 2 B
   Veloce 2: 16 M to 2 B
   Zebu: 25 M - 200 M (per Jim Hogan)
   FPGA-based prototyping: 4 M to 100 M+
Price/gate (per Jim Hogan)
   Palladium: 2-5 cents (per Jim Hogan)
   Veloce 2: 2-5 cents (per Jim Hogan)
   Zebu: 0.5 - 2 cents (per Jim Hogan)
   FPGA-based prototyping: 0.25 - 1 cent (per Jim Hogan)
Dedicated Support
   Palladium: yes
   Veloce 2: yes
   Zebu: mixed
   FPGA-based prototyping: no
Design Capacity
   Palladium: Claims up to 2 billion. Typical usage 100 M to 1 B gates.
   Veloce 2: Claims up to 2 billion. Typical usage 100 M to 1 B gates.
   Zebu: Claims up to 2 billion. Typical usage 100 M to 1 B gates.
   FPGA-based prototyping: Claims up to 100+ million. Typical usage
      2 M to 50 M gates.
Typical Utilization
   Palladium: 90% to 100%
   Veloce 2: 60% to 75%
   Zebu: 60% to 75%
   FPGA-based prototyping: Depends on partitioning
Primary Target Designs
   Palladium: SoCs 100 M to 1 B gates. Large CPUs, GPUs, multi-chip
      systems, application processors. Due to its granularity,
      Palladium extends into IP and subsystems just fine.
   Veloce 2: SoCs 100 M to 1 B gates. Large CPUs, GPUs, multi-chip
      systems, application processors.
   Zebu: SoCs from 25 M to 200 M gates
   FPGA-based prototyping: IP blocks, sub-systems, and SoCs from
      2 M to 100 M
Speed range (cycles/sec)
   Palladium: 100 K to 2 M. Scaling well with design size
   Veloce 2: 100 K to 1.5 M. Degrading with design size
   Zebu: 500 K to 5 M. Degrading with design size and probes
   FPGA-based prototyping: 2 M to 20 M. Degrading with design size
      and probes
Compile time
   Palladium: 35 M gates/hour. Single workstation (Palladium).
      Includes automated partitioning time, no need to parallelize
   Veloce 2: 40 M gates/hour with PC farm. Includes automated
      partitioning time. Parallelizable: Yes
   Zebu: 25 M - 100 M gates/hr for PC farm. Proprietary software for
      fast FPGA partitioning, synthesis and P&R. Parallelizable: Yes
   FPGA-based prototyping: 1 M - 15 M gates/hr for Roll Your Own and
      PC farm. RPP and HAPS have automated flows, 25 M - 100 M Custom
      constrained by FPGA vendor synthesis and P&R times. Doesn't
      include partitioning time. Parallelizable: Yes
Partitioning
   Palladium: Automated
   Veloce 2: Automated, but see Rent's Rule
   Zebu: Automated, but see Rent's Rule
   FPGA-based prototyping: Semi-automated for most. RPP and HAPS have
      automated flows. Partitioning depends on # of FPGAs. Time range
      30 min to 4 hours.
Visibility
   Palladium: full visibility. at-speed probe capture. buffer for
      1 M cycles.
   Veloce 2: full visibility. at-speed probe capture. Causes slowdown
      in execution speed. Smaller buffer.
   Zebu: static, dynamic probes. at-speed probe capture. Causes
      slowdown in execution speed.
   FPGA-based prototyping: static, dynamic probes (vendor dependent).
      at-speed probe capture (vendor dependent). Causes slowdown in
      execution speed.
Debug
   Palladium: Breakpoints, assertions, unique simulation hot-swap,
      SW debug. Clear debug advantages for processor-based here, as
      outlined by Jim Hogan as well.
   Veloce 2: Breakpoints, some assertions, SW debug.
   Zebu: Breakpoints, some assertions, SW debug.
   FPGA-based prototyping: Breakpoints, few assertions, SW debug.
Virtual platform API
   Palladium: Yes
   Veloce 2: Yes
   Zebu: Yes
   FPGA-based prototyping: varies by vendor
Transactor Availability
   Palladium: Standard/off-the-shelf: Good. Custom: developed ad hoc
   Veloce 2: Standard/off-the-shelf: Good. Custom: developed ad hoc
   Zebu: Standard/off-the-shelf: Good. Custom: developed ad hoc
   FPGA-based prototyping: Standard/off-the-shelf: Mixed. Custom:
      developed ad hoc
Verification Language - Native support
   Palladium: C++, SystemC, Specman e, SystemVerilog, OVM, SVA, PSL,
      OVL
   Veloce 2: C++, SystemC, Specman e, SystemVerilog, OVM, SVA, PSL,
      OVL
   Zebu: Synthesizable Verilog, VHDL, SystemVerilog
   FPGA-based prototyping: Synthesizable Verilog, VHDL, SystemVerilog
Memory
   Palladium: up to 1 TB
   Veloce 2: ???
   Zebu: up to 200 GB
   FPGA-based prototyping: up to 32 GB
Users
   Palladium: 1 to 512 users
   Veloce 2: 1 to 128
   Zebu: 1 to 49
   FPGA-based prototyping: 1 user
Table 3: Jim Hogan's table on the metrics re-drawn (after we
had lunch and a few drinks over it. ;)
To summarize what changed:
- Veloce 2 really belongs in the FPGA-based emulator category,
i.e. the middle column, split from Palladium.
- Visibility and debug are not the same between FPGA-based and
processor-based offerings.
- There is no "up to 1 TB memory" in Veloce (memories are mapped
to the FPGA).
- The number of users is limited to 128 for Veloce 2 BG vs. 512 for
Palladium.
And the FPGA-based prototyping column was missing Synopsys HAPS, DINI, etc.
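As a rough way to read the "Speed range" and "Compile time" rows of the table, the sketch below turns them into wall-clock estimates. The cycles/sec and gates/hour figures are copied from the table; the 100 M gate design, the 1 billion cycle workload, and the function names are assumptions for illustration only. For FPGA-based prototyping only the 1 M - 15 M gates/hr figure is used, which per the table excludes partitioning time.

```python
# Speed ranges in cycles/sec, from the "Speed range" row of the table.
SPEED_CYCLES_PER_SEC = {
    "Palladium": (100_000, 2_000_000),
    "Veloce 2": (100_000, 1_500_000),
    "Zebu": (500_000, 5_000_000),
    "FPGA-based prototyping": (2_000_000, 20_000_000),
}

# Compile throughput in gates/hour, from the "Compile time" row.
# Single figures are stored as (x, x) so every entry is a (low, high) pair.
COMPILE_GATES_PER_HOUR = {
    "Palladium": (35_000_000, 35_000_000),
    "Veloce 2": (40_000_000, 40_000_000),
    "Zebu": (25_000_000, 100_000_000),
    "FPGA-based prototyping": (1_000_000, 15_000_000),
}


def run_time_seconds(cycles, cycles_per_sec):
    """Wall-clock seconds needed to execute `cycles` at a given speed."""
    return cycles / cycles_per_sec


def compile_time_hours(gates, gates_per_hour):
    """Hours needed to compile `gates` at a given compile throughput."""
    return gates / gates_per_hour


if __name__ == "__main__":
    design_gates = 100_000_000       # assumed 100 M gate design
    workload_cycles = 1_000_000_000  # assumed 1 billion cycle workload

    for engine, (slow, fast) in SPEED_CYCLES_PER_SEC.items():
        worst_h = run_time_seconds(workload_cycles, slow) / 3600
        best_h = run_time_seconds(workload_cycles, fast) / 3600
        print(f"{engine}: run takes {best_h:.2f} to {worst_h:.2f} hours")

    for engine, (slow, fast) in COMPILE_GATES_PER_HOUR.items():
        worst_h = compile_time_hours(design_gates, slow)
        best_h = compile_time_hours(design_gates, fast)
        print(f"{engine}: compile takes {best_h:.1f} to {worst_h:.1f} hours")
```

With these assumed numbers, a 1 billion cycle run takes 500 seconds at 2 M cycles/sec and 10,000 seconds at 100 K cycles/sec, which is why the speed row matters so much once real software is in the loop.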
- Frank Schirrmeister
Cadence Design Systems, Inc. San Jose, CA
---- ---- ---- ---- ---- ---- ----
Related Articles
CDNS says Hogan missed granularity, user access, speed, capacity
CDNS says Hogan missed FPGA compile time, Rent's Rule, probing
CDNS says Hogan missed close to 10 emulation customer use models
CDNS says Hogan missed 47 Palladium user papers on Cadence.com