( ESNUG 553 Item 3 ) -------------------------------------------- [11/13/15]

Subject: Pt 3 - Lauro missed energy costs is intrinsic power use over time

... CONTINUED FROM Pt. 2 ...

         ----    ----    ----    ----    ----    ----    ----

LAURO MISSED THAT ENERGY COSTS IS INTRINSIC POWER USE OVER TIME

For emulation power use, Lauro's table in ESNUG 547 #9 was as follows:

                     Cadence               Mentor            Synopsys EVE
                     Palladium-XP2 (GXL)   Veloce 2          Zebu Server 3

  Single Cabinet     72 million            1 billion         300 million
  Capacity           ASIC-gates            ASIC-gates        ASIC-gates

  Compilation        ~70 MG/hour           ~40 MG/hour       ~5 MG/hour
  Speed              [single PC]           [server farms]    [server farms]

  Design Visibility  full visibility       full visibility   full visibility
  w/o Compilation    at high-speed         at high-speed     at low-speed

  Single Cabinet     unpublished           ~44.0 kW          ~4.0 kW
  Power Use

  Cooling System     water cooled          forced air        forced air

For starters, this is the cabinet game again and needs to be normalized to
proper capacities.  In ESNUG 0532-06, Mentor claims Palladium XP consumes
3.5x the power of Veloce 2.  Using the way the table was organized, the
"Single Cabinet Power Use" would be:

                    0.072 x 44KW x 3.5 = 11.1 KW

I'll give you three guesses whether Mentor's estimate of 3.5x power was in
Cadence's favor or not.  But you only need one.  Of course, Palladium power
consumption is less than that per cabinet...
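
The normalization above can be sketched in a few lines of Python.  This is
only an illustration of the arithmetic; the function name and parameter
names are made up for this example, and the inputs are the figures quoted
from Lauro's table and from Mentor's 3.5x claim.

```python
def implied_palladium_cabinet_kw(palladium_gates, veloce_gates,
                                 veloce_cabinet_kw, power_ratio):
    """Scale Veloce's cabinet power down to Palladium's cabinet capacity,
    then apply the claimed Palladium-to-Veloce power ratio."""
    capacity_fraction = palladium_gates / veloce_gates      # 72M / 1B = 0.072
    veloce_kw_at_same_capacity = veloce_cabinet_kw * capacity_fraction
    return veloce_kw_at_same_capacity * power_ratio


# Figures from the table: 72 MG vs. 1 BG cabinets, 44 KW, Mentor's 3.5x claim
kw = implied_palladium_cabinet_kw(72e6, 1e9, 44.0, 3.5)
print(f"{kw:.1f} KW")   # 11.1 KW
```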

               Error Count:  1    Total Error Count:  4
                Miss Count:  0     Total Miss Count: 15

Then, as Jim Hogan said in his most recent post:

However, intrinsic power consumption is only one part of the power equation. The reality is the TOTAL processor-based emulator power use can actually be LOWER than the same for an FPGA-based emulator when you add up ALL the power used in a verification task.

    - Lauro Rizzatti, emulation consultant       http://www.deepchip.com/items/0547-09.html

Hogan is right.  When looking at power holistically, intrinsic power really
is only one contributor.  Due to the execution speed difference, reduced
server requirements for compilation, and different depth for the buffers
that hold the debug data, Palladium XP may often actually consume less
energy per verification job.

Let me break out the whole power use budget.

To debug the design, one needs to compile it, run it and collect debug data
about it.

Compile: The design compilation for an FPGA-based emulator like a Zebu 3
typically takes up an entire server farm, partly because the FPGAs have
to be mapped.  For instance:

  - Google says average workstation power consumption is about 0.15 KW
  - assume 20 workstations for a farm
  - a 300MG design according to Lauro's table's compile times would
    take about:

         0.64 KWh for Palladium (4.3 hr compile x 0.15 KW)
        22.50 KWh for Veloce (7.5 hr compile on 20 workstations x 0.15 KW)
       180.00 KWh for Zebu (60 hr compile on 20 workstations x 0.15 KW)

So by the time the design is compiled and actually in the emulator, looking
at power use holistically, Palladium XP is already ahead.
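
As a rough sketch, the compile energy figures above follow from compile
time x number of workstations x power per workstation.  The function and
variable names below are illustrative only; the 0.15 KW per workstation and
the 20-workstation farm are the assumptions stated in the list above, and
the compile rates come from Lauro's table.

```python
def compile_energy_kwh(design_mg, rate_mg_per_hour, workstations,
                       kw_per_workstation=0.15):
    """Energy spent compiling: hours x machines x KW per machine."""
    hours = design_mg / rate_mg_per_hour
    return hours * workstations * kw_per_workstation


DESIGN_MG = 300  # 300 million gate design

compile_kwh = {
    "Palladium": compile_energy_kwh(DESIGN_MG, 70, workstations=1),
    "Veloce":    compile_energy_kwh(DESIGN_MG, 40, workstations=20),
    "Zebu":      compile_energy_kwh(DESIGN_MG, 5,  workstations=20),
}

for name, kwh in compile_kwh.items():
    print(f"{name:10s} {kwh:7.2f} KWh")
```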

Run: With a processor-based emulator, debug is non-intrusive and slowdown is
minor.  In contrast, FPGA-based emulation is an intrusive debug, especially
for the Zebu Server 3 due to having to add dynamic probes into the RTL, which
slows down execution by up to 300X (see ESNUG 532-02, Fig 1).   Assume one
needs to run:

  - 300MG design 
  - 3.6B cycles, which is one hour at one MHz speed
  - ESNUG 547-09's (Lauro's) table's power consumption needs to be
    normalized to 300MG, so Veloce's 44 KW for a 1BG cabinet becomes
    13.2 KW, or 13.2 KWh for running 3.6B cycles in an hour
  - According to ESNUG 532-06, Mentor claims Veloce draws 1/3.5 of
    Palladium's power.  Palladium would then use 46.2 KWh for running
    3.6B cycles for the 300MG design in an hour
  - ESNUG 547-09's (Lauro's) table assumes the same speed for Veloce
    and Palladium, with 2.5x the speed for Zebu.  Running Veloce and
    Palladium at 1MHz puts Zebu at 2.5MHz, but the probes then slow it
    down by 300x to 8.3 KHz.  As a result, Zebu needs 120 hours to run
    the 3.6B cycles, using 480 KWh assuming the 4.0 KW from
    ESNUG 547-09.
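
The run-phase numbers in the list above can be reproduced with a small
sketch: energy is run time (cycles / effective clock speed) x power draw.
All names here are illustrative; the inputs (44 KW per 1BG Veloce cabinet,
the 3.5x ratio, 4.0 KW for Zebu, the 2.5x speed and the 300x probe
slowdown) are the figures quoted in the list, not measurements.

```python
CYCLES = 3.6e9  # one hour at 1 MHz


def run_energy_kwh(cycles, speed_hz, power_kw):
    """Energy for a run: hours needed at the given speed x power draw."""
    hours = cycles / speed_hz / 3600
    return hours * power_kw


veloce_kw = 44.0 * 300 / 1000        # 1BG cabinet normalized to 300MG
palladium_kw = veloce_kw * 3.5       # Mentor's 3.5x claim applied
zebu_hz = 2.5e6 / 300                # 2.5 MHz slowed 300x by probes

veloce_kwh = run_energy_kwh(CYCLES, 1e6, veloce_kw)
palladium_kwh = run_energy_kwh(CYCLES, 1e6, palladium_kw)
zebu_kwh = run_energy_kwh(CYCLES, zebu_hz, 4.0)

print(f"Veloce    {veloce_kwh:6.1f} KWh")
print(f"Palladium {palladium_kwh:6.1f} KWh")
print(f"Zebu      {zebu_kwh:6.1f} KWh at {zebu_hz / 1e3:.1f} KHz")
```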

Collect Debug Data: The third aspect to consider is how much power is used
when debug data is collected.  Veloce and Palladium can collect all signals
for a certain time period.  ESNUG 547-09 quotes 500K cycles for a Veloce 2
(it is really 450K cycles), while a Palladium XP II can collect 2M cycles.
To collect the same amount of debug data, Veloce 2 Quattro would have to
run 4 times.
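
A minimal sketch of that rerun penalty, using the 500K-cycle depth quoted
in ESNUG 547-09 and the 13.2 KWh per run that follows from the normalized
Veloce figure.  The function name is made up for this example.

```python
import math


def rerun_penalty(target_cycles, trace_depth_cycles, energy_per_run_kwh):
    """Return (runs needed, extra energy beyond the first run) to capture
    target_cycles of debug data with a given trace buffer depth."""
    runs = math.ceil(target_cycles / trace_depth_cycles)
    extra_kwh = (runs - 1) * energy_per_run_kwh
    return runs, extra_kwh


# Match Palladium's 2M-cycle depth with a 500K-cycle trace buffer
runs, extra_kwh = rerun_penalty(2_000_000, 500_000, 13.2)
print(f"{runs} runs, {extra_kwh:.1f} KWh extra")   # 4 runs, 39.6 KWh extra
```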

Put this all together in a more truthful table about power use:

                     Cadence               Mentor               Synopsys EVE
                     Palladium-XP2 (GXL)   Veloce 2             Zebu Server 3

  Build a 300MG      0.64 KWh              22.50 KWh            180.00 KWh
  design

  Run 1 hour at MHz  42.2 KWh              13.2 KWh             480.0 KWh
  speed (3.6B                                                   (due to 300X
  cycles) with                                                  slower run)
  debug

  Download debug     FullVision            Full visibility      Already
  data               2M Cycles             2M Cycles            compensated for
                                           must run 4X longer   in slower run
                                           to get same depth
                                           as Palladium XP
                                           add 3 x 13.2 KWh
                                           = 39.6 KWh

  Total Power        42.8 KWh              75.3 KWh             660.0 KWh
                                           (1.8x Palladium)     (15.5x Palladium)

As Jim Hogan concludes, overall the discussion is a somewhat moot point from
a commercial perspective with power being a small percentage of the overall
cost of an emulator.  However, when looking at emulator power holistically,
the impression Lauro's table gave readers was really misleading.

               Error Count:  1    Total Error Count:  5
                Miss Count:  0     Total Miss Count: 15

         ----    ----    ----    ----    ----    ----    ----

LAURO ERRS ON PALLADIUM UPF AND DYNAMIC POWER ANALYSIS

Palladium-XP2 supports CPF (but not UPF) power analysis and it generates the switching activity for your power estimation tools.

    - Lauro Rizzatti, emulation consultant       http://www.deepchip.com/items/0547-09.html

Cadence announced UPF support for Palladium in 2014.  Cadence is the only
vendor that supports both UPF and CPF in both simulation and emulation.

               Error Count:  1    Total Error Count:  6
                Miss Count:  0     Total Miss Count: 15

From Lauro's ESNUG 547-09 table:

                     Cadence               Mentor            Synopsys EVE
                     Palladium-XP2 (GXL)   Veloce 2          Zebu Server 3

  Low-Power          CPF                   UPF               UPF
  Analysis

  Power              SAIF & FSDB           SAIF & FSDB       SAIF & FSDB
  Estimation

Veloce 2 supports the SAIF format, which helps it get an accurate average
power analysis.  However, you have to first generate .vcd files through the
hardware and then use additional scripts to convert these to SAIF format to
feed into power tools.  This is very time consuming and inefficient.

In comparison, the Palladium low power engine directly generates SAIF files
through its Dynamic Power Analysis (DPA) process.  This is also much faster
due to the use of the FullVision engine.  It provides much deeper traces,
quicker SAIF file generation, and higher upload speeds.

               Error Count:  1    Total Error Count:  7
                Miss Count:  0     Total Miss Count: 15

In addition, Cadence has a patented WTC (Weighted Toggle Count) format which
can generate a detailed and accurate power profile for the entire design in
a single run.

In contrast, to have the same level of accuracy, a Veloce Quattro would need
to partition the window multiple times and generate several SAIF files,
which then need to be fed into the power calculator for peak power
analysis.  Veloce2 Quattro is slower by orders of magnitude when compared to
Cadence Palladium DPA.

               Error Count:  0    Total Error Count:  7
                Miss Count:  1     Total Miss Count: 16

         ----    ----    ----    ----    ----    ----    ----

CONTINUED IN Pt 4 BELOW

         ----    ----    ----    ----    ----    ----    ----

  Pt 1 - Lauro missed Veloce2 and Zebu have lame gate utilization
  Pt 2 - Lauro missed Palladium job throughput is 3X faster vs. Zebu
  Pt 3 - Lauro missed energy costs is intrinsic power use over time
  Pt 4 - Lauro errs on channel latency, sim acceleration, and ICE

         ----    ----    ----    ----    ----    ----    ----

RELATED ARTICLES

  Hogan follows up on emulation user replies plus market share data
  Hogan warns Lauro missed emulation's TOTAL power use footprint
  The 14 metrics - plus their gotchas - used to select an emulator

Copyright 1991-2025 John Cooley.  All Rights Reserved.