( DAC'20 Item 05a ) ----------------------------------------------- [10/21/21]
Subject: CDNS Palladium Z1 speed, uptime, & cloud access is Best of 2020 #5a
THE EMULATION JUMBO JET: There are four basic reasons why engineers like
using Palladiums -- and why the Z1 beat out MENT & SNPS HW boxes in
2020. A good analogy explaining this is flying overseas on a jumbo jet.
- 1. Fast compile time. Building the design database.
"6:50 PM, Tues. Sept 14, Virgin Atlantic is flying 252 people
from New York (JFK) to London Heathrow Airport (LHR)"
"The Z1 takes 5-6 hours to compile with CDNS' parallel compile."
- 2. Allocation. Placing the database inside the available resources.
"Flight USX38 has a complete seat assignment of all 252 passengers.
All families and related groups are successfully seated together."
"Palladium's biggest strength is fine granularity in sizing the
design to various footprints at compile time -- plus flexible
dynamic placement at runtime."
- 3. Runtime. How fast your design database runs in different modes.
(ICE, simulation acceleration, CAKE, ...)
"How long does this flight from New York City to London take?
Any stops in Iceland/Ireland/France along the way?"
"We've actively run Palladium Z1 on our full SSD SoC designs
for 3 years now. Got speeds close to 2 MHz."
- 4. Debug. How easy is it to get quick, detailed visibility into your bugs?
"There is a severe storm in the North Atlantic. The pilot needs
immediate access to dynamically changing (and potentially
plane-crashing) weather and GPS data to help him reroute."
"The speed at which it can gather waves is amazing... Z1 only
takes 60 seconds or so to get the waveforms for a 6-board
design. Super fast! The rivals can't get near that!"
---- ---- ---- ---- ---- ---- ----
ALSO, UPTIME & PREDICTABILITY BEAT OUT FPGA: One 2020 user also commented on
Palladium Z1's ability to get back up & online fast -- compared against his
earlier, lesser experiences with Mentor Veloce.
"Palladium's uptime was good, even during eval. And when we did have
problems (even after eval), Cadence fixed them fast. The CDNS FAEs
would just swap a system board and it was back up running right away.
My experience was that Veloce had a lot of downtime due to failures.
Veloce was also prone to longer downtimes, and Mentor's process of
debugging Veloce problems was complex."
And another 2020 user liked that CDNS HW would predictably run once it
compiled, compared with some of the FPGA-based boxes (Veloce/ZeBu/HAPS).
"The most appealing Palladium advantage over its [MENT/SNPS] rivals
is that "if it compiles, it will run." -- can't say that with
those other FPGA-based guys!
Palladium mostly eliminates HW platform-induced gotchas from our
debug round trips -- which is a high value to us when our design
churn is high."
---- ---- ---- ---- ---- ---- ----
FCLK, CAKE, & CLOUD: Three new techie topics that users mentioned this time
around in the survey (oddly all Palladium related this year) were:
Cake Modes -- 25% core clock speed up relative to FCLK
CAKE-2's design speed is half of the Palladium clock speed. It samples
at the edge of the fastest clock. CAKE-1 is faster; it is the same
as the main clock speed. It samples both the pos and neg edges.
---- ---- ---- ---- ---- ---- ----
You might want to check out the Palladium Cloud that CDNS is selling.
We're a small Tier 3, so overall cost is our #1 concern for emulation.
---- ---- ---- ---- ---- ---- ----
AIR VS. WATER: But one advantage three MENT Veloce users cited over Cadence
Palladium Z1 was that Veloces are air cooled, while Palladiums are water cooled.
Mgmt likes that Veloces are air cooled.
I know that saves on install costs -- because no plumbing is required
for the Veloces -- but I don't know if Palladiums use much less kWh per
year because they're water cooled. (See ESNUG 567 #3, 574 #3, 532 #13)
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
QUESTION ASKED:
Q: "What were the 3 or 4 most INTERESTING specific EDA tools
you've seen in 2020? WHY did they interest you?"
---- ---- ---- ---- ---- ---- ----
Since most of my life revolves around Z1's these days, I should probably
nominate it as my most interesting EDA tool of 2020.
---- ---- ---- ---- ---- ---- ----
The Palladium Z1 is our workhorse around here.
---- ---- ---- ---- ---- ---- ----
Cadence Z1
---- ---- ---- ---- ---- ---- ----
Cadence Palladium Z1
Our group has used Cadence Palladium for over 15 years.
For a non-FPGA box, it has blazing runtime performance; plus we like
its fast turnaround during our compile-and-debug iterations.
We use Palladium for hardware/software co-verification, and virtual
emulation -- but ICE is our most common use model.
For us Z1's strengths are compile time and time-to-visibility
- They are by far better than the competition (that we can't
name, John.) We have more than a dozen Palladium emulator
racks and have designs that scale up to 12 racks (Z1). The
Z1 takes 5-6 hours to compile with Cadence's parallel compile.
We use the Z1 for block-level verification as well as for verifying
our entire design. Depending on our design size, we get speeds from
250 kHz to 1 MHz. Our experience is that Palladium's post-compile
runtime is reliable and matches the semantics of simulation.
We use Cadence's physical and virtual bridges to connect to our design.
Both have their use cases and advantages. A physical bridge provides
easier bring up for standard interfaces. A virtual bridge gives an
edge on custom or newer interfaces -- it also provides more leverage
and flexibility in setting up our lab infrastructure.
We love Z1's debug-time-to-visibility.
- The speed at which it can gather waves is amazing -- it's a
distinct advantage over the competition (that I can't name.)
- Z1 only takes 60 seconds or so to get the waveforms for a
6-board design. Super fast! The rivals can't get near that!
Our lab footprint is tremendous, so it's important for us to use
Palladium efficiently. We have multiple engineers using the Z1 all
the time without any issues.
I'd strongly recommend Palladium Z1 over its rivals. It's not perfect,
but it's much better than what's on the market right now. The most
important features for any emulation tool are debug visibility and
fast compile times. Palladium excels in both enormously. To add to
that, it's extremely reliable.
---- ---- ---- ---- ---- ---- ----
Our Palladium vs. Veloce vs. Zebu eval
We've actively run Palladium Z1 on our full SSD SoC designs for
3 years now. Got speeds close to 2 MHz.
Our primary use is for SoC debug and firmware development. We chose
Palladium Z1 instead of Veloce based on:
- Our Palladium eval
- My prior knowledge of Veloce
- As far as Zebu was concerned, we initially gave some thought
to looking into it, but for us, Zebu came across more as
testbench acceleration than a fast compiling debug box.
So we eliminated Zebu altogether from our eval.
- Palladium is better for our In Circuit Emulation (ICE) needs.
Palladium vs. Veloce Uptime/Downtime --
- Palladium's uptime was good, even during eval. And when we
did have problems (even after eval), Cadence fixed them fast.
The CDNS FAEs would just swap a system board and it was back
up running right away.
- My experience was that Veloce had a lot of downtime due to
failures. Veloce was also prone to longer downtimes, and
Mentor's process of debugging Veloce problems was complex.
Initial set up for Palladium --
Our first design on Palladium went from scratch to fully deployed in
only 3 months, including the hardware shipping, set up, mapping,
compiling, testing, and delivering to our ASIC and firmware teams.
A completely new design now takes us only 3 weeks to start using
Palladium Z1. And it's only overnight if we have any RTL change.
Speeding up Palladium --
Here are our tricks to speed up Palladium's out-of-the-box runtime. We
want our core functional clock as close as possible to Palladium FCLK
(the fastest operational clock).
For easy math, let's assume we start with a core clock at 1 MHz
out-of-the-box speed. The numbers below each add directly to
the 1 MHz. (i.e., they don't compound.)
1. Multi-compile -- 20% core clock speed up relative to FCLK
Palladium has a multi-compile option, where we can have it
automatically run multiple compile iterations, and then pick
the optimal one. In our case:
- One iteration compile took 3 hours.
- Then a 30-iteration compile took 12 hours.
- We also tried a 40-iteration compile, but the improvement
over 30 was negligible. And 12 hours is a sweet spot for
us, as it is an overnight run.
Multi-compiles got us a 20% bump in the core clock speed.
Note: This technique is most effective after your RTL is stable.
2. Cake Modes -- 25% core clock speed up relative to FCLK
CAKE-2's design speed is half of the emulator clock speed.
It samples at the edge of the fastest clock.
CAKE-1 is faster; it is the same as the main clock speed.
It samples both the positive and negative edges.
We always emulate with CAKE-2 first and check the functionality. This
is because all clocks are positive edge triggered and run the way we
expect in silicon. We then recompile in CAKE-1 for higher performance.
Doing Cake Modes gave us 25% more performance.
3. Shadow Net Optimization -- 10% core clock speed up relative to FCLK
When switching from CAKE-2 (positive edge processing) to CAKE-1
(positive and negative edge processing), Palladium needs to add
a value for the signal it's interpreting.
This adds nets -- which increased our utilization -- our initial
design was taking 2x the capacity from shadow nets and they also
negatively impacted performance.
So, we must do shadow net optimization next. 80% of the
shadow nets from CAKE-1 were not real, but rather the compiler's
interpretation of the code. The compiler always conservatively
assumes that both (positive and negative) edges of a signal are
used, and we just tell the tool that it does not need to compute
on some negative edges.
We run Palladium's shadow net optimization script and Palladium
automatically tweaks everything -- removing the unwanted shadows
and using a different placement.
The result was another 10% speed-up, making the total for
CAKE-1 plus Shadow Net Optimization 35%.
All three of these methods combined got us a 55% improvement.
That means our earlier 1 MHz out-of-the-box speed jumped to 1.55 MHz.
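The additive bookkeeping above can be checked with a few lines of Python.
This is only a sketch of the arithmetic: the 1 MHz baseline and the three
percentages come from the text, while the variable and function names are
made up for illustration. The compounded figure is included purely for
contrast, since the gains here are stated to add rather than compound.

```python
# Sketch of the speed-up bookkeeping; all names are illustrative assumptions.
BASE_MHZ = 1.0  # out-of-the-box core clock assumed "for easy math"

# Gains in percent of the 1 MHz baseline, as quoted in the text.
GAINS_PCT = {
    "multi_compile": 20,
    "cake_modes": 25,
    "shadow_net_optimization": 10,
}


def additive_speed(base_mhz, gains_pct):
    """Each gain is added directly to the baseline (no compounding)."""
    return base_mhz * (100 + sum(gains_pct.values())) / 100


def compounded_speed(base_mhz, gains_pct):
    """For contrast only: the total if the gains multiplied instead."""
    speed = base_mhz
    for pct in gains_pct.values():
        speed *= (100 + pct) / 100
    return speed


if __name__ == "__main__":
    print(f"additive:   {additive_speed(BASE_MHZ, GAINS_PCT):.2f} MHz")
    print(f"compounded: {compounded_speed(BASE_MHZ, GAINS_PCT):.2f} MHz")
```

The additive form reproduces the 1.55 MHz quoted above; treating the same
percentages as compounding would give roughly 1.65 MHz, which is why the
"they don't compound" note matters when comparing numbers across teams.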
We also used other methods, such as clock divider bypass and removing
DFT logic, for further improvement, and ultimately got approximately 75%
core clock improvement relative to FCLK. In other words, our core clock is
75% closer to FCLK now that we have finished all the improvements. So, when
FCLK is 2 MHz, our core clock now runs at 1.75 MHz.
It took us 3 months of total effort to get there. We were able to do it
in parallel with our other emulation activity.
As for our end-to-end testing, i.e., from the host PC to the hard drive,
we improved our I/O operations per second (IOPS) by 9x.
Palladium Compile --
Our designs fit into one Palladium Z1 cluster -- each Z1 has 2 to 3
clusters. Cadence claims 2.3B gates in a single Z1 system.
We compile our mature designs only when we have a significant change to
our design and want a high performing database.
Our compile process has 3 steps: synthesis, import, and compile.
It takes under 3 hours. Palladium runs well after we compile -- as long
as we have sufficient capacity to download what we have compiled.
In contrast, with FPGA-based systems, just because it compiles, it does
NOT mean it will meet timing. (However, my experience with Veloce is
that it always generates functional results if it successfully compiles
and capacity is available.)
Conclusion --
Our team has used Palladium for architecture analysis all the way to
post-silicon validation. We've also experimented with it for power
verification.
Up to 8 of our engineers will use it at the same time -- this works
well for us, as we've built a cloud-style emulation layer over it.
We really like Palladium's ease-of-use and debug. It takes only 5 to 30
minutes to get waveforms. Also, it's highly reliable; we go many months
at a stretch with no problems. Some of our validation regressions
run for several months without any interruptions.
Cadence's support is also great. When I say support, I'm referring to
both the people and the process. I'm impressed with the short time it
takes for Cadence to run diagnostics to find a bad board, replace it, and
check that everything is working.
---- ---- ---- ---- ---- ---- ----
We've used the Z1 emulator for over a year. It's our vendor-hosted VCAD
(Virtual Integrated Computer-Assisted Design) environment.
We get speeds of 1 to 2 MHz in 1xua/CAKE1 with Palladium Z1, depending
on the model and utilization. We have up to 6 engineers running
concurrent/simultaneous emulation jobs.
Palladium's biggest advantages over its rivals are:
1. Fine granularity in sizing the design to various footprints at
compile time -- plus flexible dynamic placement at runtime.
2. Fast compile times and a highly tunable compile flow that
allows for efficient use of compute.
3. Strong performance for simulation acceleration and TBA
(Transaction-Based Acceleration) co-modeling features.
4. The most appealing Palladium advantage over its rivals is
that "if it compiles, it will run." -- can't say that with
the other FPGA-based guys!
Palladium mostly eliminates HW platform-induced gotchas from
our debug round trips -- which is a high value to us when our
design churn is high.
We definitely don't miss the FPGA partitioning, place and route, and
timing issues that FPGA-based emulation has.
"Palladium Cloud"
With its hosted VCAD offering (Virtual Integrated Computer-Assisted
Design, aka the Palladium Cloud), Cadence has a very attractive
solution for smaller companies like us to get going fast with
emulation -- without having to incur the upfront/NRE costs of
building out an emulation lab/datacenter.
We mostly use Palladium Z1 for system-level software workloads running
on virtual hybrid system models. Our use cases are: HW verification,
SW verification, HW/SW co-verification, architecture analysis,
verification acceleration, and virtual emulation.
Our capacity needs vary -- we target multiple design elaboration
configurations ranging from a few domains to a few or several boards.
- We see compilation speeds at approximately 1 board per
hour, for 10 sequential compile trials. This speed number is
for clean RTL builds -- we do not do incremental compiles.
- We use parallel synthesis, but currently do not parallelize
the compile trials for various reasons.
Palladium Debug
Overall, Palladium's debug functionality is good/excellent on the
front-end features and the language side. It is ok/good on usability,
and ok/acceptable on performance. FullVision is great in the absence of
capacity concerns -- it lets you do what the name suggests, have full
RTL visibility compiled into the database without the need of
compile-time probing.
Needs Improvement: FullVision's source-level debug annotation is weak
compared to flagship debug environments. The waveform rendering requires
significant compute for parallel processing on larger designs; SimVision
is very basic -- it'd be nice if Cadence included Indago in the base
package, too.
Cadence's VCAD with Palladium Z1 and Cloud is our overall best option.
---- ---- ---- ---- ---- ---- ----
You might want to check out the Palladium Cloud that CDNS is selling.
We're a small Tier 3, so overall cost is our #1 concern for emulation.
---- ---- ---- ---- ---- ---- ----
The Z1 is our mainstay.
---- ---- ---- ---- ---- ---- ----
Palladium
---- ---- ---- ---- ---- ---- ----
That new Palladium Cloud looks interesting.
---- ---- ---- ---- ---- ---- ----
Mgmt wants us to price out the CDNS emulation cloud stuff.
---- ---- ---- ---- ---- ---- ----
Boss man wants a Veloce Cloud vs. Palladium Cloud eval.
Do you have one, John?
---- ---- ---- ---- ---- ---- ----
Anirudh's Z1
---- ---- ---- ---- ---- ---- ----
Palladium
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
USERS ON THE VELOCE HW BOXES
Veloce Strato
---- ---- ---- ---- ---- ---- ----
We have a giant room heated by Veloce boxes at a remote site.
---- ---- ---- ---- ---- ---- ----
Air cooled Veloces install easier than water cooled Palladiums.
---- ---- ---- ---- ---- ---- ----
Mgmt likes that Veloces are air cooled.
---- ---- ---- ---- ---- ---- ----
My MENT FAE asked me to write to you about the upside of
air cooled Veloces compared to water cooled Palladiums.
Air cooled is better.
Done.
---- ---- ---- ---- ---- ---- ----
Strato and Strato+
---- ---- ---- ---- ---- ---- ----
Mentor Veloce does it for us.
---- ---- ---- ---- ---- ---- ----
I'd mention Veloce for that since I use it daily.
---- ---- ---- ---- ---- ---- ----
VCS for first run debug
Veloce for HW acceleration/emulation
Fusion Compiler for PnR
Calibre for DRC
---- ---- ---- ---- ---- ---- ----
Veloce
---- ---- ---- ---- ---- ---- ----
Related Articles
CDNS Palladium Z1 speed, uptime, & cloud access is Best of 2020 #5a
CDNS Protium "dynamic duo" hooks into Palladium is Best of 2020 #5b
... and Big 3 vendors launched new HW in 2021 and users want scoops!
Sneak peeks at new Palladium X2 and new Protium X2 is Best 2020 #5c