( DAC'19 Item 1a ) ------------------------------------------------ [12/19/19]
Subject: CDNS Protium crazy fast "Palladium-compiles" #1a for Best of 2019
FAST COMPILES ROCK!: My quick-and-dirty summary of the emulator/prototyper
world. Say you have two designs to simulate. One design is 200 million
gates, the other design is 1 Billion gates.
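                  |  Initial Ramp Up Time /          |  Operating
                  |  Incremental Compile Time        |  Speed
  ----------------+----------------------------------+------------
  Palladium       |  initial ramp 2-4 weeks          |
    200 M gates   |  1.0 hour                        |  1.2 MHz
    1.0 B gates   |  5.0 hours                       |  800 kHz
  ----------------+----------------------------------+------------
  Zebu Server 4   |  initial ramp 4-6 weeks          |
    200 M gates   |  25.8 hours (1.1 days)           |  2.0 MHz
    1.0 B gates   |  41.2 hours (1.7 days)           |  750 kHz
  ----------------+----------------------------------+------------
  HAPS-80         |  initial ramp 2-3 months         |
    200 M gates   |  93.6 hours (3.9 days)           |  20.0 MHz
    1.0 B gates   |  146.4 hours (6.1 days)          |  5.0 MHz
  ----------------+----------------------------------+------------
  Veloce Strato   |  initial ramp 3-5 weeks          |
    200 M gates   |  5.1 hours                       |  1.6 MHz
    1.0 B gates   |  12.5 hours                      |  750 kHz
  ----------------+----------------------------------+------------
  Protium         |  ramp 24 hours w/Palladium       |
                  |  ramp 4-6 weeks w/o Palladium    |
    200 M gates   |  28.8 hours (1.2 days)           |  8.3 MHz
    1.0 B gates   |  50.4 hours (2.1 days)           |  4.5 MHz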
Notice after the painful initial ramp compile (of weeks or months) the later
*incremental* compile times are *much* faster at hours or days. The two
extremes are the custom-processor Palladium, which initial compiles in up to
4 weeks and incremental compiles in 1 to 5 hours -- but only gets SW speeds
of 800 kHz to 1.2 MHz; and the HAPS-80 with Xilinx FPGAs, which initial
compiles in up to 3 months and later incremental compiles in 4 to 6 days --
but gets SW speeds of 5.0 to 20.0 MHz.
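To put a number on that trade-off, here is a rough Python sketch of where the slow-compile/fast-clock box catches up with the fast-compile/slow-clock box. It assumes a deliberately simple model that is not from the survey: one incremental recompile followed by one uninterrupted run, with total turnaround = compile time + cycles / clock. The function names are made up for illustration; the inputs are the 1.0 B gate rows for Palladium and HAPS-80.

```python
def break_even_cycles(compile_a_h, mhz_a, compile_b_h, mhz_b):
    """Cycles at which platform B (slower compile, faster clock) catches up to A.

    Assumed model: turnaround = compile time + cycles / clock frequency.
    """
    if compile_b_h <= compile_a_h or mhz_b <= mhz_a:
        raise ValueError("expects B to compile slower and run faster than A")
    extra_compile_s = (compile_b_h - compile_a_h) * 3600.0
    saved_s_per_cycle = 1.0 / (mhz_a * 1e6) - 1.0 / (mhz_b * 1e6)
    return extra_compile_s / saved_s_per_cycle


def turnaround_hours(compile_h, mhz, cycles):
    """Compile time plus run time, in hours, under the same assumed model."""
    return compile_h + cycles / (mhz * 1e6) / 3600.0


# 1.0 B gate rows: Palladium (5.0 h, 0.8 MHz) vs. HAPS-80 (146.4 h, 5.0 MHz)
cycles = break_even_cycles(5.0, 0.8, 146.4, 5.0)
print(f"break-even at about {cycles:.3g} cycles")
print(f"Palladium turnaround: {turnaround_hours(5.0, 0.8, cycles):.1f} hours")
print(f"HAPS-80 turnaround:   {turnaround_hours(146.4, 5.0, cycles):.1f} hours")
```

Below the break-even cycle count the fast compile wins; above it the fast clock wins. Real projects recompile many times and run many tests, so treat this as a back-of-envelope guide only.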
HOW PROTIUM CHEATS: The one outlier in this table is Protium. It has two
different "initial ramp compile times". If you take your RTL straight into
Protium, that first initial ramp is 4-6 weeks. But if you port a Palladium
design into a Protium, your initial ramp is only 24 hours. This is how
FPGA-based Protium cheats!
The Protium users also gushed a lot about the fact that they could go back
to Palladium for fast debug and waves if needed.
"We run our design on Protium then go back to Palladium for debug.
With Palladium, we can capture waves up and down the hierarchy of
every net in our chip. It's a really big advantage of Protium."
"With Protium, we get the speed of an FPGA-based system, with the
fast ramp-up, debug, and signal traces of Palladium. It only
takes seconds for us to see all the waveforms."
In addition, Protium took 1.2 to 2.1 days to recompile vs. Synopsys HAPS
taking 3.9 to 6.1 days to recompile.
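As a quick check on that comparison, the sketch below divides the HAPS-80 and Protium rows of the table against each other. The dictionary layout and the function name are illustrative only; the numbers are copied from the table.

```python
# (incremental compile hours, operating speed in MHz) from the summary table
TABLE = {
    "HAPS-80": {"200M": (93.6, 20.0), "1B": (146.4, 5.0)},
    "Protium": {"200M": (28.8, 8.3), "1B": (50.4, 4.5)},
}


def ratios(a, b, size):
    """Return (compile-time ratio a/b, speed ratio a/b) for one design size."""
    compile_a, mhz_a = TABLE[a][size]
    compile_b, mhz_b = TABLE[b][size]
    return compile_a / compile_b, mhz_a / mhz_b


for size in ("200M", "1B"):
    c, s = ratios("HAPS-80", "Protium", size)
    print(f"{size}: HAPS-80 recompiles {c:.2f}x slower, runs {s:.2f}x faster")
```

By these table numbers, HAPS-80 recompiles roughly 3X slower than Protium at both design sizes, while its speed edge shrinks from about 2.4X at 200 M gates to about 1.1X at 1.0 B gates.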
And that's why Protium (actually the crazy fast incremental Protium compiles
with FPGA 8.3 MHz simulation speeds) wins the #1 Best of EDA in 2019 award
from the end users this year.
(And it doesn't hurt that Protiums are 1/3rd the price of a Palladium, too.)
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
QUESTION ASKED:
Q: "What were the 3 or 4 most INTERESTING specific EDA tools
you've seen this year? WHY did they interest you?"
---- ---- ---- ---- ---- ---- ----
This year we chose Cadence Protium S1 to be the #1 Best of 2019.
Protium is CDNS's FPGA-based rapid prototyper. It rivals SNPS Zebu.
Capacity: Zebu maxes at ~3 B gates, while Protium at 10 B gates.
Speed: Zebu maxes at ~2.5 MHz while Protium gets 5 to 16 MHz.
In comparison CDNS Palladium is 10 B gates at around 1 MHz.
We chose Protium as Best of 2019 because Cadence did a great
job of automatically porting Palladium's build flow into Protium.
Their much faster *combined* compile time sold us on the CDNS pair.
This is our process:
1. We compile and run our initial testing on Palladium.
2. We port our Palladium build to Protium. The Protium tool
chain and compile process incorporates the design netlist
used for Palladium compiles. Based on previous builds on
Palladium, it has taken about 1 or 2 weeks to get the first
port of the design working on Protium. (Our designs are huge!)
In comparison, bring up in a SNPS HAPS-80 FPGA-based system
could take 6 weeks to 3 months for a design our size. I do
not have a precise comparison. It's what we've heard.
3. We start a different verification fork with Protium. Protium
S1 runs about 8X faster than Palladium. We can also better
utilize the chip's external interfaces like DDR memory, UART,
QSPI, etc... by connecting actual devices to them via
expansion boards.
4. If we find a bug in Protium, we can then go back to Palladium
for debug, because the representations are similar.
Debug is much better on Palladium, in part because it can
capture a longer period of signal activity. Palladium
has more probing capabilities for signals in the design
hierarchy. It has more trace memory as well. (All stuff
that Zebu and HAPS are weak in.)
That is not to say that Palladium is better than Protium for
debugging. It just depends on what problem is being
traced and isolated.
5. We fix the bug and continue our testing on Protium.
This makes the Protium + Palladium combination our preferred platform
for both our software and hardware system developers.
---- ---- ---- ---- ---- ---- ----
Cadence Protium is a Xilinx FPGA-based hardware and software tool.
We've had it for 2.5 years now and have 5 boxes of Protium-G1 based
on 1 UltraScale VU440 FPGA.
We use Protium to compile a prototype of our ASIC device, for software
development, and to run system tests (HW & SW) for verifying our ASIC
logic prior to tapeout.
Before using Protium, we used a Dini-based FPGA system for prototyping
-- which did not have all the SW & HW infrastructure that Cadence
provides.
Our primary considerations for moving from Dini to Protium were:
- Our Dini system had a very long bring up cycle back then,
and we were looking for a way to do it faster.
- The fact that Cadence provides a PCIe SpeedBridge was a
big factor.
- Last, but not least, another big consideration was that
Protium has a very similar development flow as Palladium.
First off, Cadence PCIe SpeedBridge really works great. And, unlike
with Dini, with Protium we could use the same PCIe-controller that we
use in the ASIC w/o any modification.
We have 5 Protiums, each with one UltraScale VU440 FPGA. The gate
count capacity more-or-less works out to 4-5 Palladium-XP
domains per 1 Protium. Our chips are 20 million gates.
Using Protium alone, compiling a new 20 million gate design on a
Protium-G1 takes ~15-16 hours, and we run it overnight. The CDNS
Protium SW does the mapping of RTL into Vivado gates, which is 30%
of our compile. Then the Xilinx Vivado SW does the PnR into the
VU440, which with the additional timing closure takes 8 to 12 hours.
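Those numbers hang together. A small Python sanity check, assuming the 30% mapping share applies to the whole ~15-16 hour compile and the rest is Vivado PnR plus timing closure (the helper name is made up for illustration):

```python
def split_compile(total_hours, mapping_fraction):
    """Split a total compile time into (mapping hours, remaining hours)."""
    mapping_hours = total_hours * mapping_fraction
    return mapping_hours, total_hours - mapping_hours


for total in (15.0, 16.0):
    mapping, pnr = split_compile(total, 0.30)
    # Quoted range for Vivado PnR plus timing closure is 8 to 12 hours.
    within = 8.0 <= pnr <= 12.0
    print(f"{total:.0f} h total: {mapping:.1f} h mapping, "
          f"{pnr:.1f} h PnR/timing, within 8-12 h: {within}")
```

So mapping is roughly 4.5 to 5 hours of the overnight run, and the remainder lands inside the quoted 8 to 12 hour PnR window.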
We also have Palladium, and we actually use Protium and Palladium for
the same thing, i.e. for software development and for debug of our
system (HW & SW) pre-silicon.
Palladium's advantages:
Better HW debug due to MUCH better observability (ability to trace
many signals for many cycles) and a MUCH faster compile time.
Protium's advantages:
At ~10 MHz SW operating speed, it's 10X faster than Palladium,
plus has about a 50% lower cost-per-gate.
Because Protium runs MUCH faster than Palladium, we primarily use
Protium for regression runs of long system tests and SW development
once our HW is stable enough.
After we've run our HW design on Palladium, we can port it automatically
to Protium. The process works very smoothly and is very important to us.
Our designs always work the first time on Protium after we Palladium
compile them.
Additionally, Protium's and Palladium's interfaces are very similar, and
we use the exact same environment on both. So, if any tests fail on
Protium we usually take them to Palladium, capture the traces, and debug
there in Palladium. (We use Protium in a very similar way to our ASIC,
so we don't use backdoor memory upload or stop-and-resume the clock as
we do with Palladium.)
Our engineers typically use Protium in interactive mode, with one
engineer per machine, so we will have 5 engineers on 5 machines. We run
our regressions overnight and on the weekends.
I recommend Protium. It's especially good if you can combine Protium
with Palladium for very fast compiles, observability, and debug.
---- ---- ---- ---- ---- ---- ----
Cadence Protium
We use both Palladium and Protium, so I can comment both on Protium and
its integration with Palladium -- this integration is a big advantage
for us.
- Performance. Protium is a ~5 MHz, multiple FPGA-based
prototyping system that runs ~5X faster than Palladium.
- Capacity. Protium-S1 can handle similar design sizes as
Palladium, i.e. many 100s of millions of gates per box.
- Multiple users. We can enable multiple Protium users at the
same time, based on how many designs are in the box. The
granularity of a design can be an FPGA -- the limitations come
into play when there is finite cabling for the interfaces
required.
Porting a design from Palladium into Protium is very fast.
If we have a design working on Palladium first, we can port it
"seamlessly" to Protium.
Cadence has touted this integration for several years; it has finally
matured now and is very useful.
- Normally, mapping our RTL to an FPGA-based emulator (e.g.
Mentor Veloce and Synopsys Zebu) is complex and time consuming.
It typically takes us 3 to 6 months to map our design in them,
get it to meet timing, and then validate that the design works.
- With Cadence, it took 4 weeks to compile in Palladium,
and then it was a seamless flow for us to port our design
database from Palladium over to Protium. We had to make
minimal changes to port the design -- replacing DRAM models
with DRAMs, for example -- but all other components, like the
PCIe SpeedBridge and NAND devices, were retained.
- It was effectively a push button approach to get our design
working on Protium once we had a working Palladium database.
The Protium compiler partitioned our design into multiple FPGAs,
completed the place and route on multiple FPGAs and optimized timing.
It then did placement and routing on multiple machines and used the one
with the best results.
The whole porting process was <24 hours with minimum user intervention.
From a top-level perspective, this Palladium-Protium integration is
huge. If we didn't have it, we could not justify the time and effort to
set up a new FPGA-based system.
How Protium fits in our development cycle --
Since Protium is ~5X faster than Palladium on our design, it's better
suited for our firmware development and SW testing. We deploy Protium
once we reach maturity in our hardware and firmware development, as
at this stage most issues are within firmware.
If we find a problem/corner case on Protium, we bring the design back to
Palladium to debug it because it offers more visibility. Even though we
can view 1000's of signals on Protium, Palladium still gives us a much
more complete view.
Palladium and Protium platforms are complementary, and together they
provide a good vehicle for our SOC validation and product firmware
development well before we tape out.
---- ---- ---- ---- ---- ---- ----
Cadence Protium wins for small companies like ours.
We've used Cadence's Protium S1 prototyping system for our pre-silicon
software development for 11 months now.
Our goal is to be able to demonstrate our application and run the
software to show our silicon works.
- Once our RTL is good, we put it on the Protium machine; this
typically occurs about 3 months before tapeout.
- Using this approach, we gain an extra 6 months of software
development time before our chip comes back from the fab.
We are getting a 6 MHz SW operating speed from Protium, depending on
our design/application. For one complex design, we started out with a
2 MHz speed and then adjusted the compile switches to speed
it up to 6 MHz.
We chose Protium due to its fast implementation flow.
- Starting from scratch for a new ASIC design, it only took our
engineers 3 days to get it set up and running from the RTL.
(rather than requiring a full-time engineer)
- We hadn't previously run the design anywhere -- not even on
Palladium.
Protium is only 10-15% of the price of Palladium, so it has a high
value for a budget-conscious company such as ours. So, although we
have Palladium, if a design doesn't fit on Palladium, we use Protium
instead.
Protium has a number of good debug features also, e.g. we connect
Cadence's JTAG debug port for our software testing. Even so, our
biggest pain point in debugging on Protium (vs. Palladium) is that when
we make an RTL change, recompiling the design takes 6 hours with
Protium, compared to only 25 minutes with Palladium.
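To see why that gap hurts in debug, here is a rough Python sketch of how many fix-and-recompile turns fit in a day. It assumes back-to-back recompiles with zero test run time in between, so it is an upper bound, and the function name is made up for illustration.

```python
def recompiles_per_day(recompile_minutes, hours_per_day=24.0):
    """Upper bound on whole recompiles that fit in one day (no run time counted)."""
    return int(hours_per_day * 60 // recompile_minutes)


PROTIUM_MIN = 6 * 60   # 6 hours per recompile, as quoted
PALLADIUM_MIN = 25     # 25 minutes per recompile, as quoted

for label, minutes in (("Protium", PROTIUM_MIN), ("Palladium", PALLADIUM_MIN)):
    print(f"{label}: {recompiles_per_day(minutes)} per 24 h, "
          f"{recompiles_per_day(minutes, 8.0)} per 8 h shift")
```

Under those assumptions an engineer gets one RTL turn per 8 hour shift on Protium versus more than a dozen on Palladium, which matches using Protium only once the RTL is stable.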
It's definitely still worth it for us to use Protium, as it is far less
expensive than Palladium. Additionally, we use Protium after our RTL is
stable, so we don't need much debug, and we can often see what's wrong
from the outside without looking for the signal.
We've now run Protium on one design and were able to debug 10's of
signals. (If something is super difficult to debug, we can choose to
run the debug on Palladium. Cadence has a good integration between both
boxes, and a similar interface, making it simple to do.)
It took us 3 days to compile a completely new design. The design includes
standard cells for clock-gating and there were no issues compiling it.
Protium worked for us the first time afterward.
Two of our users have run designs in parallel, and used Protium's memory
upload feature, including the internal memory models and SRAMs, to good
effect.
Protium is especially valuable as it enabled us to be ready with our
software when our engineering samples arrived.
I've recommended it to colleagues at startups and smaller companies.
Other than debug, Protium works as well as Palladium does for us.
---- ---- ---- ---- ---- ---- ----
Cadence Protium
We use it for embedded firmware verification.
Our company began using Cadence's original RPP in 2012, and then
purchased the original Protium (2nd generation RPP) in 2015. We
purchased our first Protium S1 in 2018.
Our demand exceeded our capacity, so we needed additional system
emulation capacity. Plus, we have additional demand as we develop more
products in parallel, with new use cases being proposed to meet product
development cycle time goals.
- We generally use Protium for embedded firmware development and
verification.
- We typically use Palladium for HW/FW full debug visibility.
- Our engineers usually debug issues found in Protium in Palladium.
(For issues unique to Protium, we use Vivado for debug.)
Our company now has 14 Protium S1s, with each Protium providing one
domain. A ~4M gate design takes up to 4 hours to compile and place-and-route
on Protium. Given our smaller design size, there is no advantage to
incremental compile, so we recompile the entire design each time.
Protium's advantage is the frontend compile and place & route that
guarantees the design will be functional once mapped to the FPGA -- so
you don't need to be an FPGA flow expert. We sacrifice some speed of
execution for the simpler FPGA flow.
Protium gives us a 4 MHz to 10 MHz speed (step clock).
Speed is very important for our embedded FW verification. Thus, our
ideal solution would offer a
1) simple flow,
2) a robust host interface, and
3) improved speed of execution.
Unfortunately, nothing today provides it all.
We have ~20 engineers who use the Protium platform. While we only
have 1 user per Protium at a given time, we use dynamic reloading,
such that we can switch between projects on the fly. It works well for
our use case.
For us, Protium's best advantage over HAPS and Dini is that the
Cadence XE compile process guarantees the design will be functional
once mapped to the FPGA.
---- ---- ---- ---- ---- ---- ----
Cadence Protium
The main reason why we evaluated Cadence Protium S1 was we were looking
for a cost-effective way to add incremental capacity to our existing
Palladium installation.
Using a prototyping platform was only doable for us because Cadence had
an integration between Palladium and Protium -- so we could reuse the
Palladium setup and compile environment for Protium. We could not have
justified the extra evaluation turnaround time that Zebu or HAPS would
have taken us (e.g. 6-8 weeks or longer).
Below are Protium's approximate set up and compile times when we *reuse*
the existing Palladium design database:
    New chip for first time (after Palladium flow setup)    7-10 days
    Recompiling for RTL changes                             1-2 days
Our primary Protium evaluation criterion was to confirm the integrated
compile flow between Palladium and Protium worked. Our results:
- The unified compile flow made it quick to get a new Palladium
database running in Protium.
- We were able to use Palladium's physical speed bridges for
Protium.
- Protium performs better than Palladium -- we got 2.5X faster
speed. This makes it attractive to our software team for
SW development, which needs relatively stable hardware.
- Protium is MUCH cheaper to buy overall vs. a Palladium box.
The combination of cost, easy set up, and speed up got us to move ahead
with Protiums in addition to a Palladium Z1.
This is how Protium currently fits in our verification methodology:
- Our hardware teams use Palladium Z1 for design/architecture
verification as well as reproducing issues that we may see in
silicon debug.
- Our software team uses both Palladium and Protium, jumping to
Protium when the design is more stable and does not require much
design debug. Because running tests and debugging SW code with
Protium is so similar to Palladium, they can easily move between
the two platforms.
- While Protium has its own hardware debug flow, we go
back to Palladium for hardware debug.
- Our verification team does debug exclusively on Palladium, as
it is the best in the industry for quick debug turnaround.
We are currently running a design size equivalent of 150M gates on
Protium. Our actual design is much bigger, but our design is modular
and repetitive, so we can use Protium to test one element.
Protium has definitely been a cost-effective way for us to add more
capacity to Palladium.
---- ---- ---- ---- ---- ---- ----
We were using Cadence Palladium plus Synopsys Zebu, but that cost us
two entire compile/set-up teams to do 4 weeks of work each.
With Palladium plus Protium, we can now do the same amount of work
using only 1/2 the number of engineers we had before.
---- ---- ---- ---- ---- ---- ----
1. Calibre
2. Protium + Palladium
3. BDA AFS
We only use best in class.
---- ---- ---- ---- ---- ---- ----
Palladium / Protium combo wins for us.
---- ---- ---- ---- ---- ---- ----
With Protium, we get the speed of an FPGA-based system, with the
fast ramp-up, debug, and signal traces of Palladium. It only
takes seconds for us to see all the waveforms.
---- ---- ---- ---- ---- ---- ----
1. JasperGold
2. Perspec
3. Protium
---- ---- ---- ---- ---- ---- ----
Was Zebu for SW, Palladium for HW.
Now Protium for SW, Palladium for HW.
---- ---- ---- ---- ---- ---- ----
My vote is for Protium. Its hooks into Palladium work well.
---- ---- ---- ---- ---- ---- ----
Protium. It does what EVE Zebu does, only better.
---- ---- ---- ---- ---- ---- ----
We're a Zebu house, but what I really want is Protium.
---- ---- ---- ---- ---- ---- ----
Protium and VCS
---- ---- ---- ---- ---- ---- ----
We like Zebu4. Protium is still not mature.
---- ---- ---- ---- ---- ---- ----
I think HAPS is better than Protium.
---- ---- ---- ---- ---- ---- ----
We get max MHz with HAPS that beats out Protium.
---- ---- ---- ---- ---- ---- ----
We like hands-on. Protium is automated, but we can get an extra
3 or 4 MHz using HAPS if we tune it enough.
---- ---- ---- ---- ---- ---- ----
If I have access to our Palladium, which is rare, I want Protium.
Otherwise, I want a Zebu Server 4.
---- ---- ---- ---- ---- ---- ----