( DAC'19 Item 1a ) ------------------------------------------------ [12/19/19]

Subject: CDNS Protium crazy fast "Palladium-compiles" #1a for Best of 2019

FAST COMPILES ROCK!: My quick-and-dirty summary of the emulator/prototyper
world.  Say you have two designs to simulate.  One design is 200 million
gates, the other design is 1 Billion gates.

                                Initial Ramp Up Time /        Operating
                                Incremental Compile Time      Speed

    Palladium                   initial ramp 2-4 weeks
                 200 M gates      1.0 hour                    1.2 Mhz
                 1.0 B gates      5.0 hours                   800 Khz

    Zebu Server 4               initial ramp 4-6 weeks
                 200 M gates     25.8 hours (1.1 days)        2.0 Mhz
                 1.0 B gates     41.2 hours (1.7 days)        750 Khz

    HAPS-80                     initial ramp 2-3 months
                 200 M gates     93.6 hours (3.9 days)        20.0 Mhz
                 1.0 B gates    146.4 hours (6.1 days)        5.0 Mhz

    Veloce Strato               initial ramp 3-5 weeks
                 200 M gates      5.1 hours                   1.6 Mhz
                 1.0 B gates     12.5 hours                   750 Khz

    Protium                     ramp 24 hours w/Palladium
                                ramp 4-6 weeks w/o Palladium
                 200 M gates     28.8 hours (1.2 days)        8.3 Mhz
                 1.0 B gates     50.4 hours (2.1 days)        4.5 Mhz

Notice after the painful initial ramp compile (of weeks or months) the later
*incremental* compile times are *much* faster at hours or days.  The two
extremes are the custom processor Palladium that initial compiles in up to
4 weeks and incremental compiles in 1 to 5 hours -- but gets SW speeds of
only 800 Khz to 1.2 Mhz; and the HAPS-80 with Xilinx FPGAs that initial
compiles in up to 3 months and later incremental compiles in 4 to 6 days
-- but gets SW speeds of 5.0 to 20.0 Mhz.

HOW PROTIUM CHEATS: The one outlier in this table is Protium.  It has two
different "initial ramp compile times".  If you take your RTL straight into
Protium, that first initial ramp is 4-6 weeks.  But if you port a Palladium
design into a Protium, your initial ramp is only 24 hours.  This is how 
FPGA-based Protium cheats!

The Protium users also gushed a lot about the fact that they could go back
to Palladium for fast debug and waves if needed.

    "We run our design on Protium then go back to Palladium for debug.
     With Palladium, we can capture waves up and down the hierarchy of
     every net in our chip.  It's a really big advantage of Protium."

    "With Protium, we get the speed of an FPGA-based system, with the
     fast ramp-up, debug, and signal traces of Palladium.  It only
     takes seconds for us to see all the waveforms."

In addition, Protium took 1.2 to 2.1 days to recompile vs. Synopsys HAPS
taking 3.9 to 6.1 days to recompile.
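
As a back-of-the-envelope check on that gap, here is a small Python sketch
that divides the HAPS-80 incremental compile hours by the Protium ones.  The
hours are copied from the table above; the dictionary layout and the
compile_ratio() helper are made up for illustration.

```python
# Incremental compile times in hours, copied from the table above.
incremental_hours = {
    "Protium": {"200M": 28.8, "1B": 50.4},
    "HAPS-80": {"200M": 93.6, "1B": 146.4},
}


def compile_ratio(slow, fast, size):
    """Return how many times longer `slow` takes than `fast` at one design size."""
    return incremental_hours[slow][size] / incremental_hours[fast][size]


for size in ("200M", "1B"):
    ratio = compile_ratio("HAPS-80", "Protium", size)
    print(f"{size} gates: HAPS-80 recompile takes {ratio:.2f}X as long as Protium")
```

On these numbers a HAPS-80 recompile takes roughly 3X as long as a Protium
recompile at both design sizes.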
        
And that's why Protium (actually the crazy fast incremental Protium compiles
with FPGA 8.3 Mhz simulation speeds) wins the #1 Best of EDA in 2019 award
from the end users this year.

(And it doesn't hurt that Protiums are 1/3rd the price of a Palladium, too.)

        ----    ----    ----    ----    ----    ----    ----
        ----    ----    ----    ----    ----    ----    ----
        ----    ----    ----    ----    ----    ----    ----

      QUESTION ASKED:

        Q: "What were the 3 or 4 most INTERESTING specific EDA tools
            you've seen this year?  WHY did they interest you?"

        ----    ----    ----    ----    ----    ----    ----

    This year we chose Cadence Protium S1 to be the #1 Best of 2019.

    Protium is CDNS's FPGA-based rapid prototyper.  It rivals SNPS Zebu.

     Capacity: Zebu maxes at ~3 B gates, while Protium at 10 B gates.
        Speed: Zebu maxes at ~2.5 Mhz while Protium gets 5 to 16 MHz.

    In comparison, CDNS Palladium is 10 B gates at around 1 MHz.

    We chose Protium as Best of 2019 because Cadence did a great
    job of automatically porting Palladium's build flow into Protium.

    Their much faster *combined* compile time sold us on the CDNS pair.

    This is our process:

        1. We compile and run our initial testing on Palladium.

        2. We port our Palladium build to Protium.  The Protium tool
           chain and compile process incorporates the design netlist
           used for Palladium compiles.  Based on previous builds on
           Palladium, it has taken about 1 or 2 weeks to get the first
           port of the design working on Protium.  (Our designs are huge!)

           In comparison, bring up in a SNPS HAPS-80 FPGA-based system
           could take 6 weeks to 3 months for a design our size.  I do
           not have a precise comparison.  It's what we've heard.

        3. We start a different verification fork with Protium.  Protium
           S1 runs about 8X faster than Palladium.  We can also better
           utilize the chip's external interfaces like DDR memory, UART,
           QSPI, etc...  by connecting actual devices to them via 
           expansion boards.  

        4. If we find a bug in Protium, we can then go back to Palladium
           for debug, because the representations are similar.  

           Debug is much better on Palladium, in part because it can
           capture a longer time period of signal activity.  Palladium
           has more probing capabilities for signals in the design 
           hierarchy.  It has more trace memory as well.  (All stuff
           that Zebu and HAPS are weak in.)

           That is not to say that Palladium is better than Protium for
           debugging.  It just depends on what problem is being
           traced and isolated.

        5. We fix the bug and continue our testing on Protium.

    This makes the Protium + Palladium combination our preferred platform
    for both our software and hardware system developers.

       ----    ----    ----    ----    ----    ----    ----

    Cadence Protium is a Xilinx FPGA-based hardware and software tool.  
    We've had it for 2.5 years now and have 5 boxes of Protium-G1 based
    on 1 UltraScale VU440 FPGA.

    We use Protium to compile a prototype of our ASIC device, for software 
    development, and to run system tests (HW & SW) for verifying our ASIC 
    logic prior to tapeout.

    Before using Protium, we used a Dini-based FPGA system for prototyping
    -- which did not have all the SW & HW infrastructure that Cadence 
    provides.  

    Our primary considerations for moving from Dini to Protium were:

        - Our Dini system had a very long bring up cycle back then,
          and we were looking for a way to do it faster.

        - The fact that Cadence provides a PCIe SpeedBridge was a
          big factor.

        - Last, but not least, another big consideration was that
          Protium has a development flow very similar to Palladium's.

    First off, Cadence PCIe SpeedBridge really works great.  And, unlike 
    with Dini, with Protium we could use the same PCIe-controller that we 
    use in the ASIC w/o any modification.

    We have 5 Protiums, each with one UltraScale VU440 FPGA.  The gate
    count capacity more-or-less works out to 4-5 Palladium-XP
    domains per 1 Protium.  Our chips are 20 million gates.
 
    Using Protium alone, compiling a new 20 million gate design on a
    Protium-G1 takes ~15-16 hours, and we run it overnight.  The CDNS
    Protium SW maps our RTL into Vivado gates, which is 30% of our
    compile.  Then the Xilinx Vivado SW does the PnR into the VU440s,
    plus the additional timing closure, which takes 8 to 12 hours.
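
    As a rough consistency check on those numbers, the Python sketch below
    splits the quoted 15-16 hour compile into the 30% mapping share and
    the remainder.  It assumes (this is not stated above) that everything
    outside of mapping is Vivado PnR plus timing closure; split_compile()
    is a hypothetical helper.

```python
def split_compile(total_hours, mapping_share=0.30):
    """Split a total compile time into (mapping hours, remaining hours).

    ASSUMPTION: the remainder is all Vivado PnR plus timing closure.
    """
    mapping = total_hours * mapping_share
    return mapping, total_hours - mapping


for total in (15.0, 16.0):
    mapping, rest = split_compile(total)
    print(f"{total:.0f} h total: {mapping:.1f} h mapping, {rest:.1f} h PnR + timing")
```

    Under that assumption the remainder lands at about 10.5 to 11.2 hours,
    inside the 8 to 12 hour range quoted for Vivado.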

    We also have Palladium, and we actually use Protium and Palladium for
    the same thing, i.e. for software development and for debug of our 
    system (HW & SW) pre-silicon.  

    Palladium's advantages: 

        Better HW debug due to MUCH better observability (ability to trace
        many signals for many cycles) and a MUCH faster compile time.  

    Protium's advantages: 

        At ~10 MHz SW operating speed, it's 10X faster than Palladium,
        plus has about a 50% lower cost-per-gate.  

    Because Protium runs MUCH faster than Palladium, we primarily use 
    Protium for regression runs of long system tests and SW development
    once our HW is stable enough.

    After we've run our HW design on Palladium, we can port it automatically
    to Protium.  The process works very smoothly and is very important to us. 
    Our designs always work the first time on Protium after we Palladium
    compile them.

    Additionally, Protium's and Palladium's interfaces are very similar, and
    we use the exact same environment on both.  So, if any tests fail on 
    Protium we usually take them to Palladium, capture the traces, and debug
    there in Palladium.  (We use Protium in a very similar way to our ASIC,
    so we don't use backdoor memory upload or stop-and-resume the clock as
    we do with Palladium.)

    Our engineers typically use Protium in interactive mode, with one 
    engineer per machine, so we will have 5 engineers on 5 machines.  We run
    our regressions overnight and on the weekends.

    I recommend Protium.  It's especially good if you can combine Protium
    with Palladium for very fast compiles, observability, and debug.  

       ----    ----    ----    ----    ----    ----    ----

    Cadence Protium

    We use both Palladium and Protium, so I can comment both on Protium and
    its integration with Palladium -- this integration is a big advantage 
    for us.  

        - Performance.  Protium is a ~5 MHz, multiple FPGA-based 
          prototyping system that runs ~5X faster than Palladium.  

        - Capacity.  Protium-S1 can handle similar design sizes as 
          Palladium, i.e. many 100s of millions of gates per box.  

        - Multiple users.  We can enable multiple Protium users at the
          same time, based on how many designs are in the box.  The
          granularity of a design can be an FPGA -- the limitations come
          into play when there is finite cabling for the interfaces
          required.
 
    Porting a design from Palladium into Protium is very fast.

    If we have a design working on Palladium first, we can port it 
    "seamlessly" to Protium.

    Cadence has touted this integration for several years; it has finally
    matured now and is very useful.

        - Normally, mapping our RTL to an FPGA-based emulator (e.g. 
          Mentor Veloce and Synopsys Zebu) is complex and time consuming.
          It typically takes us 3 to 6 months to map our design in them,
          get it to meet timing, and then validate that the design works.  

        - With Cadence, it takes 4 weeks to compile in Palladium
          and then it was a seamless flow for us to port our design
          database from Palladium over to Protium.  We had to make
          minimal changes to port the design -- for example, replacing
          DRAM models with DRAMs -- but all other components, like the
          PCIe SpeedBridge and NAND devices, were retained.

        - It was effectively a push button approach to get our design 
          working on Protium once we had a working Palladium database.  

    The Protium compiler partitioned our design into multiple FPGAs, 
    completed the place and route on multiple FPGAs and optimized timing.  
    It then did placement and routing on multiple machines and used the one
    with the best results.  
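
    The "run place and route several times and keep the best result" idea
    can be sketched in Python as below.  This is only an illustration, not
    the Cadence or Vivado flow: place_and_route() is a hypothetical
    stand-in that returns a made-up worst slack per seed, and local
    threads stand in for the multiple machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def place_and_route(seed):
    """Hypothetical stand-in for one P&R run.

    Returns (seed, worst_slack_ns).  A real run would invoke the FPGA
    tools; here the slack is a deterministic pseudo-random number.
    """
    rng = random.Random(seed)
    return seed, rng.uniform(-0.5, 0.5)


def best_of_runs(seeds, max_workers=4):
    """Run every seed in parallel and keep the run with the largest slack."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(place_and_route, seeds))
    return max(results, key=lambda run: run[1])


if __name__ == "__main__":
    seed, slack = best_of_runs(range(8))
    print(f"keeping run with seed {seed}, worst slack {slack:+.3f} ns")
```

    Picking by worst slack is one possible choice of "best"; the comment
    above does not say which metric was used.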

    The whole porting process was <24 hours with minimum user intervention.

    From a top-level perspective, this Palladium-Protium integration is 
    huge.  If we didn't have it, we could not justify the time and effort to
    set up a new FPGA-based system.  

    How Protium fits in our development cycle --

    Since Protium is ~5X faster than Palladium on our design, it's better 
    suited for our firmware development and SW testing.  We deploy Protium
    once we have reached maturity in our hardware and firmware development,
    as at this stage most issues are within firmware.
 
    If we find a problem/corner case on Protium, we bring the design back to
    Palladium to debug it because it offers more visibility.  Even though we
    can view 1000's of signals on Protium, Palladium still gives us a much
    more complete view.

    Palladium and Protium platforms are complementary, and together they 
    provide a good vehicle for our SOC validation and product firmware 
    development well before we tape out.

       ----    ----    ----    ----    ----    ----    ----

    Cadence Protium wins for small companies like ours.

    We've used Cadence's Protium S1 prototyping system for our pre-silicon
    software development for 11 months now.  

    Our goal is to be able to demonstrate our application and run the 
    software to show our silicon works. 
 
        - Once our RTL is good, we put it on the Protium machine; this 
          typically occurs about 3 months before tapeout.  

        - Using this approach, we gain an extra 6 months of software 
          development time before our chip comes back from the fab.

    We are getting a 6 MHz SW operating speed from Protium, depending on
    our design/application.  For one complex design, we started out with a
    2 MHz speed and then adjusted the compile switches to speed it up to
    6 MHz.

    We chose Protium due to its fast implementation flow.

        - Starting from scratch for a new ASIC design, it only took our 
          engineers 3 days to get it set up and running from the RTL.  
          (rather than requiring a full-time engineer)

        - We hadn't previously run the design anywhere -- not even on 
          Palladium.  

    Protium is only 10-15% of the price of Palladium, so it has a high 
    value for a budget-conscious company such as ours.  So, although we 
    have Palladium, if a design doesn't fit on Palladium, we use Protium
    instead.

    Protium has a number of good debug features also, e.g. we connect 
    Cadence's JTAG debug port for our software testing.  Even so, our
    biggest pain point in debugging on Protium (vs. Palladium) is that when
    we make an RTL change, recompiling the design takes 6 hours with 
    Protium, compared to only 25 minutes with Palladium.

    It's definitely still worth it for us to use Protium, as it is far less
    expensive than Palladium.  Additionally, we use Protium after our RTL is
    stable, so we don't need much debug, and we can often see what's wrong 
    from the outside without looking for the signal.
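
    The trade-off between a slow recompile and a fast clock can be put
    into numbers with a simple break-even model: total time = recompile
    time + cycles / clock speed.  The Python sketch below uses the 6 hour
    and 25 minute recompile times and the 6 MHz Protium speed quoted
    above.  The 1 MHz Palladium speed is an assumption (it is not given
    in this comment), and break_even_cycles() is a hypothetical helper.

```python
def break_even_cycles(compile_a_s, hz_a, compile_b_s, hz_b):
    """Cycle count where platform B (slower compile, faster clock) catches up to A.

    Model: total seconds = compile seconds + cycles / clock rate.
    """
    if hz_b <= hz_a:
        raise ValueError("platform B must run faster than platform A")
    return (compile_b_s - compile_a_s) / (1.0 / hz_a - 1.0 / hz_b)


PALLADIUM_RECOMPILE_S = 25 * 60    # 25 minutes, quoted above
PROTIUM_RECOMPILE_S = 6 * 3600     # 6 hours, quoted above
PROTIUM_HZ = 6e6                   # 6 MHz, quoted above
PALLADIUM_HZ = 1e6                 # ASSUMPTION: not stated in this comment

cycles = break_even_cycles(PALLADIUM_RECOMPILE_S, PALLADIUM_HZ,
                           PROTIUM_RECOMPILE_S, PROTIUM_HZ)
print(f"break-even at about {cycles:.3g} design clock cycles")
print(f"that is {cycles / PALLADIUM_HZ / 3600:.1f} hours of Palladium run time")
```

    With the assumed 1 MHz Palladium speed, the break-even sits near
    2.4e10 cycles, or about 6.7 hours of Palladium run time.  Longer runs
    favor Protium; short edit-and-retest debug loops favor Palladium.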

    We've now run Protium on one design and were able to debug 10's of 
    signals.  (If something is super difficult to debug, we can choose to
    run the debug on Palladium.  Cadence has a good integration between both
    boxes, and a similar interface, making it simple to do.)

    It took us 3 days to compile a completely new design.  The design
    includes standard cells for clock-gating and there were no issues
    compiling it.  Protium worked for us the first time afterward.

    Two of our users have run designs in parallel, and used Protium's memory
    upload feature, including the internal memory models and SRAMs, to good
    effect.

    Protium is especially valuable as it enabled us to be ready with our
    software when our engineering samples arrived.

    I've recommended it to colleagues at startups and smaller companies.  
    Other than debug, Protium works as well as Palladium does for us.

       ----    ----    ----    ----    ----    ----    ----

    Cadence Protium

    We use it for embedded firmware verification.

    Our company began using Cadence's original RPP in 2012, and then 
    purchased the original Protium (2nd generation RPP) in 2015.  We 
    purchased our first Protium S1 in 2018.

    Our demand exceeded our capacity, so we needed additional system 
    emulation capacity.  Plus, we have additional demand as we develop more
    products in parallel, with new use cases being proposed to meet product 
    development cycle time goals.
 
    - We generally use Protium for embedded firmware development and 
      verification.

    - We typically use Palladium for HW/FW full debug visibility. 
 
    - Our engineers usually debug issues found in Protium in Palladium.
      (For issues unique to Protium, we use Vivado for debug.)

    Our company now has 14 Protium S1s, with each Protium providing one
    domain.  A ~4M gate design takes up to 4 hours to compile and
    place-and-route on Protium.  Given our smaller design size, there is
    no advantage for incremental compile, so we recompile the entire
    design each time.

    Protium's advantage is the frontend compile and place & route that 
    guarantees the design will be functional once mapped to the FPGA -- so 
    you don't need to be an FPGA flow expert.  We sacrifice some speed of 
    execution for the simpler FPGA flow.
 
    Protium gives us a 4 Mhz to 10 Mhz speed (step clock).

    Speed is very important for our embedded FW verification.  Thus, our
    ideal solution would offer

      1) a simple flow,
      2) a robust host interface, and
      3) improved speed of execution.

    Unfortunately, nothing today provides it all.

    We have ~20 engineers who use the Protium platform.  While we only
    have 1 user per Protium at a given time, we use dynamic reloading,
    such that we can switch between projects on the fly.  It works well for
    our use case.

    For us, Protium's best advantage over HAPS and Dini is that the
    Cadence XE compile process guarantees the design will be functional
    once mapped to the FPGA.

       ----    ----    ----    ----    ----    ----    ----

    Cadence Protium 

    The main reason why we evaluated Cadence Protium S1 was we were looking
    for a cost-effective way to add incremental capacity to our existing
    Palladium installation.
 
    Using a prototyping platform was only doable for us because Cadence had
    an integration between Palladium and Protium -- so we could reuse the 
    Palladium setup and compile environment for Protium.  We could not have
    justified the extra evaluation turnaround time that Zebu or HAPS would
    have taken us (e.g. 6-8 weeks or longer).

    Below are Protium's approximate set up and compile times when we *reuse* 
    the existing Palladium design database:
      
        New chip for first time (after Palladium flow setup)   7-10 days
        Recompiling for RTL changes                             1-2 days

    Our primary Protium evaluation criterion was to confirm the integrated
    compile flow between Palladium and Protium worked.  Our results:
 
        - The unified compile flow made it quick to get a new Palladium
          database running in Protium.  

        - We were able to use Palladium's physical speed bridges for 
          Protium.

        - Protium performs better than Palladium -- we got 2.5X faster
          speed.  This makes it attractive to our software team for
          SW development, which needs relatively stable hardware.

        - Protium is MUCH cheaper to buy overall vs. a Palladium box.
 
    The combination of cost, easy set up, and speed up got us to move ahead
    with Protiums in addition to a Palladium Z1.

    This is how Protium currently fits in our verification methodology:

        - Our hardware teams use Palladium Z1 for design/architecture 
          verification as well as reproducing issues that we may see in 
          silicon debug.

        - Our software team uses both Palladium and Protium, jumping to 
          Protium when the design is more stable and does not require much 
          design debug.  Because running tests and debugging SW code with 
          Protium is so similar to Palladium, they can easily move between 
          the two platforms.  
 
        - While Protium has its own hardware debug flow, we go
          back to Palladium for hardware debug.

        - Our verification team does debug exclusively on Palladium, as 
          it is the best in the industry for quick debug turnaround.  

    We are currently running a design size equivalent of 150M gates on 
    Protium.  Our actual design is much bigger, but our design is modular 
    and repetitive, so we can use Protium to test one element.  

    Protium has definitely been a cost-effective way for us to add more 
    capacity to Palladium.  

       ----    ----    ----    ----    ----    ----    ----

    We were using Cadence Palladium plus Synopsys Zebu, but that cost us
    two entire compile/set-up teams to do 4 weeks of work each.

    With Palladium plus Protium, we can now do the same amount of work
    using only 1/2 the number of engineers we had before.

       ----    ----    ----    ----    ----    ----    ----

    1. Calibre
    2. Protium + Palladium
    3. BDA AFS

    We only use best in class.

       ----    ----    ----    ----    ----    ----    ----

    Palladium / Protium combo wins for us.

       ----    ----    ----    ----    ----    ----    ----

    With Protium, we get the speed of an FPGA-based system, with the
    fast ramp-up, debug, and signal traces of Palladium.  It only
    takes seconds for us to see all the waveforms.

       ----    ----    ----    ----    ----    ----    ----

    1. JasperGold
    2. Perspec
    3. Protium

       ----    ----    ----    ----    ----    ----    ----

    Was Zebu for SW, Palladium for HW.
    Now Protium for SW, Palladium for HW.

       ----    ----    ----    ----    ----    ----    ----

    My vote is for Protium.  Its hooks into Palladium work well.

       ----    ----    ----    ----    ----    ----    ----

    Protium.  It does what EVE Zebu does better.

       ----    ----    ----    ----    ----    ----    ----

    We're a Zebu house, but what I really want is Protium.

       ----    ----    ----    ----    ----    ----    ----

    Protium and VCS

       ----    ----    ----    ----    ----    ----    ----

    We like Zebu4.  Protium is still not mature.

       ----    ----    ----    ----    ----    ----    ----

    I think HAPS is better than Protium.

       ----    ----    ----    ----    ----    ----    ----

    We get max Mhz with HAPS that beats out Protium.

       ----    ----    ----    ----    ----    ----    ----

    We like hands-on.  Protium is automated, but we can get an extra
    3 or 4 Mhz using HAPS if you tune it enough.

       ----    ----    ----    ----    ----    ----    ----

    If I have access to our Palladium, which is rare, I want Protium.

    Otherwise, I want a Zebu Server 4.

       ----    ----    ----    ----    ----    ----    ----

Related Articles

    CDNS Protium crazy fast "Palladium-compiles" #1a for Best of 2019
    CDNS Palladium wins back user mindshare is #1b as the Best of 2019
    MENT Veloce Strato, Virtual Lab, Hycon makes #1c for Best of 2019
    SNPS Zebu Intel shipments slipping 2 quarters is #1d Best of 2019

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley.  All Rights Reserved.