( DAC'20 Item 05a ) ----------------------------------------------- [10/21/21]

Subject: CDNS Palladium Z1 speed, uptime, & cloud access is Best of 2020 #5a

THE EMULATION JUMBO JET: There are four basic reasons why engineers like
using Palladiums -- and why the Z1 beat out MENT & SNPS HW boxes in
2020.  A good analogy explaining this is flying overseas on a jumbo jet.
  - 1. Fast compile time.  Building the design database.

       "6:50 PM, Tues. Sept 14, Virgin Atlantic is flying 252 people
        from New York (JFK) to London Heathrow Airport (LHR)"

       "The Z1 takes 5-6 hours to compile with CDNS' parallel compile."

  - 2. Allocation.  Placing the database inside the available resources.

       "Flight USX38 has a complete seat assignment of all 252 passengers.
        All families and related groups are successfully seated together."

       "Palladium's biggest strength is fine granularity in sizing the
        design to various footprints at compile time -- plus flexible
        dynamic placement at runtime." 

  - 3. Runtime.  How fast your design database runs in different modes.
       (ICE, simulation acceleration, CAKE, ...)

       "How long does this flight from New York City to London take?
        Any stops in Iceland/Ireland/France along the way?"

       "We've actively run Palladium Z1 on our full SSD SoC designs
        for 3 years now.  Got speeds close to 2 MHz."

  - 4. Debug.  How easy is quick detailed visibility into your bugs?

       "There is a severe storm in the North Atlantic.  The pilot needs
        immediate access to dynamically changing (and potentially
        plane-crashing) weather and GPS data to help him reroute."

       "The speed at which it can gather waves is amazing...  Z1 only
        takes 60 seconds or so to get the waveforms for a 6-board
        design.  Super fast!  The rivals can't get near that!"

        ----    ----    ----    ----    ----    ----    ----

ALSO, UPTIME & PREDICTABILITY BEAT OUT FPGA: One 2020 user also commented on
Palladium Z1's ability to get back up & online fast -- compared against his
lesser earlier experiences with Mentor Veloce.
     
    "Palladium's uptime was good, even during eval.  And when we did have
     problems (even after eval), Cadence fixed them fast.  The CDNS FAEs
     would just swap a system board and it was back up running right away.

     My experience was that Veloce had a lot of downtime due to failures.
     Veloce was also prone to longer downtimes, and Mentor's process of
     debugging Veloce problems was complex."

And another 2020 user liked that CDNS HW would predictably run once it
compiled; compared with some of the FPGA-based boxes (Veloce/ZeBu/HAPS).

    "The most appealing Palladium advantage over its [MENT/SNPS] rivals
     is that "if it compiles, it will run." -- can't say that with
     those other FPGA-based guys! 

     Palladium mostly eliminates HW platform-induced gotchas from our
     debug round trips -- which is a high value to us when our design
     churn is high."

        ----    ----    ----    ----    ----    ----    ----

FCLK, CAKE, & CLOUD: Three new techie topics that users mentioned this time
around in the survey (oddly all Palladium related this year) were:
    Cake Modes -- 25% core clock speed up relative to FCLK

    CAKE-2's design speed is half of the Palladium clock speed.  It samples
    at the edge of the fastest clock.  CAKE-1 is faster; it is the same
    as the main clock speed.  It samples both the pos and neg edges.

        ----    ----    ----    ----    ----    ----    ----

    You might want to check out the Palladium Cloud that CDNS is selling.

    We're a small Tier 3, so overall cost is our #1 concern for emulation.

        ----    ----    ----    ----    ----    ----    ----

AIR VS. WATER: But one advantage three MENT Veloce users cited vs. Cadence
Palladium Z1's was that Veloces are air cooled while Palladiums are water
cooled.

    Mgmt likes that Veloces are air cooled.

I know that saves on install costs -- because no plumbing is required
for the Veloce's -- but I don't know if Palladiums use far fewer kWh per
year because they're water cooled.  (See ESNUG 567 #3, 574 #3, 532 #13)

        ----    ----    ----    ----    ----    ----    ----
        ----    ----    ----    ----    ----    ----    ----
        ----    ----    ----    ----    ----    ----    ----

      QUESTION ASKED:

        Q: "What were the 3 or 4 most INTERESTING specific EDA tools
            you've seen in 2020?  WHY did they interest you?"

        ----    ----    ----    ----    ----    ----    ----

    Since most of my life revolves around Z1's these days, I should probably
    nominate it as my most interesting EDA tool of 2020.

        ----    ----    ----    ----    ----    ----    ----

    The Palladium Z1 is our workhorse around here.

        ----    ----    ----    ----    ----    ----    ----

    Cadence Z1

        ----    ----    ----    ----    ----    ----    ----

    Cadence Palladium Z1

    Our group has used Cadence Palladium for over 15 years.

    For a non-FPGA box, it has blazing runtime performance; plus we like
    its fast turnaround during our compile-and-debug iterations. 

    We use Palladium for hardware/software co-verification, and virtual
    emulation -- but ICE is our most common use model.

    For us Z1's strengths are compile time and time-to-visibility

        - They are by far better than the competition (that we can't
          name, John.)  We have more than a dozen Palladium emulator
          racks and have designs that scale up to 12 racks (Z1).  The
          Z1 takes 5-6 hours to compile with Cadence's parallel compile.

    We use the Z1 for block-level verification as well as for verifying
    our entire design.  Depending on our design size, we get speeds from
    250 kHz to 1 MHz.  Our experience is that Palladium's post-compile
    runtime is reliable and matches the semantics of simulation.
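
To put the 250 kHz to 1 MHz range in perspective, here is a rough sketch of
how emulation speed turns into wall-clock time.  The 1-billion-cycle workload
is a made-up figure for illustration; it does not come from the letter.

```python
def wall_clock_seconds(design_cycles: int, emulation_hz: float) -> float:
    """Seconds of wall-clock time needed to run `design_cycles` at `emulation_hz`."""
    if emulation_hz <= 0:
        raise ValueError("emulation_hz must be positive")
    return design_cycles / emulation_hz


# Hypothetical workload: 1 billion design clock cycles (assumption, not user data).
CYCLES = 1_000_000_000

slow = wall_clock_seconds(CYCLES, 250e3)  # low end quoted above: 250 kHz
fast = wall_clock_seconds(CYCLES, 1e6)    # high end quoted above: 1 MHz

print(f"250 kHz: {slow / 60:.1f} minutes")
print(f"1 MHz:   {fast / 60:.1f} minutes")
```

The same test is 4x quicker at the top of the range than at the bottom, which
is why design size matters so much for turnaround.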

    We use Cadence's physical and virtual bridges to connect to our design.

    Both have their use cases and advantages.  A physical bridge provides
    easier bring up for standard interfaces.  A virtual bridge gives an
    edge on custom or newer interfaces -- it also provides more leverage
    and flexibility in setting up our lab infrastructure. 
 
    We love Z1's debug-time-to-visibility. 

        - The speed at which it can gather waves is amazing -- it's a 
          distinct advantage over the competition (that I can't name.) 

        - Z1 only takes 60 seconds or so to get the waveforms for a
          6-board design.  Super fast!  The rivals can't get near that!

    Our lab footprint is tremendous, so it's important for us to use 
    Palladium efficiently.  We have multiple engineers using the Z1 all
    the time without any issues.

    I'd strongly recommend Palladium Z1 over its rivals.  It's not perfect,
    but it's much better than what's on the market right now.  The most
    important features for any emulation tool are debug visibility and
    fast compile times.  Palladium excels in both enormously.  To add to
    that, it's extremely reliable. 

        ----    ----    ----    ----    ----    ----    ----

    Our Palladium vs. Veloce vs. Zebu eval

    We've actively run Palladium Z1 on our full SSD SoC designs for
    3 years now.  Got speeds close to 2 MHz.  

    Our primary use is for SoC debug and firmware development.  We chose
    Palladium Z1 instead of Veloce based on:
 
        - Our Palladium eval 

        - My prior knowledge of Veloce 

        - As far as Zebu was concerned, we initially gave some thought
          to looking into it, but for us, Zebu came across more as
          testbench acceleration than a fast compiling debug box.
          So we eliminated Zebu altogether from our eval.

        - Palladium is better for our In Circuit Emulation (ICE) needs.

    Palladium vs. Veloce Uptime/Downtime --

        - Palladium's uptime was good, even during eval.  And when we
          did have problems (even after eval), Cadence fixed them fast.
          The CDNS FAEs would just swap a system board and it was back
          up running right away.

        - My experience was that Veloce had a lot of downtime due to
          failures.  Veloce was also prone to longer downtimes, and
          Mentor's process of debugging Veloce problems was complex.

    Initial set up for Palladium --

    Our first design on Palladium went from scratch to fully deployed in
    only 3 months, including the hardware shipping, set up, mapping,
    compiling, testing, and delivering to our ASIC and firmware teams.

    A completely new design now takes us only 3 weeks to start using 
    Palladium Z1.  And it's only overnight if we have any RTL change.

    Speeding up Palladium --
 
    Tricks to speed up Palladium's out-of-the-box runtime.  We want
    our core functional clock as close as possible to Palladium FCLK
    (the fastest operational clock)

    For easy math, let's assume we start with a core clock at 1 MHz 
    out-of-the-box speed.  The numbers below each add directly to
    the 1 MHz. (i.e., they don't compound.)

    1. Multi-compile -- 20% core clock speed up relative to FCLK

       Palladium has a multi-compile option, where we can have it 
       automatically run multiple compile iterations, and then pick the
       the optimal one.  In our case:

         - One iteration compile took 3 hours.

         - Then a 30-iteration compile took 12 hours.

         - We also tried a 40-iteration compile, but the improvement
           over 30 was negligible.  And 12 hours is a sweet spot for
           us, as it is an overnight run.

       Multi-compiles got us a 20% bump in the core clock speed.  

       Note: This technique is most effective after your RTL is stable.
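
The multi-compile idea (run several compile iterations, keep the best one)
can be sketched generically as a best-of-N search.  Everything below is a toy
stand-in: the trial function just draws random numbers, and the 0.08 spread is
arbitrary.  It is not a Palladium command or API.

```python
import random


def best_of_n(run_trial, n_trials: int) -> float:
    """Run `n_trials` independent trials and return the highest score seen."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    return max(run_trial() for _ in range(n_trials))


def make_toy_trial(seed: int):
    """Build a stand-in trial returning a random clock speed (MHz) near 1.0.

    Hypothetical model of run-to-run variation; not a real compiler call.
    """
    rng = random.Random(seed)
    return lambda: rng.gauss(1.0, 0.08)


# Same seed each time, so a longer run always contains the shorter run's trials.
one = best_of_n(make_toy_trial(seed=1), 1)
thirty = best_of_n(make_toy_trial(seed=1), 30)
forty = best_of_n(make_toy_trial(seed=1), 40)

print(f"best of 1:  {one:.3f} MHz")
print(f"best of 30: {thirty:.3f} MHz")
print(f"best of 40: {forty:.3f} MHz")
```

Best-of-N can only stay level or improve as N grows, but each extra trial
costs compile time, which fits the letter's report that going from 30 to 40
iterations bought very little.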

    2. Cake Modes -- 25% core clock speed up relative to FCLK

       CAKE-2's design speed is half of the emulator clock speed.
       It samples at the edge of the fastest clock.

       CAKE-1 is faster; it is the same as the main clock speed.
       It samples both the positive and negative edges.

    We always emulate with CAKE-2 first and check the functionality. This
    is because all clocks are positive edge triggered and run the way we 
    expect in silicon.  We then recompile in CAKE-1 for higher performance.

    Doing Cake Modes gave us 25% more performance.
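
The clock relationship described for the two CAKE modes can be written down
as a small lookup.  This only encodes the ratios stated in the letter; the
2 MHz emulator clock is a hypothetical value.  The letter reports a 25% gain
from the switch, so the ratio alone does not predict the measured speed-up.

```python
# Design-clock to emulator-clock ratios, as described in the letter above.
CAKE_RATIO = {
    "CAKE-2": 0.5,  # design speed is half of the emulator clock speed
    "CAKE-1": 1.0,  # design speed equals the main clock speed
}


def design_clock_hz(emulator_clock_hz: float, mode: str) -> float:
    """Design clock implied by the stated ratio for the given CAKE mode."""
    if mode not in CAKE_RATIO:
        raise ValueError(f"unknown CAKE mode: {mode!r}")
    return emulator_clock_hz * CAKE_RATIO[mode]


EMULATOR_HZ = 2e6  # hypothetical 2 MHz emulator clock, for illustration only

for mode in ("CAKE-2", "CAKE-1"):
    print(f"{mode}: {design_clock_hz(EMULATOR_HZ, mode) / 1e6:.2f} MHz")
```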

    3. Shadow Net Optimization -- 10% core clock speed up relative to FCLK

       When switching from CAKE-2 (positive edge processing) to CAKE-1 
       (positive and negative edge processing), Palladium needs to add
       a value for the signal it's interpreting. 

       This adds nets -- which increased our utilization -- our initial 
       design was taking 2x the capacity from shadow nets and they also 
       negatively impacted performance. 

       So, we must do shadow net optimization next.  80% of the 
       shadow nets from CAKE-1 were not real, but rather the compiler's
       interpretation of the code.  The compiler always conservatively 
       assumes that both (positive and negative) edges of a signal are
       used, and we just tell the tool that it does not need to compute 
       on some negative edges.

       We run Palladium's shadow net optimization script and Palladium
       automatically tweaks everything -- removing the unwanted shadows 
       and using a different placement.  

       The result was another 10% speed-up, making the total for
       CAKE-1 plus Shadow Net Optimization 35%.
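
A toy model of the shadow net pruning described above: by default every net
gets a shadow, and only nets the user marks as not needing a negative edge
are dropped.  The net names and the 8-of-10 split are invented to mirror the
80% figure; the actual Palladium script and its inputs are not shown in the
letter.

```python
def shadows_to_keep(net_names, negedge_not_needed=frozenset()):
    """Return the nets that still need a shadow (negative-edge) copy.

    Conservative default: with no hints, every net keeps its shadow.
    """
    return [name for name in net_names if name not in negedge_not_needed]


nets = [f"net_{i}" for i in range(10)]   # hypothetical net names
conservative = shadows_to_keep(nets)     # no hints: shadow every net
hints = {f"net_{i}" for i in range(8)}   # user marks 8 of 10 as unneeded
optimized = shadows_to_keep(nets, hints)

removed_pct = 100 * (len(conservative) - len(optimized)) / len(conservative)
print(f"shadow nets: {len(conservative)} -> {len(optimized)} "
      f"({removed_pct:.0f}% removed)")
```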

    All three of these methods combined got us a 55% improvement.
    That means our earlier 1 MHz out-of-the-box speed jumped to 1.55 MHz.
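
The arithmetic here is additive, per the letter's own note that the gains
"don't compound."  A short sketch, with the compounded figure shown only for
contrast:

```python
BASELINE_MHZ = 1.0  # out-of-the-box core clock assumed in the letter

# Percentage gains reported above, each relative to the same 1 MHz baseline.
GAINS_PCT = {
    "multi-compile": 20,
    "CAKE modes": 25,
    "shadow net optimization": 10,
}

# Additive: sum the percentages, then apply once to the baseline.
total_pct = sum(GAINS_PCT.values())
additive_mhz = BASELINE_MHZ * (1 + total_pct / 100)

# Compounded: apply each gain on top of the previous one (NOT what the letter does).
compounded_mhz = BASELINE_MHZ
for pct in GAINS_PCT.values():
    compounded_mhz *= 1 + pct / 100

print(f"additive:   +{total_pct}% -> {additive_mhz:.2f} MHz")
print(f"compounded: {compounded_mhz:.2f} MHz")
```

Adding the three gains gives the 1.55 MHz quoted above; compounding them
would have given 1.65 MHz.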

    We also used other methods, such as clock divider bypass and removing 
    DFT logic, for further improvement, and ultimately got approximately 75%
    core clock improvement over FCLK.  In other words, our core clock is now
    75% closer to FCLK when we finished all the improvements.  So, when FCLK
    is 2 MHz our core clock now runs at 1.55 MHz.

    It took us 3 months of total effort to get there.  We were able to do it
    in parallel with our other emulation activity.

    As for our end-to-end testing, i.e., from the host PC to the hard drive,
    we improved our I/O operations per second (IOPS) by 9x.

    Palladium Compile --

    Our designs fit into one Palladium Z1 cluster -- each Z1 has 2 to 3 
    clusters.  Cadence claims 2.3B gates in a single Z1 system.

    We compile our mature designs only when we have a significant change to
    our design and want a high performing database. 

    Our compile process has 3 steps: synthesis, import, and compile.
    It takes under 3 hours.  Palladium runs well after we compile -- as long
    as we have sufficient capacity to download what we have compiled. 

    In contrast, with FPGA-based systems, just because it compiles, it does
    NOT mean it will meet timing.  (However, my experience with Veloce is
    it always generates functional results if it successfully compiles and
    capacity is available.)

    Conclusion --

    Our team has used Palladium for architecture analysis all the way to 
    post-silicon validation.  We've also experimented with it for power 
    verification.

    Up to 8 of our engineers will use it at the same time -- this works 
    well for us, as we've built a cloud-style emulation layer over it.

    We really like Palladium's ease-of-use and debug.  It takes only 5 to 30
    minutes to get waveforms.  Also, it's highly reliable; we go many months
    at a stretch with no problems.  Some of our validation regressions
    run for several months without any interruptions.

    Cadence's support is also great.  When I say support, I'm referring to 
    both the people and the process.  I'm impressed with the short time it 
    takes for Cadence to run diagnostics to find a bad board, replace it, and
    check that everything is working.

        ----    ----    ----    ----    ----    ----    ----

    We've used the Z1 emulator for over a year.  It's our vendor-hosted VCAD
    (Virtual Integrated Computer-Assisted Design) environment.

    We get speeds of 1 to 2 MHz in 1xua/CAKE1 with Palladium Z1, depending
    on the model and utilization.  We have up to 6 engineers running 
    concurrent/simultaneous emulation jobs. 

    Palladium's biggest advantages over its rivals are:

        1. Fine granularity in sizing the design to various footprints at 
           compile time -- plus flexible dynamic placement at runtime. 

        2. Fast compile times and a highly tunable compile flow that 
           allows for efficient use of compute. 

        3. Strong performance for simulation acceleration and TBA 
           (Transaction-Based Acceleration) co-modeling features. 

        4. The most appealing Palladium advantage over its rivals is
           that "if it compiles, it will run." -- can't say that with
           the other FPGA-based guys!  

           Palladium mostly eliminates HW platform-induced gotchas from
           our debug round trips -- which is a high value to us when our
           design churn is high.

    We definitely don't miss the FPGA partitioning, place and route, and 
    timing issues that FPGA-based emulation has. 

    "Palladium Cloud"

    With its hosted VCAD offering (Virtual Integrated Computer-Assisted 
    Design, aka the Palladium Cloud), Cadence has a very attractive
    solution for smaller companies like us to get going fast with
    emulation -- without having to incur the upfront/NRE costs of
    building out an emulation lab/datacenter. 

    We mostly use Palladium Z1 for system-level software workloads running
    on virtual hybrid system models.  Our use cases are: HW verification,
    SW verification, HW/SW co-verification, architecture analysis,
    verification acceleration, and virtual emulation.

    Our capacity needs vary -- we target multiple design elaboration 
    configurations ranging from a few domains to a few/several boards.

        - We see compilation speeds at approximately 1 board per 
          hour, for 10 sequential compile trials.  This speed number is 
          for clean RTL builds -- we do not do incremental compiles.

        - We use parallel synthesis, but currently do not parallelize
          the compile trials for various reasons.

    Palladium Debug

    Overall, Palladium's debug functionality is good/excellent on the 
    front-end features and the language side. It is ok/good on usability,
    and ok/acceptable on performance.  FullVision is great in the absence of
    capacity concerns -- it lets you do what the name suggests, have full
    RTL visibility compiled into the database without the need of 
    compile-time probing. 

    Needs Improvement: FullVision's source-level debug annotation is weak
    compared to flagship debug environments. The waveform rendering requires 
    significant compute for parallel processing on larger designs; SimVision
    is very basic -- it'd be nice if Cadence included Indago in the base 
    package, too.

    Cadence's VCAD with Palladium Z1 and Cloud is our overall best option. 

        ----    ----    ----    ----    ----    ----    ----

    You might want to check out the Palladium Cloud that CDNS is selling.

    We're a small Tier 3, so overall cost is our #1 concern for emulation.

        ----    ----    ----    ----    ----    ----    ----

    The Z1 is our mainstay.

        ----    ----    ----    ----    ----    ----    ----

    Palladium

        ----    ----    ----    ----    ----    ----    ----

    That new Palladium Cloud looks interesting.

        ----    ----    ----    ----    ----    ----    ----

    Mgmt wants us to price out the CDNS emulation cloud stuff.

        ----    ----    ----    ----    ----    ----    ----

    Boss man wants a Veloce Cloud vs. Palladium Cloud eval.

    Do you have one, John?

        ----    ----    ----    ----    ----    ----    ----

    Anirudh's Z1

        ----    ----    ----    ----    ----    ----    ----

    Palladium

        ----    ----    ----    ----    ----    ----    ----
        ----    ----    ----    ----    ----    ----    ----
        ----    ----    ----    ----    ----    ----    ----
USERS ON THE VELOCE HW BOXES

    Veloce Strato

        ----    ----    ----    ----    ----    ----    ----

    We have a giant room heated by Veloce boxes at a remote site.

        ----    ----    ----    ----    ----    ----    ----

    Air cooled Veloce's install easier than water cooled Palladiums.

        ----    ----    ----    ----    ----    ----    ----

    Mgmt likes that Veloces are air cooled.

        ----    ----    ----    ----    ----    ----    ----

    My MENT FAE asked me to write to you about the upside of
    air cooled Veloces compared to water cooled Palladiums.

    Air cooled is better.

    Done.

        ----    ----    ----    ----    ----    ----    ----

    Strato and Strato+

        ----    ----    ----    ----    ----    ----    ----


    Mentor Veloce does it for us.

        ----    ----    ----    ----    ----    ----    ----

    I'd mention Veloce for that since I use it daily.

        ----    ----    ----    ----    ----    ----    ----

    VCS for first run debug
    Veloce for HW acceleration/emulation
    Fusion Compiler for PnR
    Calibre for DRC

        ----    ----    ----    ----    ----    ----    ----

    Veloce

        ----    ----    ----    ----    ----    ----    ----

Related Articles

    CDNS Palladium Z1 speed, uptime, & cloud access is Best of 2020 #5a
    CDNS Protium "dynamic duo" hooks into Palladium is Best of 2020 #5b
    ... and Big 3 vendors launched new HW in 2021 and users want scoops!
    Sneak peeks at new Palladium X2 and new Protium X2 is Best 2020 #5c

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley.  All Rights Reserved.

   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)