( DAC'19 Item 7a ) ------------------------------------------------ [04/30/20]

Subject: Cadence Innovus PnR snags both Intel and Samsung is Best of 2019 #7a

CADENCE PnR JUMPS AHEAD: In last year's 2018 report, I discussed Cadence's
lead over Synopsys in PnR.  (See DAC'18 #4b)  In that report, I took a
table ranking the Top 20 semi guys and just XXXX-ed out who did NOT use
Innovus.  (At the time, only 5 out of the Top 20 did not use Innovus.)

This year (2019), that same XXXX-ed out table of Innovus non-users got
shorter.  Compared to 2018, Anirudh picked up the two biggest semi players
on the planet.  Samsung was talked about in earlier CDNS 2019 earnings
calls -- they're no secret -- and my Wall St. analyst buddies told me
that when Lip-Bu Tan gushed in his 4Q2019 earnings call about his
"breakthrough wide ranging win with a marquee U.S. semiconductor company",
where CDNS had to hire a boatload of new FAEs just to service this
"market-shaping customer", Lip-Bu was clearly talking about CDNS picking
up Intel as a new Innovus customer.
      
This news was an "oh, shit!" moment for Aart.  And I guess Aart making
Sassine Ghazi, his former SNPS VP of North American Sales (and Intel
whisperer), the head of all SNPS PnR R&D in the end did NOT *actually*
woo Intel enough to keep Intel 100% in the ICC/ICC2/Fusion camp.
   
The funny thing is 12 months ago I was publicly *mocking* Anirudh on this.

    "Last I looked, Intel, which makes up 11% of all SNPS revenues is
     still very much an ICC2 house (partially due to Sassine as VP of
     Sales leadership) along with Samsung being a mostly ICC2 house.
     (Sorry, Anirudh.)"

         - John Cooley, DAC'18 #4b, (DeepChip 01/23/2019)

Now that's all flipped.  Anirudh has clearly picked up *both* Samsung *and*
Intel as Innovus customers.  And now John Cooley has to eat crow ...  :(

        ----    ----    ----    ----    ----    ----    ----

WHY LIP-BU SMILES: Ballpark numbers: the total available PnR market is $650
million.  Intel is 8% to 10% of that -- which, to be conservative, is $52
million on the lower end.  Samsung is ~60% of Intel, so they're ~$32 million.
Together that's ~$85 million per year.
      
Since Lip-Bu just snagged both of these Big Boys, he's maybe getting 1/3rd
of that in the first year of 3-year deals -- call it ~$25 million-per-year
now -- with possible later growth to ~$85 million-per-year.  And ...

    "Why is the PnR market so important, Cooley?  For every $1.00
     spent on digital PnR, the customer buys an *extra* $1.25 to
     $1.75 in sales of related PnR SW from their PnR provider."

         - Gary Smith, GSEDA, around 2012

... making a total possible boost of $85 million plus 1.75 x $85 million --
an *additional* ~$235 million-per-year 3 years from now for Lip-Bu!
(Keep in mind these are extra gravy dollars on top of his already beefy
existing Innovus sales to his not-Intel and not-Samsung customer base.)
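
For anyone who wants to redo the napkin math, here it is as a few lines of
Tcl.  (Every number below is just one of the ballpark estimates above --
nothing measured, and nothing from CDNS.)

    # Back-of-envelope check of the ballpark figures above.
    set intel    52.0   ;# ~8% of the ~$650M PnR TAM, low end, in $M/yr
    set samsung  32.0   ;# ~60% of Intel's PnR spend, in $M/yr
    set both     [expr {$intel + $samsung}]   ;# 84 -- call it ~$85M/yr

    # Gary Smith's rule of thumb: each $1.00 of PnR pulls an extra $1.25
    # to $1.75 of related PnR software.  Take the high end of that range.
    set pull     [expr {1.75 * $both}]     ;# ~$147M/yr of pull-through
    set upside   [expr {$both + $pull}]    ;# ~$231M/yr, i.e. the ~$235M above

    puts [format "combined spend ~\$%.0fM/yr, possible upside ~\$%.0fM/yr" \
             $both $upside]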

This is why Lip-Bu smiles.

        ----    ----    ----    ----    ----    ----    ----

COOLEY CAVEAT: Snagging Intel & Samsung will NOT be a church picnic for
Anirudh by any means.  From group to group inside those two, there will be
constant hand-to-hand combat benchmarks against ICC2/Fusion every quarter.
Having a seat at the Intel/Samsung tables only gives Anirudh a first shot
in those benchmarks.  If CDNS PnR R&D drops the ball at any time, these two
Big Users will NOT hesitate, on a project-by-project basis, to give their
PnR $$$ right back to Aart instead.

        ----    ----    ----    ----    ----    ----    ----

BENCHMARKS VS. ICC2 & FUSION: On the tech side, the user benchmarks of
Innovus against Aart's PnR offerings are what caught my eye first.

   "We currently get better results from Cadence Innovus versus using
    Synopsys Fusion Compiler.  CDNS's results are typically 8-10% better
    than SNPS in terms of timing, and power, with about the same area."
    
   "Innovus is significantly faster than ICC2.  It was ~70% faster for
    medium/large blocks (~2M instances) and ~50% faster for larger
    blocks (~3M to 4M instances)."

   "Cadence also gave us better PPA than ICC2.  One metric we saw was
    a 20-30% improvement in total negative slack of Innovus over ICC2."

GREAT EXPECTATIONS FOR ML/PPA: Multiple customers were enthused about
Cadence touting machine learning as its next move ahead in PPA.  Mind you,
it's too early for kudos, as only one user had actually tested it.

   "I ran a test case on it [Innovus ML] and we saw some promising data
    points on its potential impact.  I can tell there has been a lot of
    investment, but this ML stuff is still early."
   "I'm hoping that Cadence's ML/AI enhancement will accelerate our project
    cycle.  It's like giving your dirty laundry many rinses; then one more
    clean and final rinse.  Having AI knowledge from the previous PnR runs
    lets you know what's worth saving and spending the time on."

   "We have seen data from Cadence R&D suggesting that the ML training
    of algorithms can lead to a 3% to 5% overall PPA advantage -- very
    significant and essentially "free" as far as user effort goes."

MIXED STD CELL & MACRO PLACEMENT New this year were these two aspects of
Innovus not mentioned by earlier users.

   "We deployed Cadence's newer features for mixed standard cell and
    macro placement.  We use it to seed placement and are happy with it.
    (Note it is not 100% complete -- we must still tweak it.)"

   "I've tested Innovus' new mixed standard cell and macro placement.
    The automation speeds up the TAT -- the time saved from not having to
    manually place the macros is a big benefit.  There is room to improve.
    We need to tune the macro placement in several iterations to get
    the floorplan."

        ----    ----    ----    ----    ----    ----    ----
        ----    ----    ----    ----    ----    ----    ----
        ----    ----    ----    ----    ----    ----    ----

      QUESTION ASKED:

        Q: "What were the 3 or 4 most INTERESTING specific EDA tools
            you've seen this year?  WHY did they interest you?"

        ----    ----    ----    ----    ----    ----    ----

    We continue to use Cadence Innovus, & recently ran it on a 7 nm design.
    It's improved even more since last year.

        - We are running even larger designs through it, with more
          complex rules and more clocking.

        - We had exceptional experience with both its PPA and its
          runtime, which is scaling well with our larger designs.

        - It is productive out of the box and converges quickly.

    Overall, we've seen a continual increase in Innovus' competitiveness
    against Synopsys ICC/ICC2 -- especially at and below 7nm.

    I've heard some of our customers with dual PnR tools say this also.

    Innovus is significantly faster than ICC2.

        - ~70% faster for medium/large blocks (~2M instances)
        - ~50% faster for larger blocks (~3M to 4M instances)

    We also did a couple of smaller, mixed-signal design projects with the 
    Innovus/Virtuoso integration and it worked extremely well.

    I've heard about Cadence using machine learning to do even better PPA. 
    Our group hasn't used it yet but we're looking forward to seeing more
    on this AI work.

        ----    ----    ----    ----    ----    ----    ----

    Cadence Innovus

    We used both Innovus and ICC2 PnR to physically implement our design
    at 5 nm.  Innovus got us better PPA and turnaround time vs. SNPS.

    CADENCE INNOVUS VS SYNOPSYS ICC2 

    Turnaround Time:

    We use Cadence Innovus' multi-threading to run it on 16 cores.  Its
    turnaround time is 2X better than ICC2 running on the same hardware.

    Capacity & Performance:

    Cadence can do larger blocks while Synopsys struggles with them.
    For example, for a full flow run on 3 million placeable instances:

        - Cadence Innovus took 3.5 days
        - Synopsys ICC2 took 6-7 days -- nearly 2X longer

    Scalability: Cadence doesn't scale all that well

    As for scalability, we measured Innovus' threading efficiency across the
    different design stages and found it varies quite a bit.  For example,
    the speed improvement for clock tree synthesis might be in the 2.5x-3x
    range overall -- running Innovus on 8 cores gave us a 6x speed-up for
    some stages, while for other stages we only got 1.2x.  We'd like Cadence
    to continue to invest in improving the steps that don't scale as well,
    as there is lots more upside beyond the ~2x we already get from
    multi-threading.
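
    [ Editor's aside: a rough Tcl sketch of why per-stage speedups of 6x and
      1.2x blend into only ~2-3x for the full flow.  The stage names, hours,
      and speedups below are made-up placeholders, not this user's measured
      Innovus data. ]

        # Overall flow speedup is the runtime-weighted combination of the
        # per-stage speedups, so the worst-scaling stage dominates.
        set stages {
            placement    10  6.0
            cts           8  2.7
            routing      12  6.0
            timing_opt   10  1.2
        }
        set serial 0.0 ; set parallel 0.0
        foreach {name hours speedup} $stages {
            set serial   [expr {$serial + $hours}]
            set parallel [expr {$parallel + double($hours) / $speedup}]
        }
        # Prints ~2.7x even though two stages hit 6x on their own.
        puts [format "overall flow speedup: %.2fx" [expr {$serial / $parallel}]]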

    PPA: Cadence requires less ECO time to meet specs

    Cadence also gave us better PPA than ICC2.  One metric here is we saw
    a 20-30% improvement in total negative slack of Innovus over ICC2.

    We always push our design requirements beyond what our P&R tools can
    deliver on their own, so a big part of our turnaround time boils down to
    the amount of manual effort left.

    So, Cadence's better result can mean the difference between us closing
    within our deadline or not -- we spend less time in ECO mode.

    MACHINE LEARNING PPA ENHANCEMENT

    Our physical design team takes "dirty RTL" and starts to ingest it.  
    (If we waited until the RTL was completely done, we would not meet our
    schedule.)  

    So, we have a feasibility flow, and can have 1000s of bugs to be flushed
    out in our first implementation.  It's also challenging as early on
    we're missing pins and pieces of logic that we try to work around.

    We want to try Cadence's ML PPA enhancements on our next project.  (We
    didn't hear about it in time for our recent design project.)  From what
    we've seen, their ML can:

        - learn across the 100s of runs we make on the same block -- from
          initial RTL drop, to constraints, to final GDSII.

    Today, each of our runs is unaware of prior runs.  This needs to change.

    For example, if multiple PnR runs on a particular block keep reporting
    11,000 DRC violations, we like knowing that early so we can redesign
    that block.  Their ML will do that for us.

    Our ML design flow is about early warnings. 

       - It lets us give early feedback to our designers on how PnR
         feasible their blocks are in the shortest amount of time.  

       - The faster we can do these physical iterations, the sooner
         our designers can lock their RTL and go into ECO mode -- as
         our changes are limited after that.

    I'm hoping that Cadence's ML/AI enhancement will accelerate our project
    cycle.  It's like giving your dirty laundry many rinses; then one more
    clean and final rinse.  Having AI knowledge from the previous PnR runs
    lets you know what's worth saving and spending the time on.

    Innovus/Tempus integration

    Having Innovus tight with Tempus for a final timing clean-up is crucial.

    Our standard Innovus flow uses GBA (graph-based analysis), as GBA is
    less expensive timewise than PBA (path-based analysis).  Then Innovus
    does a final timing clean-up run with Tempus using PBA, which is more
    accurate / less pessimistic than GBA.

    We used the Innovus + Tempus "final timing clean-up" for:

    - Hold fixing  (Cadence shines here; best in industry, hands down.)
    - Downsizing cells
    - VT swapping

    We were able to reduce our final power consumption, and the Tempus
    timing results correlated sufficiently with PrimeTime for sign-off.
    That is, our Innovus/Tempus final clean-up results held.

    GigaPlace & GigaOpt

    Global placement sets the stage for following optimizations, while the
    rest is incremental.  So, this is the lynchpin of any PNR flow, because
    if the placer gets it wrong, the rest of the implementation won't work.
    We've never had a block where Innovus' global placer messed up.
    Cadence should be proud of that.  :)

    CDNS Flow Tool

    Cadence Innovus also has a flow wrapper tool that integrates all CDNS
    tools together (Genus/Innovus/Tempus/Voltus) under one flow umbrella.
    You can use the flow tool to quickly create your starting flow, and then
    customize it.

    It's extensible with new flow steps you create (i.e. Tcl functions you
    add to your flow), so it grows with you.  For example, you can define:

        - non-default routing rules such as triple spacing on clock nets
        - custom reports

    Cadence has had this tool for a while, but it really came into its own
    this year with Cadence's 19.1 release.
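
    [ Editor's aside: a minimal sketch of what one of those user-added flow
      steps can look like.  The proc wrapper and report writing are plain
      Tcl; the create_route_rule / get_db / set_db calls are my assumption
      of the common-UI commands -- check your Innovus release's docs for the
      exact names and options. ]

        # Hypothetical flow step: tag clock nets with a triple-spacing NDR
        # and write a small custom report of what was applied.
        proc add_clock_ndr_step { {clk_pattern "clk*"} } {
            # Assumed Innovus common-UI command for defining the rule.
            create_route_rule -name CLK_3S -spacing_multiplier 3

            # Assumed net attribute for attaching the rule to clock nets.
            set clk_nets [get_db nets $clk_pattern]
            foreach net $clk_nets {
                set_db $net .route_rule CLK_3S
            }

            # Plain Tcl: log what this step did on every run.
            set fh [open "reports/clock_ndr.rpt" w]
            puts $fh "Applied CLK_3S (3x spacing) to [llength $clk_nets] nets"
            close $fh
        }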


    Compared to its competition, from what we've seen, Anirudh's combination
    of Genus/Innovus/Tempus/Voltus is stable and deployable.  And from what
    I've heard from colleagues and around the industry, CDNS continues to
    gain ground against Synopsys in PnR.

    We like Anirudh's corporate push to make ML/AI happen; plus his unified
    data model, common UI, and metrics infrastructure across all CDNS tools.

        ----    ----    ----    ----    ----    ----    ----

    We've been using Innovus for the bulk of our blocks at 7nm and below.

    We are seeing runtime improvements with Innovus of about a day on blocks
    that require 3 to 4 days to run, as compared with the "other flow" we
    have in house.  (We run both on the same hardware.)

    We ran a variety of test cases, and Innovus achieved PPA results that
    were better than or as good as the "other flow" by all criteria.
 
    Innovus also now has two new noteworthy features, though I haven't 
    tested them yet.

    1. A machine learning PPA enhancement.  

         - We have seen data from Cadence R&D suggesting that the 
           training of algorithms can lead to a 3% to 5% overall
           PPA advantage -- very significant and essentially "free"
           as far as user effort goes.

         - We don't see any other companies making such a feature
           claim.

    2. Automated mixed std cell and macro placement -- the best possible 
       integration here will be critical at nodes 7nm on down for highest 
       accuracy and best optimizations.

    Innovus also has an integration with Virtuoso.  We haven't tried it
    yet, but because of the nature of our designs we are excited about the
    potential optimizations this will enable.

    A design flow must be complete to be used at 7nm or lower.  So far we
    have consistently gotten better runtime, better PPA, and better MMMC
    PnR results from Innovus than from the "other flow".

    Innovus is doing well with our company.  It's definitely gaining usage
    here.  With Innovus giving us results better than or equal to "other
    flows" by ALL criteria, it's easy to see why.

        ----    ----    ----    ----    ----    ----    ----

    We've used Cadence Innovus for several years.  Cadence has some smart 
    people who have moved technology ahead.  Anirudh is a big factor, plus
    Paul Cunningham.  

    We recently ran Innovus on a 2+ million instance complex IP.  

    Our total turnaround time to go from RTL to GDSII with Cadence (Genus,
    Innovus, Tempus and Voltus) was just under 5 working days.  

    This included a complete multi-mode, multi-corner (MMMC) run from
    synthesis through PnR and timing sign off, as Genus is now MMMC-aware
    also.

    Cadence at 7nm and 5nm

    We use Cadence for design implementation at 7nm and at 5nm.  For 7nm
    we get good results right out of the box -- Innovus did not need much
    tuning.

    Innovus can handle 5nm; however, it needs tuning to the technology.

    So, we've been prototyping first as we iron out all the information for 
    the new node.

    Cadence Stylus Flow

    We use the Cadence Stylus flow, which covers Cadence's entire digital
    implementation flow.  

        - When we get new RTL code, we run it through the Cadence 
          implementation flow to ensure our flow is correct and to do 
          some pipe cleaning.  We put in multiple libraries and get the
          timing reports for the multiple corners. 

        - Then, we run it for fairly 'gettable' performance with
          express synthesis and an express prototyping run in Innovus.
          We then restart from a particular step or from the beginning,
          depending on what is needed.

        - It takes us approximately 5 working days to do this express
          prototyping.  After we get our constraints correct and our RTL
          configured, we run it for even tighter performance.

          Stylus has an HTML dashboard that shows routing density, DRCs,
          violating paths, and even a pictorial representation of the top
          violators.  We can identify long routes through the HTML.

    ECO Changes

    We use Conformal ECO mode along with Conformal LEC, Voltus and Tempus.

    If it's late in the design and we get an ECO change, we have 2 options:

        - For a complex ECO with high gate count change, we wouldn't 
          want to hand-change it, so we run the ECO change through a
          Conformal ECO flow.  This gives fast turnaround.

        - For a 15-20 gate change, it's faster to do it manually in a 
          few hours without the overhead of Conformal ECO.
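
    [ Editor's aside: the same decision rule as a few lines of Tcl.  The
      50-gate cutoff is a made-up placeholder for "a 15-20 gate tweak vs. a
      big functional ECO", not a number from this user. ]

        # Pick the ECO path based on the size of the change.
        proc pick_eco_path {num_changed_gates} {
            if {$num_changed_gates <= 50} {
                return "manual edit"          ;# small tweak: faster by hand
            }
            return "Conformal ECO flow"       ;# big change: automate it
        }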

    Performance & Scaling

    We can get a >2 million instance block done in less than 5 days from
    RTL to GDS, using multiple cores and threading with decent memory and
    processor speed.

    Cadence's scaling is fairly linear.  (We once had the misfortune of 
    running a similar-sized block on only 4 CPUs -- and killed it after 
    a week.)

    PPA

    We are designers, so we are neutral about tools.  We just want the best
    result and would drop one tool if another one worked better for a 
    particular design or process node.  

    In terms of PPA, we currently get better results from Cadence Innovus
    versus using Synopsys Fusion Compiler.  (We've worked with Fusion
    Compiler over the last several months.)

    Cadence's results are typically 8-10% better than Synopsys in terms of
    timing and power, with about the same area.

    GigaPlace, GigaOpt, and CCopt

    Some history here: Cadence Innovus wasn't doing so well back in 2013. 
    The addition of concurrent clock and data path optimization gave Cadence
    clocking an edge.

    Cadence was so far ahead of its time with CCopt that we would use
    Synopsys ICC2 for placement, run Cadence Innovus CCopt, and then go back
    to SNPS for routing.  (Since then, Synopsys has added its own concurrent
    clock/data optimization feature.)

    Innovus typically optimizes WNS and TNS

    This was a fundamental game changer in how we used the tool.  Innovus no
    longer depended on us telling it which groups to attack.  It operates on
    all paths and gives a good global solution.

    This has a big impact on QoR.

    PPA ML Enhancements

    Cadence says they've added some PPA enhancements to the Innovus flow
    that use machine learning.  We haven't used them, but are interested
    to see if Cadence will do a good job there.

    Cadence is getting a lot of things right with its current toolset.  Their
    machine learning enhancements will only make it better.

        ----    ----    ----    ----    ----    ----    ----

    Cadence Innovus

    We've now used Innovus for multiple designs at 16nm.

    Anirudh has made substantial turnaround time improvements with Innovus
    over the past 2 years.

    For a 3-5 M instance block, we can close RTL-to-GDS signoff in a week.  
    We have run it on up to 16 CPUs -- it scales well.

    Machine Learning for PPA optimization

    Cadence has made non-ML improvements to PPA for some years now; however,
    the majority of the high-impact gains seem to have already been wrung
    out -- there is less low-hanging PPA fruit.  So Cadence is now releasing
    a new PPA optimization using "machine learning".

    I ran a test case on it, and we saw some promising data points on its 
    potential impact.  I can tell there has been a lot of investment, but 
    this ML stuff is still early.

    Mixed standard cell & macro placement

    We have deployed Cadence's newer features for mixed standard cell and
    macro placement.  We use it to seed placement and are happy with it.
    (Note it is not 100% complete -- we must still tweak it.)

    Anything we can do to reduce cycle time is good, as it helps us deal 
    with our backlog.  Designers appreciate it, as we are overloaded and 
    can't find enough engineers. 
 
        ----    ----    ----    ----    ----    ----    ----

    Cadence Innovus APR has good turnaround time.  The speed is excellent 
    with the multi-threading capability.

    Our TAT for running it on 8 CPUs:

        - 1M instances             18-24 hours
        - 2M - 3M instances       ~ 2-3  days

    It also scales well.  My experience is that going from 4 CPUs to 8 CPUs
    pretty much doubled the TAT speed.

    I've also gotten good PPA results for my blocks and reduced the 
    iterations I needed for performance optimization.  Its GigaPlace and 
    GigaOpt have very strong features for placement and optimization.  
    Plus, Innovus has been good at nodes 16 nm and even 7nm.

    Good Online Support

    One noteworthy aspect about Innovus is that it has good online support.
    I can search for tool use suggestions on their website, and the search
    bar pulls from all the relevant user manuals, adoption kits, and
    application notes available online.  It's good for the commands, and
    for Q&A.  I benefit a lot from it -- it improves our efficiency.

    Mixed standard cell & macro placement

    I've tested Innovus' new mixed standard cell and macro placement
    capability on a couple of non-production blocks.  I didn't have a
    production design to compare the results against, but it looked
    promising.

    The automation speeds up the TAT -- the time saved from not having to
    manually place the macros is a big benefit.

    There is room to improve.  We need to tune the macro placement in
    several iterations to get the floorplan.

    We need a better floorplan with reasonable placement.  The mixed placer
    at least gives us a starting floorplan.  I don't know yet how much
    manual effort would be needed to tune it.

    Even now, it can give us multiple options to start with, which cuts the
    manual effort needed.

    Machine learning PPA optimization

    It's ML stuff that's supposed to learn from your previous designs to
    automatically improve your PnR PPA.  I haven't used it yet, but I'm
    curious about the potential to reduce the human effort it takes us to
    finish our chips.  It could be significantly helpful.

    This will be good for both EDA and chip design companies -- there are
    enough designs for machine learning technology to learn from and make a
    great impact on future designs.

    My sense is that Innovus is doing very well in the market and continues
    to grow.  It handles designs with millions of instances and is very fast
    thanks to strong multi-threading.

        ----    ----    ----    ----    ----    ----    ----

    We use Cadence Innovus.  Below is my update to last year's report on
    Innovus.

    For context, our designs are smaller, but high performance.  So, we are 
    very demanding on PPA.

        1. Innovus has a new timing-driven algorithm for placement.
           We've noticed higher QoR, with no penalty on runtime.

        2. Cadence now has a Machine Learning PPA enhancement.  

    We have not had time to try it yet, but it makes sense that the tool can
    learn from scoring your QoR.  It's a logical progression; one more tool
    in the toolbox.

    I'd recommend Innovus.

        ----    ----    ----    ----    ----    ----    ----

    We use Cadence Innovus for place and route.  

    We initially went with Innovus over Synopsys ICC2 because Cadence made 
    it easier to work with us.

    Then recently, new engineers came on board at our company who were
    long-time ICC2 users and had never used Innovus.

    - The new engineers commented that Innovus is a lot easier to use 
      than ICC2 

    - They came up to speed quickly and were fully productive within 
      a month.

    As for PPA, it took us only 2 months using Innovus (plus Tempus and
    Voltus) to close a 90K+ instance, 7 nm block at 3 GHz -- RTL to GDS2.
    We did it using the tools and writing Tcl scripts to iterate -- i.e.
    we didn't have to do any manual design changes.  These 2 months include
    final DRC sign-off with Calibre, too.

        ----    ----    ----    ----    ----    ----    ----

    Watching the industry shift of ICC2 over to Innovus has been fun.

    It'll be interesting to see if Shankar [Krishnamoorthy] can get
    some of those PnR customers back with Fusion Compiler.

        ----    ----    ----    ----    ----    ----    ----

    I think our guys use ICC2.

    It's mostly OK now from what I hear from them.

    They're not bitterly complaining about ICC2 like they did years ago.

        ----    ----    ----    ----    ----    ----    ----

    Innovus + Calibre == YES

    Innovus + Pegasus == HELL, NO!

        ----    ----    ----    ----    ----    ----    ----

Related Articles

    Cadence Innovus dominates Synopsys ICC/ICC2 is #4b "Best of 2018"
    User benchmarks DC-ICC2 vs Fusion Compiler vs Genus-Innovus flows
    Costello on SNPS PnR "still in catch up mode" in 2 years from now
    Synopsys layoffs means ICC2 rewrite is unknown for 3 to 4 years out
    Cadence Innovus dominates Synopsys ICC/ICC2 is #10 "Best of 2017"
    Engineering comments point to SNPS vs. CDNS PNR shakeout at Apple
    ICC2 patch rev, Innovus penetration, and the 10nm layout problem

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
This Web Site Is Modified Every 2-3 Days
Copyright 1991-2024 John Cooley.  All Rights Reserved.
| Contact John Cooley | Webmaster | Legal | Feedback Form |

   !!!     "It's not a BUG,
  /o o\  /  it's a FEATURE!"
 (  >  )
  \ - / 
  _] [_     (jcooley 1991)