( ESNUG 492 Item 1 ) -------------------------------------------- [05/26/11]

Subject: 8 user tech evals of CDNS RTL Compiler vs. SNPS Design Compiler

> Are there any unbiased comparisons between the area and power performance
> of Synopsys Design Compiler and Cadence Encounter RTL Compiler synthesis?
>
>     - Tom Harris
>       Starkey Labs, Inc.                         Eden Prairie, MN

 [ Editor's Note: Each of these engineers asked to be anonymous for fear
   of retribution from Synopsys (and in one case Cadence).  - John ]

         ----    ----    ----    ----    ----    ----   ----

From: [ Please Don't Hurt Me! ]

Hi John,

I have about 15 years in ASIC and am currently working at [company deleted]
and wanted to share some of my experience.  Anonymous please.

I have some experience with both SNPS Design Compiler and CDNS RTL Compiler.

About 8 years ago (in one of my previous companies), we were designing a
DDR2 memory controller and had issues with timing closure using Design
Compiler.  RTL Compiler was just coming up and we gave it a try.

The improvement in timing was significant (~20%) and the area was smaller
(don't remember the exact number).  Our IP was taped out using RTL Compiler
in a North Bridge and it came back successful.

In my current division (high speed SERDES group) in [company deleted],
we've been using RTL Compiler for years.  For our PHYs we need to run with
fast clocks for some synthesized blocks (up to 1.5 GHz) as well as limited
real estate.  With RC we've been able to hit those goals.

Tool aside, Cadence Support has been very timely and to the point.  I got
quick responses even during the X'mas holiday period when they had a shutdown!

    - [ Please Don't Hurt Me! ]

         ----    ----    ----    ----    ----    ----   ----

From: [ The Invisible Man ]

Hi, John,

We ran several modules of 100 K and 500 K instances, and found that DC and
RC results correlated very well for the same type of setup.

RC had 5% smaller area and 5% less leakage/dynamic power for a 65 nm process.

Please keep me anon.

    - [ The Invisible Man ]

         ----    ----    ----    ----    ----    ----   ----

From: [ Don't Taze Me, Man! ]

Hi, John,

My name is [name deleted] and I am a logic designer working for [company
deleted].  I'd like to stay anonymous since I want to keep my friends at
both Cadence and Synopsys.

Back in October 2010 I did a comparison between RTL Compiler and Synopsys
DC in topological mode.  Here is the report that I wrote up back then.

Small Block - 20 K instances, Verilog, TSMC 65 nm
===========

We did many test runs on this block.  This was our primary test vehicle.

  - DC gave a 120 ps violation at our 500 MHz target.  RC gave us
    violations in the range of 12 ps to 82 ps, depending upon the leakage
    setting knobs.

  - DC gave us a 0 ps violation at a frequency of 450 MHz, as did RC.

  - DC gave us a power consumption of 11.5 mW at 500 MHz, while RC gave
    us about 13.25 mW.

  - DC gave us a power consumption of 10.3 mW at 450 MHz, while RC gave
    us 12.1 mW.

  - At 500 MHz, DC gave us a cell area of 249,322 um2 while RC gave us
    a cell area of 236,666 to 252,670 um2.

Large Block - 1 M instances, Verilog, TSMC 65 nm
===========

This being a large block, we could only manage a single run.  The DC script
we used had two parts: a hierarchical synthesis and a completely flattened
synthesis.  The hierarchical part finished reporting after about 34 hours,
and the flattening run took a further 24 hours to finish.

  - DC gave us a 330 ps violation at our target of 500 MHz.  RC gave us a
    189 ps violation.

  - DC gave us a power consumption of 299 mW, while RC gave us a power
    consumption of 340 mW.

  - DC gave us a cell area of 5,962,026 um2 while RC gave us a cell area
    of 5,958,973 um2.

Conclusions
===========

Neither tool met the timing requirements.  DC provided better power
numbers at the expense of worse timing.  The cell area was a wash between
DC and RC.
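To make the trade-off above concrete, here is a small sketch that turns the reported figures into relative differences.  It only re-uses numbers from this eval; the one assumption is taking RC's small-block 500 MHz power as a single 13.25 mW value, and expressing each violation as a fraction of the 2 ns (500 MHz) clock period is an illustrative choice, not something the author did.

```python
# Figures as reported in the eval above (TSMC 65 nm, 500 MHz target).
CLOCK_MHZ = 500.0
PERIOD_PS = 1e6 / CLOCK_MHZ  # 500 MHz -> 2000 ps clock period


def pct_diff(rc: float, dc: float) -> float:
    """Percent by which the RC figure differs from the DC figure."""
    return 100.0 * (rc - dc) / dc


def violation_pct(violation_ps: float) -> float:
    """Negative slack expressed as a percentage of the clock period."""
    return 100.0 * violation_ps / PERIOD_PS


small_power_500 = pct_diff(rc=13.25, dc=11.5)      # small block, 500 MHz
small_power_450 = pct_diff(rc=12.1, dc=10.3)       # small block, 450 MHz
large_power = pct_diff(rc=340.0, dc=299.0)         # large block
large_area = pct_diff(rc=5_958_973, dc=5_962_026)  # large block cell area (um2)

print(f"Small block power @500 MHz, RC vs. DC: {small_power_500:+.1f}%")
print(f"Small block power @450 MHz, RC vs. DC: {small_power_450:+.1f}%")
print(f"Large block power, RC vs. DC:          {large_power:+.1f}%")
print(f"Large block cell area, RC vs. DC:      {large_area:+.2f}%")
for label, ps in [("DC small", 120.0), ("DC large", 330.0), ("RC large", 189.0)]:
    print(f"{label}: {ps:.0f} ps violation = {violation_pct(ps):.1f}% of the period")
```

With these inputs RC's power comes out roughly 14% to 18% higher than DC's, while the large-block cell areas differ by only about 0.05%, which is consistent with calling area a wash.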

With the utilization factor set to 0.66 (which is equivalent to having the
net area roughly 0.5 x the cell area), DC did not route.  This may or may
not be the more realistic result.  RC had a net area overhead of 3,415,217
um2 on a cell area of 5,958,973 um2, which is an overhead of 57% in the
large block; slightly more than was allocated to DC.  However, these
numbers come from runs in which RC did a far better job of achieving the
timing goals, with a violation of 189 ps versus 330 ps for DC.
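The "0.66 utilization is about 0.5 x net area" equivalence and the 57% overhead figure can be checked with a couple of lines of arithmetic.  This sketch assumes utilization is defined as cell area divided by (cell area + net area), which is the definition that makes the author's parenthetical work out; the function names are hypothetical.

```python
def net_to_cell_ratio(utilization: float) -> float:
    """Net (routing) area as a fraction of cell area for a given utilization.

    Assumes utilization = cell_area / (cell_area + net_area).
    """
    return 1.0 / utilization - 1.0


def utilization_from_areas(cell_area: float, net_area: float) -> float:
    """Inverse view: the utilization implied by a cell area and a net area."""
    return cell_area / (cell_area + net_area)


# Budget allocated to DC: utilization 0.66 -> net area of about 0.52 x cell area.
dc_budget = net_to_cell_ratio(0.66)

# RC large block, areas in um2 as reported above.
rc_cell, rc_net = 5_958_973, 3_415_217
rc_overhead = rc_net / rc_cell                      # about 0.57, the 57% overhead
rc_util = utilization_from_areas(rc_cell, rc_net)   # about 0.64

print(f"DC budget: net area = {dc_budget:.2f} x cell area")
print(f"RC result: net area = {rc_overhead:.2f} x cell area "
      f"(effective utilization {rc_util:.2f})")
```

Seen this way, RC's result corresponds to an effective utilization of roughly 0.64 rather than the 0.66 allotted to DC.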

Because of this eval, we decided to stay with RTL Compiler.

    - [ Don't Taze Me, Man! ]

         ----    ----    ----    ----    ----    ----   ----

From: [ It Wasn't Me! ]

Hi, John,

I prefer to be anonymous for this critique of the Cadence RTL Compiler tool.

I used RTL Compiler for one of our low power projects where we had multiple
power domains.  It was about 250 K gates with clock gating structures and
DFT compliance.

We defined the CPF strategy to work with RTL Compiler, used FE as the PnR
tool, and Conformal Low Power as the formal verification tool.  For ATPG we
used FastScan, and for STA we used Synopsys PrimeTime.

The project went quite smoothly.

I would definitely use RTL Compiler and CPF for our next low power design.

    - [ It Wasn't Me! ]

         ----    ----    ----    ----    ----    ----   ----

From: [ I Did Not Have Sex With That Woman ]

Hi, John,

I have about 10 years of experience in ASIC design, and although I don't
have a great comparison between DC and RC, I thought I'd tell you about
my experience during the process of switching from DC to RC.

Up until 3 years ago I was using DC for synthesis.  Initially I attended
a Synopsys training course, so I got to know synthesis through DC's
viewpoint.

Back then we didn't care much about power consumption, and since DC was a
synonym for synthesis, we took whatever timing resulted at the end of the
run.  Also, we were running bottom-up synthesis, with system-wide
specified inter-module input and output delays.

We may have over-constrained some of our paths in the chip -- nobody ever
analyzed the issue.

About 3 years ago, mostly because of the poor customer support we received,
my company switched to RC.

My first surprise was the disciplined, organized approach RC gently
enforced by providing a generic synthesis template script.  That included:

   - scan insertion
   - power targets
   - different operational modes
   - CPF support

While all this was a bit strange at first and required more work than I
expected, once all the required files were in place, everything started
working nicely.

Reports were abundant and well organized.  Results were easy to access,
and with the suggested top-down approach the guesswork of inter-module
communication/timing was removed.

The entire RC synthesis process became a lot more formal.  Power domain
definition and ISO/PSO cell insertion came with the flow, and so did
scan definition.

I synthesized the chip in RC by myself and released it to our back-end team.

I could not have done it completely alone, though; I got terrific support
from Cadence.  This chip, a 100 MHz memory controller with two host
interfaces, 7 power domains, and a DDR interface on the back end, taped out
and worked well.  (Yes, there were some small bugs, but the chip came up on
the first try!)

Thanks to CPF and the power-aware simulations, our power domains worked,
too!  Our real silicon power usage numbers were within ~20% of the numbers
reported during synthesis.

The Cadence tech support has been great ever since.  It was available even
after tapeout, during ECO implementation, and they were especially helpful
when sorting out logic equivalence checking issues.

Switching from DC to RC was like going from night into day for us.

If my comments are to be published, I would want to stay anonymous.

    - [ I Did Not Have Sex With That Woman ]

         ----    ----    ----    ----    ----    ----   ----

From: [ We Shall Overcome ]

Hi, John,

Not long ago we switched to RC after the expiration of our DC licenses.

Before the switch we picked a previously taped-out block to compare DC
and RC.  We took both netlists through Encounter up to the post-route
stage and compared utilization and timing slack.

The RC netlist's utilization was 6.14% smaller after P&R.  WNS was also
52 ps better in RC.  Even though the improvement was not in the double
digits, we still decided to switch to RC, considering that our layout tool
was already Cadence Encounter and that the support we get from the Cadence
team as a small company user was excellent.

Please make my identity anonymous if my comments are to be published.

    - [ We Shall Overcome ]

         ----    ----    ----    ----    ----    ----   ----

From: [ Fight The Power ]

Hi John,

Please keep me anon.

I used DC in a "power centric" flow for several years for a very low power
medical implant chip.  At the company I am at now we chose RC over DC for
a similar type of chip.

Both tools do power and area optimization.

DC has a few more constraint options for area optimization than RC (e.g.,
set_max_area), but from what I have seen RC will get similar results using
different synthesis strategies and attribute settings.

In both tools you have to be careful with library cell selection and
wireload strategy -- these will have a pronounced effect on area.  A lot
of area optimization can be done during PnR when you have the real physical
loading numbers.

When benchmarked against each other, we saw almost no difference in area.
But with both tools we had to do a fair amount of trial and error with the
synthesis scripts to get what we thought were optimal area results.

As for power optimization, both tools will infer clock gating and operand
isolation almost identically.  What I have seen is that you may need to
tweak your scripts and/or Verilog RTL to ensure proper inference with
either tool.  Both produce similar results.  I have not tried power gating
or leakage optimization with either tool.

We've seen very good correlation between the SPICE and RC power numbers.
DC has a couple more options for performing power analysis without back-
annotating switching activity, but I would back-annotate switching activity
as much as possible for the best results.
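Why back-annotated switching activity matters can be seen from the standard first-order dynamic power model, P = alpha * C * Vdd^2 * f, where alpha is the switching activity.  The sketch below uses purely hypothetical numbers (50 pF of switched capacitance, 1.0 V, 100 MHz, and two made-up activity factors); it is not how either tool's power engine works internally, only an illustration that the estimate scales linearly with whatever activity is assumed.

```python
def dynamic_power_mw(activity: float, cap_pf: float, vdd: float, freq_mhz: float) -> float:
    """First-order dynamic power P = activity * C * Vdd^2 * f, returned in mW."""
    cap_f = cap_pf * 1e-12    # pF -> F
    freq_hz = freq_mhz * 1e6  # MHz -> Hz
    return activity * cap_f * vdd ** 2 * freq_hz * 1e3  # W -> mW


# Hypothetical values for illustration only.
CAP_PF, VDD, FREQ_MHZ = 50.0, 1.0, 100.0

guessed = dynamic_power_mw(0.2, CAP_PF, VDD, FREQ_MHZ)     # hypothetical assumed activity
annotated = dynamic_power_mw(0.05, CAP_PF, VDD, FREQ_MHZ)  # hypothetical simulated activity

print(f"assumed activity:   {guessed:.2f} mW")
print(f"annotated activity: {annotated:.2f} mW")
```

A 4x error in the assumed activity becomes a 4x error in the dynamic power estimate, which is the argument for back-annotating activity from simulation wherever possible.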

In general DC is more mature, but I think RC can be made to work just as
well in a power/area centric flow.

    - [ Fight The Power ]

         ----    ----    ----    ----    ----    ----   ----

> It's unfair.  Oasys RealTime uses algorithms and data structures that were
> not even around back in 1986 -- that's why it's 30X to 60X faster.
>
> While DC is amazing in all of the crazy corner cases it can synthesize, it
> is simply too big and too complex and too old to speed up.
>
> That's why Synopsys Corporate Marketing had to come up with DC Explorer;
> it's a 4X speed-up pre-processor for the generic DC donkey cart runs to
> help overcome those 60X Ferrari runs Oasys RealTime intrinsically has.
>
> Or look at it another way -- in its entire 25 year history, SNPS has never
> once bothered with an RTL estimator tool for DC until now.  What changed?
>
>     - from http://www.deepchip.com/gadfly/gad042811.html


From: [ I Am Sparticus ]

Hi John,

I was at the San Jose SNUG 2011 conference and attended the DC Explorer
luncheon.

I felt that the DC Explorer was just a less accurate version of DC that
results in shorter runtime.  I was not impressed.  Any idiot can run DC
with easy constraints to get fast estimated numbers.  I don't need a
"new" tool to do that.

Thanks for the review of the OASYS product.  It seems to be quite an
interesting competitor.

I'm a user of Cadence RC and found that it's quite comparable to DC in
timing, area, power, and runtimes.  Our local Cadence AE keeps pushing
for their own CDNS methodology but our backend group using Magma prefers
zero-wireload and putting in extra timing margin to account for routing
and buffers.  It simplifies the flow and allows Talus Vortex to do the
optimizations instead of letting DC run for hours and hours to do it.

What are the different synthesis setup / methodologies for current and
future process nodes that you have seen?

Our backend group suggests something like this for zero wireload:

90 nm - 25% clock timing margin (10 ns PT signoff would be 7.5 ns DC SDC)
65 nm - 35% clock timing margin (10 ns PT signoff would be 6.5 ns DC SDC)

due to increased wireloading effects in smaller geometries.
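The margin rule of thumb above is simple arithmetic: the synthesis SDC clock period is the signoff period shortened by the margin.  Here is a minimal sketch reproducing the 7.5 ns and 6.5 ns figures, plus the 50% case the writer asks about; the function name is hypothetical.

```python
def synthesis_period_ns(signoff_period_ns: float, margin: float) -> float:
    """Shorten the signoff clock period by a fractional margin for synthesis.

    margin=0.25 means the synthesis SDC clock is 25% shorter than signoff.
    """
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be in [0, 1)")
    return signoff_period_ns * (1.0 - margin)


for node, margin in [("90 nm", 0.25), ("65 nm", 0.35), ("45 nm (proposed)", 0.50)]:
    period = synthesis_period_ns(10.0, margin)
    print(f"{node}: 10 ns PT signoff -> {period:.1f} ns DC SDC "
          f"({1000.0 / period:.0f} MHz target vs. 100 MHz)")
```

Note that a period margin grows faster in frequency terms: trimming the period by 35% raises the synthesis target by about 54% (10 / 6.5), and a 50% margin doubles it.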

Do you think 45 nm should have 50% clock margin?

Anon please.

    - [ I Am Sparticus ]
Copyright 1991-2024 John Cooley.  All Rights Reserved.