
( ESNUG 480 Item 5 ) -------------------------------------------- [03/26/09]

Subject: ( ESNUG 479 #4 ) A user's first look at Cadence C-to-Silicon

> A total of 820 engineers responded, and below is what they told us.  Since
> they could pick 2 answers, the total is ~200% rather than 100%.
>
>      "What are the 2 biggest reasons to use High Level Synthesis?"
>
>                 Faster time to RTL :  ######################## 64%
>           Faster verification time :  ################## 49%
>        Fewer engineering resources :  ############ 31%
>                         Fewer Bugs :  ####### 19%
>         RTL better than hand-coded :  ##### 14%
>          Faster ECO implementation :  ### 8%
>     Better product differentiation :  ### 7%
>                              Other :  ## 4%
>
> The results?  Almost 2/3rds (64%) recognized that high level synthesis
> will get to RTL code faster.


From: Gernot Koch <gernot.koch=user domain=micronas fought nom>

Hi John,

This is in response to ESNUG 479 #4 by Shawn McCloud.  While I agree with
his survey findings (C synthesis offers faster time to RTL and better
verification), I feel that Shawn is a bit one-sided in his claims that
ANSI C is better than SystemC.  That may be true if you constrain yourself
to modeling datapaths.  But there is no standard synthesizable subset of
ANSI C, and it lacks the constructs for expressing timing and protocols
that you need to model/simulate more complex communication/interface
schemes.

SystemC has all that; that's why it exists.  With ANSI C you have to tell
the C-synthesis tool what kind of interface you want.  (For example, Shawn's
Catapult C requires this.)  The first time you get to simulate your block
in the context of its environment is after ANSI C synthesis.  So, if you
have ONLY simple communication schemes and pure datapath, ANSI C is fine.
If not (i.e. a real design), SystemC is probably the better choice.  It just
doesn't make sense to partition a design to use ANSI C synthesis only on the
datapath while using a different tool (or RTL) on the rest!

At my company, we've been interested in ANSI C/SystemC synthesis for a long
time.  Over the past few years we have evaluated many of the tools.  We even
tried Synopsys SystemC Compiler in a small way, but could never adopt it
broadly since it was too immature, too unstable, had too little language
coverage, plus it didn't produce the results we expected.

That all changed last year when we found the time was right to take another
look at the major C synthesis tools.  We created a video processing design
implementing a 2-dimensional peaking algorithm and ran it through the major
HLS tools on the market.

All of the current C synthesis tools proved to be usable, with some minor
differences in language coverage, tool handling, quality of results and
peripheral features.

The synthesized portion of our benchmark consisted of a single
non-hierarchical SystemC module with one process and a number of procedure
calls, totaling around 700 LOC, 6 SRAMs (modeled as C++ classes), and many
arithmetic operations, which we wanted implemented as a pipeline with II=1
(one pixel in, one pixel out in each clock cycle).
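To make the II=1 requirement concrete, here is a minimal software sketch in
plain C++ (not SystemC, and not our benchmark code).  The class name
PixelPipeline, the helper pipeline_example() and the placeholder "invert"
operation are all hypothetical.  The sketch only illustrates that a pipeline
with an initiation interval of 1 accepts one pixel and emits one result on
every cycle, while each individual result appears a fixed number of cycles
(the latency) after its input.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical software model of a pipeline with initiation interval II=1.
// Each call to clock() stands for one clock cycle: one pixel goes in, one
// value comes out.  A pixel's result leaves LATENCY calls after it entered.
template <std::size_t LATENCY>
class PixelPipeline {
    static_assert(LATENCY > 0, "a pipeline needs at least one stage");

public:
    std::uint8_t clock(std::uint8_t pixel_in) {
        // The value in the last stage leaves the pipeline this cycle.
        const std::uint8_t out = stages_[LATENCY - 1];
        // Every other stage moves forward by one register.
        for (std::size_t i = LATENCY - 1; i > 0; --i) {
            stages_[i] = stages_[i - 1];
        }
        // The new pixel is processed and enters the first stage.
        stages_[0] = process(pixel_in);
        return out;
    }

private:
    // Placeholder operation (assumption): invert the pixel value.
    static std::uint8_t process(std::uint8_t p) {
        return static_cast<std::uint8_t>(255 - p);
    }

    std::array<std::uint8_t, LATENCY> stages_{};  // zero-initialized registers
};

// Usage sketch: a 3-stage pipeline fed one pixel per cycle.
inline void pipeline_example() {
    PixelPipeline<3> pipe;
    assert(pipe.clock(10) == 0);    // cycles 1-3: pipeline still filling
    assert(pipe.clock(20) == 0);
    assert(pipe.clock(30) == 0);
    assert(pipe.clock(40) == 245);  // cycle 4: result for pixel 10 (255 - 10)
    assert(pipe.clock(50) == 235);  // cycle 5: result for pixel 20 (255 - 20)
}
```

A deeper pipeline in this model delays the first result but does not reduce
throughput: once filled, it still produces one pixel per cycle.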

The whole thing is quite typical for a datapath block in our designs.  We
synthesized the benchmark for a number of design targets (130 nm, 90 nm,
Xilinx Virtex-4) and clock targets (54 MHz, 108 MHz).  

In terms of area the different clock speeds did not make a huge difference.
A higher clock speed resulted simply in more registers needed, i.e. in
greater pipeline latency.  (As expected.)  We did not have to touch our
source code in any way when moving between target technologies and clock
speeds.  The area results were in all cases more or less on par with what
we expected from a corresponding manual design.  This was no real surprise
since our spec did not leave much room for architectural exploration.

We picked Cadence CtoSilicon for further testing on a real pilot chip
because in spite of the datapath nature of our benchmark, our real pilot
contains some complex communication.  Also, we found that pure ANSI C/C++
as an input language doesn't lend itself well to modeling control logic
and interface protocols, while SystemC does.  (CtoS reads SystemC.)  Plus,
since SystemC is a superset of C++, there's no penalty for untimed datapath
modeling either.  CtoS produced good results, both in speed and area, and
also had the fastest turnaround in the competition.  Results in numbers as
reported by DC (90 nm target):

   Area results excluding SRAM (registers/comb logic):

                       C-to-Silicon               Others
            54 MHz     13414/68040       9981/86479 - 23435/93605
           108 MHz     24809/68523      19357/87021 - 48394/101775

   Shortest achievable pipeline latency (clock cycles):

                       C-to-Silicon               Others
            54 MHz          10                    8 - 13
           108 MHz          15                   12 - 19

   Timing results (slack nsec):

                       C-to-Silicon               Others
            54 MHz         0.00                0.52 - 1.14
           108 MHz         0.00                0.00 - 0.10

The slack numbers show that the C synthesis tools all do a pretty good job
estimating the delays in the RTL implementation.  In all cases, they leave
little of the clock period unused.
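The tables above can be put into absolute terms with a little arithmetic.
The sketch below (plain C++; the function names period_ns, latency_ns and
growth are hypothetical helpers, not part of any tool) converts pipeline
latency from clock cycles to nanoseconds and computes how much a figure
grows between the two clock targets.

```cpp
#include <cassert>

// Clock period in nanoseconds for a frequency given in MHz.
inline double period_ns(double freq_mhz) {
    assert(freq_mhz > 0.0);
    return 1000.0 / freq_mhz;
}

// Absolute pipeline latency in nanoseconds: cycles times clock period.
inline double latency_ns(int cycles, double freq_mhz) {
    return cycles * period_ns(freq_mhz);
}

// Growth factor of a figure between the low and high clock target.
inline double growth(double at_low_clock, double at_high_clock) {
    assert(at_low_clock > 0.0);
    return at_high_clock / at_low_clock;
}
```

With the C-to-Silicon numbers, latency_ns(10, 54.0) is about 185 ns and
latency_ns(15, 108.0) is about 139 ns, so the 108 MHz pipeline has more
stages but a shorter absolute latency.  growth(13414, 24809) is about 1.85
for registers, while growth(68040, 68523) is about 1.007 for combinational
logic.  The extra area at the higher clock is almost entirely pipeline
registers.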

The C synthesis (from C to RTL) runtimes ranged from 5 min (54 MHz, ASIC)
up to around 1.5 hours (120 MHz, Xilinx), depending on the tightness of
the clock constraint.  CtoSilicon was generally on the faster end.

The DC (from RTL to gates) runtimes were less than 10 mins for all cases.

We did not check for the maximum achievable clock frequency on ASIC.  On
FPGA, the max frequency was in all cases around 120 MHz (due to the built-in
multipliers) and in all cases we could only get there by overconstraining
the C synthesis tools into accepting results with negative slack, and then
running FPGA synthesis to see what it could do.  Obviously, the timing
engines of all tools are less than optimal for the FPGA target if you're
pushing the envelope.

Where CtoSilicon was not fully up to par was in terms of coding style.  We
originally modeled the line memories as separate C++ classes for easy
reuse.  While the other C synth tools accepted this coding style, CtoSilicon
wasn't able to map arrays located in sub-classes to memory.  We had to roll
the line memory into the algorithm, effectively killing off the idea of
code reuse.
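The two coding styles look roughly like this.  This is a plain C++ sketch,
not our actual source.  The names LineMemory, FilterWithClass, FilterInlined
and kLineWidth are hypothetical, and the "add the pixel from the line above"
operation is only a stand-in for the real peaking algorithm.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineWidth = 8;  // assumption: tiny line for illustration

// Style A: the line memory is a reusable class that owns its array.
class LineMemory {
public:
    // Return the pixel stored at this column one line ago, then overwrite it.
    std::uint8_t exchange(std::size_t col, std::uint8_t pixel) {
        assert(col < kLineWidth);
        const std::uint8_t old = data_[col];
        data_[col] = pixel;
        return old;
    }

private:
    std::array<std::uint8_t, kLineWidth> data_{};  // array lives in a sub-class
};

class FilterWithClass {
public:
    // Placeholder algorithm: current pixel plus the pixel directly above it.
    int step(std::size_t col, std::uint8_t pixel) {
        return pixel + line_.exchange(col, pixel);
    }

private:
    LineMemory line_;
};

// Style B: the line memory is rolled into the algorithm, so the array is a
// direct member of the filter and the buffering logic cannot be reused.
class FilterInlined {
public:
    int step(std::size_t col, std::uint8_t pixel) {
        assert(col < kLineWidth);
        const int above = line_[col];
        line_[col] = pixel;
        return pixel + above;
    }

private:
    std::uint8_t line_[kLineWidth] = {};
};
```

Both filters compute the same result.  The only difference is where the
array is declared, and that is what decided whether the array could be
mapped to memory.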

Regardless, we still chose to adopt CtoSilicon, trusting that Cadence will,
in time, address such issues.

We believe SystemC tools like CtoS will become mainstream not because the
tools are better than RTL designers, but because they allow you to explore
more architectural choices in a shorter time than hand-writing RTL.

    - Gernot Koch
      Micronas GmbH                              Freiburg, Germany

Copyright 1991-2024 John Cooley. All Rights Reserved.