( ESNUG 485 Item 8 ) -------------------------------------------- [05/27/10]

From: Johanna Ketonen <johannak=user domain=ee.oulu.fi>
Subject: University user dumps hand-coding VHDL for Catapult C synthesis

Hi, John,

I've used Mentor's Catapult C synthesis actively for the last 3 years to
generate RTL from our C++ source code for ASICs and FPGAs.  We use the tool
for prototyping our high performance receiver algorithms for wireless 
systems, such as our 3GPP Long Term Evolution Advanced (LTE-A).
 
Our algorithms have strict latency requirements, and our design sizes
typically range from tens of thousands to hundreds of thousands of gate
equivalents.  Some designs can be larger, up to 1 million gates or so.


CATAPULT versus HAND-CODED VHDL RTL

Design 1:

Using Catapult: It took us less than 30 minutes to write and verify our C 
code.  Running it through Catapult C and experimenting with different 
architectures only took a couple of extra minutes.  

Hand-coded RTL: It only took 2 hours to write the VHDL, but the scheduling 
and verification took about 3-4 more days.  It took a few iterations to get 
close to the CatC results.  The results for this one small design are
presented in the table below.

            Lines of C code:                      20 
            Lines of Catapult output VHDL:       450 
            Lines of manual VHDL:                 60

                   Area (k GE)   Power (mW)    Throughput rate (Mbps) 
   Hand coded VHDL     13           32               100
   Catapult C          12.8         31               100

You can see our quality of results was comparable to our hand coded RTL.

Design 2:

I did not personally do Design 2, so I cannot say the exact steps, but I 
understand it took an expert who was working with our group several 
iterations with the hand-coded design to reach the same area and throughput
as with the Catapult design.

                   Area (k GE)   Power (mW)    Throughput rate (Mbps)
   Hand coded VHDL     15.5         27               121
   Catapult C           6.9         32               121

I estimate it would take me at least one month to hand-code something that
would take only a week to implement with Catapult, and most of that week
would go to debugging the C code.  So this is about a 75% time savings.


INTERACTIVE ARCHITECTURAL EXPLORATION

We used CatapultC's incremental methodology to experiment with different
architectures.  After we observed the results for a certain architecture,
it was easy for us to go back and change it, and eventually select the best
architecture to meet our design goals.  After we set up the clock frequency
and technology, we specify the architecture constraints (memory mapping,
splitting, loop unrolling, pipelining).
 
We can view and modify the resources we used, such as which components are 
used for which operation.  After we set the constraints, we can view the 
schedule and see the area estimate and latency.  The final stage is to 
generate the output files, including the RTL.  If the area and latency 
estimates don't meet our requirements, we can go back to any stage and 
make changes. 

We also compare different architectures at different clock frequencies 
in order to obtain minimal power consumption.

A simple example: a loop iterates 4 times.  The result should be available
in 10 nsec.  After pipelining the function, the latency estimate is 40 nsec
with 100 MHz clock frequency.  It's possible to go back and change the
clock frequency and/or unroll the loop and get the result in 10 nsec. 
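
The arithmetic behind this example can be sketched in a few lines of C++.
This is a back-of-envelope model, not Catapult's scheduler.  It assumes each
loop pass takes one clock cycle and that unrolling by a factor U folds U
iterations into a single pass, which only holds if the unrolled logic still
fits in the clock period.  The function name and parameters are made up for
illustration.

```cpp
#include <cassert>

// Rough latency estimate, in nanoseconds, for a loop with a fixed trip count.
// Assumptions (not taken from Catapult): one clock cycle per loop pass, and
// unrolling by `unroll` merges that many iterations into one pass.
double loop_latency_ns(int trip_count, int unroll, double clock_mhz) {
    assert(trip_count > 0 && unroll > 0 && clock_mhz > 0.0);
    int passes = (trip_count + unroll - 1) / unroll;  // ceiling division
    double period_ns = 1000.0 / clock_mhz;            // e.g. 100 MHz -> 10 ns
    return passes * period_ns;
}
```

Under these assumptions, loop_latency_ns(4, 1, 100.0) gives the 40 nsec
above, while either loop_latency_ns(4, 4, 100.0) (full unroll) or
loop_latency_ns(4, 1, 400.0) (faster clock) gives 10 nsec.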


HIERARCHY

Our general approach to hierarchy is to run separate blocks through Catapult
because some of these blocks can be quite large.  As an example, I ran one
block through Catapult that was 500 lines of C++ code with many loops in the
code.  This block had a 2-3 hour run-time with Catapult.  I divided another 
algorithm into 8 parts, each with 300 lines of C++ code.  (I do not have the 
number of lines of RTL for these designs.)

We used some hierarchy in our blocks by having more than one function in the 
C++ code, then defining one function as the top level function.  By running 
our blocks separately we can do block-level pipelining.  The blocks can 
process data at the same time, i.e., when a block finishes, its results
are passed to the next block, and the first block can start processing the
next set of data, etc.
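
A minimal sketch of the "several functions, one top level function" style
follows.  The function names, the block size N, and the arithmetic are all
hypothetical and not taken from our receiver code.  How the top function is
marked for the tool (pragma or GUI setting) is tool-specific and is left out.

```cpp
#include <cassert>

const int N = 4;  // block size, chosen arbitrarily for this sketch

// Sub-block 1 (hypothetical): scale each input sample.
void scale(const int in[N], int out[N]) {
    for (int i = 0; i < N; ++i) out[i] = in[i] * 3;
}

// Sub-block 2 (hypothetical): accumulate the scaled samples.
int accumulate(const int in[N]) {
    int sum = 0;
    for (int i = 0; i < N; ++i) sum += in[i];
    return sum;
}

// Top level function: the one that would be designated as the design top.
// The other two functions become the hierarchy underneath it.
int receiver_top(const int in[N]) {
    assert(in != nullptr);  // host-side sanity check, not meant for synthesis
    int scaled[N];
    scale(in, scaled);
    return accumulate(scaled);
}
```

Because the same functions compile with an ordinary C++ compiler, the top
function can be called directly from a C++ testbench.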


INPUT LANGUAGE

Catapult supports SystemC and C++.  I use C++.  It's quite well supported.
C based verification is much faster than HDL verification, where we must
write the HDL testbench and debug the RTL, which is very time consuming.
For example:

           20 lines of C code may take 1 hour to verify

           Same design in VHDL RTL may take 6 hours. 
 
If there is a need to change the word lengths, we might have to change our 
entire HDL.  In C code, it can be as simple as changing the size of the 
variable. 
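
To illustrate the "change the size of the variable" point, here is a small
sketch in plain C++.  The Fixed<W, I> template is a hypothetical stand-in
written for this example, not the Mentor AC data type.  It truncates toward
minus infinity and only checks the range with an assert.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a fixed-point type with W total bits and
// I integer bits (sign included), so W - I fractional bits.
template <int W, int I>
struct Fixed {
    static_assert(I > 0 && W > I && W <= 24, "sketch covers small widths only");
    long raw;  // stored value is raw / 2^(W - I)
    explicit Fixed(double v) {
        // Representable range is [-2^(I-1), 2^(I-1)).
        assert(v >= -(1L << (I - 1)) && v < (1L << (I - 1)));
        raw = static_cast<long>(std::floor(v * (1L << (W - I))));
    }
    double to_double() const {
        return static_cast<double>(raw) / (1L << (W - I));
    }
};

// The word length lives in this one line.  Changing it to Fixed<16, 4>
// gives four more fractional bits everywhere sample_t is used.
typedef Fixed<12, 4> sample_t;

// Example computation written only in terms of sample_t.
double quantized_sum(double a, double b) {
    sample_t qa(a), qb(b);
    return qa.to_double() + qb.to_double();
}
```

With 8 fractional bits, 0.3 is stored as 76/256 = 0.296875.  Editing the
typedef is the only change needed to try a different word length.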

Additionally, Catapult offers some math functions such as division and 
square root.  These library functions support the Mentor AC data types. 
Many of our communications algorithms include division and square root 
operations, so it is nice to have ready-made functions for them.  By using 
these functions, we can unroll the operation to meet the required latency 
constraints.  We can also define the precision of the inputs and outputs of
the function.
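
The library functions themselves are not shown here.  As a stand-in, the
sketch below is a generic bit-serial integer square root (the name isqrt32
is made up) that shows why such operations suit unrolling: the loop has a
fixed trip count, so a tool can fully unroll it, partially unroll it, or
leave it rolled, trading area against latency.

```cpp
#include <cassert>
#include <cstdint>

// Bit-serial integer square root: returns floor(sqrt(x)).
// Illustrative only; this is not the Catapult library implementation.
uint16_t isqrt32(uint32_t x) {
    uint32_t root = 0;
    // Fixed trip count of 16: one result bit is decided per iteration.
    for (int bit = 15; bit >= 0; --bit) {
        uint32_t trial = root | (1u << bit);
        // Keep the bit if trial squared does not exceed x.
        if (static_cast<uint64_t>(trial) * trial <= x) root = trial;
    }
    // Postcondition check for host-side testing.
    assert(static_cast<uint64_t>(root) * root <= x);
    return static_cast<uint16_t>(root);
}
```

The input and output widths (32 and 16 bits here) are fixed by the C types.
With parameterized data types, the same loop structure would scale with the
chosen precision.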


INTERFACES WITH MENTOR PRECISION, SYNOPSYS DESIGN COMPILER 

We used CatapultC with both Synopsys Design Compiler for ASIC synthesis 
and with Mentor Precision synthesis for FPGA synthesis.

  -  Precision FPGA synthesis.  The interface between Precision and
     CatC was easy to use.  Precision can be opened directly from CatC. 
     In the simplest case, we can synthesize our design with one click
     of a button.
 
  -  Design Compiler.  For ASIC logic synthesis on Catapult's RTL, we
     used DC.  In addition to the RTL output, Catapult C provides a
     synthesis script file with appropriate directives for DC that is 
     human readable and can be modified.  Catapult spits out a solution
     folder of results (e.g. RTL, verification, and synthesis folders).
     If Design Compiler is installed in the same location as Catapult,
     we just double click on the icon in the synthesis folder with the
     scripts, and it automatically runs logic synthesis.  (Our DC license
     is on a different server, so we copy the Catapult output files to a
     different location and run the Catapult generated script there.)


VERIFICATION

At Oulu, we may have a different approach to verification than many Catapult
users.  Here is our process: 

 1. We do performance simulations of the algorithms in Matlab.  We use C++
    code in Matlab by adding a gateway to the C function.  The gateway is
    an entry point to the MEX-file.  MEX stands for "MATLAB executable".
    A MEX file consists of a gateway routine and a computational routine.
    A C/C++  subroutine can be called from Matlab using a MEX-file after
    compiling the code.  (source: Matlab)

 2. We then add the gateway to the Catapult C code and run it in the Matlab
    simulator.  This way we can verify that the word lengths used in the 
    Catapult code are sufficient and that the C code gives correct results.
    The word lengths can be changed either by changing the integer size or
    the whole word length or both until sufficient performance is reached. 

    We use the Mentor AC_datatypes in both variable definitions and word
    length performance simulations in Matlab.  They are easy to use and
    we can easily change the word lengths, as Mentor's data type includes
    a field for the integer width and one for the total word length.  We
    have also used the data types on their own for word length studies
    prior to using Catapult C.

 3. We only rarely simulate Catapult's RTL output, but we could verify 
    Catapult RTL using a C++ testbench and running ModelSim by clicking the
    icon in the solutions folder.  For gate level simulations, we create a
    VHDL testbench.

 4. We have occasionally verified Catapult's synthesis results after running
    the generated RTL through Design Compiler in order to obtain power 
    estimates, and we have observed that the results are correct after logic
    synthesis also.  I was able to find the area and timing estimates for
    one block. 

              Clock frequency                     100 MHz
              Area after Catapult                 174,960 sq um
              Area after Design Compiler          155,570 sq um
              Slack after Catapult                1.6 nsec
              Slack after Design Compiler         0.3 nsec
              Total power after Primetime          31 mW
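
The word length study in step 2 can be sketched as a small search.  This is
a simplified stand-in for what we do in Matlab with the AC data types: the
quantize() and min_frac_bits() helpers are hypothetical, use plain doubles,
truncate toward minus infinity, and ignore overflow.  A real study would
judge "sufficient performance" on the algorithm's output, not on the raw
quantization error used here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Quantize v onto a fixed-point grid with `frac_bits` fractional bits.
double quantize(double v, int frac_bits) {
    double scale = std::ldexp(1.0, frac_bits);  // 2^frac_bits
    return std::floor(v * scale) / scale;
}

// Smallest number of fractional bits (0..max_bits) for which every sample
// is reproduced within `tol`.  Returns -1 if no width in range qualifies.
int min_frac_bits(const std::vector<double>& samples, double tol,
                  int max_bits) {
    assert(tol > 0.0 && max_bits >= 0);
    for (int bits = 0; bits <= max_bits; ++bits) {
        double worst = 0.0;
        for (double s : samples)
            worst = std::max(worst, std::fabs(s - quantize(s, bits)));
        if (worst <= tol) return bits;
    }
    return -1;
}
```

For the samples {0.3, -0.7, 0.125} and a tolerance of 0.01, this search
settles on 6 fractional bits.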

CatapultC has improved in the last 4 years.  The tool has become more 
efficient, i.e., architectural choices with strict latency requirements are 
more feasible now and Catapult's run-time has shortened.  I also know of 
some new features which we do not have licenses for, for example clock
gating.  The user interface has remained fairly consistent.

It only took us 2-3 hours to set up Catapult C and it only takes us 5 
minutes to update to a new version.  Even though the tool is easy to 
use, it still took us some time to learn how to write the C code in an 
optimal way for the tool and to learn how to optimize our design.  I would 
say it takes months to learn well; it's a process and it varies with each 
algorithm.  The first designs I implemented with Catapult I had already 
previously implemented by hand in VHDL, which helped me learn CatC.

Where Catapult needs work is in runtime.  On a large design, Catapult can
take a while to run.

I've also suggested to Mentor R&D that they should add a graphical tool
to make scheduling easier to control.

    - Johanna Ketonen
      University of Oulu                         Oulu, Finland
Copyright 1991-2024 John Cooley.  All Rights Reserved.