( ESNUG 485 Item 8 ) -------------------------------------------- [05/27/10]
From: Johanna Ketonen <johannak=user domain=ee.oulu.fi>
Subject: University user dumps hand-coding VHDL for Catapult C synthesis
Hi, John,
I've used Mentor's Catapult C synthesis actively for the last 3 years to
generate RTL from our C++ source code for ASICs and FPGAs. We use the tool
for prototyping our high performance receiver algorithms for wireless
systems, such as our 3GPP Long Term Evolution Advanced (LTE-A).
Our algorithms have strict latency requirements and our design sizes range
from tens of thousands to hundreds of thousands of gate equivalents. The
design sizes can be larger, up to 1 million gates or so.
CATAPULT versus HAND-CODED VHDL RTL
Design 1:
Using Catapult: It took us less than 30 minutes to write and verify our C
code. Running it through Catapult C and experimenting with different
architectures only took a couple of extra minutes.
Hand-coded RTL: It only took 2 hours to write the VHDL but the scheduling
and verification took about 3-4 more days. It took a few iterations to get
close to the CatC results. The differences with the one small design are
presented in the table below.
    Lines of C code:                  20
    Lines of Catapult output VHDL:   450
    Lines of manual VHDL:             60

                       Area (k GE)   Power (mW)   Throughput rate (Mbps)
    Hand coded VHDL       13            32                100
    Catapult C            12.8          31                100
You can see our quality of results was comparable to our hand coded RTL.
Design 2:
I did not personally do Design 2, so I cannot say the exact steps, but I
understand it took an expert who was working with our group several
iterations with the hand-coded design to reach the same area and throughput
as with the Catapult design.
                       Area (k GE)   Power (mW)   Throughput rate (Mbps)
    Hand coded VHDL       15.5          27                121
    Catapult C             6.9          32                121
I estimate it would take me at least one month to hand-code something that
would take only a week to implement with Catapult, and most of my week
would go to debugging the C code. So this is about a 75% time savings.
INTERACTIVE ARCHITECTURAL EXPLORATION
We used CatapultC's incremental methodology to experiment with different
architectures. After we observed the results for a certain architecture,
it was easy for us to go back and change it, and eventually select the best
architecture to meet our design goals. After we set up the clock frequency
and technology, we specify the architecture constraints (memory mapping,
splitting, loop unrolling, pipelining).
We can view and modify the resources we used, such as which components are
used for which operation. After we set the constraints, we can view the
schedule and see the area estimate and latency. The final stage is to
generate the output files, including the RTL. If the area and latency
estimates don't meet our requirements, we can go back to any stage and
make changes.
We also compare different architectures at different clock frequencies in
order to obtain minimal power consumption.
A simple example: a loop iterates 4 times. The result should be available
in 10 nsec. After pipelining the function, the latency estimate is 40 nsec
with 100 MHz clock frequency. It's possible to go back and change the
clock frequency and/or unroll the loop and get the result in 10 nsec.
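The arithmetic behind this example can be sketched in a few lines of C++.
This is only a simplified model, not Catapult's scheduler: it assumes each
loop iteration takes one clock cycle and that unrolling by a factor U lets
U iterations fit in a single cycle. The function names are hypothetical.

```cpp
#include <cassert>

// Simplified latency model (an assumption, not Catapult's scheduler):
// one clock cycle per loop iteration, and unrolling by `unroll` packs
// that many iterations into a single cycle.
double loop_latency_ns(int trip_count, int unroll, double clock_mhz) {
    int cycles = (trip_count + unroll - 1) / unroll;  // ceiling division
    double period_ns = 1000.0 / clock_mhz;            // clock period in nsec
    return cycles * period_ns;
}

// The three cases from the example: a loop that iterates 4 times.
void check_example() {
    assert(loop_latency_ns(4, 1, 100.0) == 40.0);  // rolled loop, 100 MHz
    assert(loop_latency_ns(4, 4, 100.0) == 10.0);  // fully unrolled, 100 MHz
    assert(loop_latency_ns(4, 1, 400.0) == 10.0);  // rolled loop, 400 MHz
}
```

Under these assumptions, either fully unrolling the loop or raising the
clock to 400 MHz brings the 40 nsec estimate down to the 10 nsec target.
Whether an unrolled loop really fits in one cycle depends on the operations
and the technology.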
HIERARCHY
Our general approach to hierarchy is to run separate blocks through Catapult
because some of these blocks can be quite large. As an example, I ran one
block through Catapult that was 500 lines of C++ code with many loops in the
code. This block had a 2-3 hour run-time with Catapult. I divided another
algorithm into 8 parts, each with 300 lines of C++ code. (I do not have the
number of lines of RTL for these designs.)
We used some hierarchy in our blocks by having more than one function in the
C++ code, then defining one function as the top level function. By running
our blocks separately we can do block-level pipelining. The blocks can
process data at the same time, i.e., when a block is finished, its results
are passed to the next block, and the first block can start processing the
next set of data, etc.
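As a rough illustration of this coding style, the sketch below shows two
functions called from a third one that would be designated as the top level.
All names, the frame size and the arithmetic are hypothetical. How the top
level function is selected in the tool is not shown here.

```cpp
#include <cstddef>

const std::size_t N = 4;  // hypothetical frame size

// Hypothetical first block: scale each sample by 2.
void block_a(const int in[N], int mid[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        mid[i] = in[i] * 2;
    }
}

// Hypothetical second block: running sum of the scaled samples.
void block_b(const int mid[N], int out[N]) {
    int acc = 0;
    for (std::size_t i = 0; i < N; ++i) {
        acc += mid[i];
        out[i] = acc;
    }
}

// The function that would be defined as the top level function.
void top(const int in[N], int out[N]) {
    int mid[N];  // intermediate results handed from block_a to block_b
    block_a(in, mid);
    block_b(mid, out);
}
```

Because plain C++ like this compiles with any compiler, the same functions
can be checked in a software testbench before synthesis.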
INPUT LANGUAGE
Catapult supports SystemC and C++. I use C++. It's quite well supported.
C based verification is much faster than HDL verification, where we must
write the HDL testbench and debug the RTL, which is very time consuming.
For example:
20 lines of C code may take 1 hour to verify
Same design in VHDL RTL may take 6 hours.
If there is a need to change the word lengths, we might have to change our
entire HDL. In C code, it can be as simple as changing the size of the
variable.
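A minimal sketch of why a word length change is a small edit in C++. The
`Fixed<W, I>` template below is a hypothetical stand-in written for this
illustration, not the Mentor AC datatype API. It only mirrors the idea of a
total word length W and an integer width I. Truncation and saturation are
choices made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for a signed fixed-point type (not the AC datatypes):
// W = total word length in bits, I = integer bits including the sign bit,
// so W - I bits are fractional.
template <int W, int I>
struct Fixed {
    static_assert(W > 0 && W <= 32 && I <= W, "unsupported word length");
    std::int64_t raw;  // stored value, scaled by 2^(W-I)

    explicit Fixed(double v) {
        const double scale = std::ldexp(1.0, W - I);
        const std::int64_t max_raw = (std::int64_t{1} << (W - 1)) - 1;
        const std::int64_t min_raw = -(std::int64_t{1} << (W - 1));
        // Truncate toward minus infinity, then saturate to the W-bit range.
        std::int64_t r = static_cast<std::int64_t>(std::floor(v * scale));
        raw = r > max_raw ? max_raw : (r < min_raw ? min_raw : r);
    }

    double to_double() const {
        return std::ldexp(static_cast<double>(raw), -(W - I));
    }
};

// Changing the word length for every user of sample_t is this one line.
typedef Fixed<8, 4> sample_t;

inline double quantize(double v) { return sample_t(v).to_double(); }
```

With 8 bits in total and 4 integer bits, `quantize(1.30)` gives 1.25.
Editing the typedef to `Fixed<16, 4>` gives a finer result without touching
any other code.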
Additionally, Catapult offers some math functions such as division and
square root. These library functions support the Mentor AC data types.
Many of our communications algorithms include division and square root
operations, so it is nice to have ready-made functions for them. By using
these functions, we can unroll the operation to meet the required latency
constraints. We can also define the precision of the inputs and outputs of
the function.
INTERFACES WITH MENTOR PRECISION, SYNOPSYS DESIGN COMPILER
We used CatapultC with both Synopsys Design Compiler for ASIC synthesis
and with Mentor Precision synthesis for FPGA synthesis.
- Precision FPGA synthesis. The interface between Precision and
CatC was easy to use. Precision can be opened directly from CatC.
In the simplest case, we can synthesize our design with one click
of a button.
- Design Compiler. For ASIC logic synthesis on Catapult's RTL, we
used DC. In addition to the RTL output, Catapult C provides a
synthesis script file with appropriate directives for DC that is
human readable and can be modified. Catapult spits out a solution
folder of results (e.g. RTL, verification, and synthesis folders).
If the Design Compiler is installed in the same location as Catapult,
we just double click on the icon in the synthesis folder with the
scripts - and it automatically runs logic synthesis. (Our DC license
is on a different server, so we copy the Catapult output files to a
different location and run the Catapult generated script there.)
VERIFICATION
At Oulu, we may have a different approach to verification than many Catapult
users. Here is our process:
1. We do performance simulations of the algorithms in Matlab. We use C++
code in Matlab by adding a gateway to the C function. The gateway is
an entry point to the MEX-file. MEX stands for "MATLAB executable".
A MEX file consists of a gateway routine and a computational routine.
A C/C++ subroutine can be called from Matlab using a MEX-file after
compiling the code. (source: Matlab)
2. We then add the gateway to the Catapult C code and run it in the Matlab
simulator. This way we can verify that the word lengths used in the
Catapult code are sufficient and that the C code gives correct results.
The word lengths can be changed either by changing the integer size or
the whole word length or both until sufficient performance is reached.
We use the Mentor AC_datatypes in both variable definitions and word
length performance simulations in Matlab. They are easy to use and
we can easily change the word lengths, as Mentor's data type includes
a field for integer width and the total word length. We have also used
the data types just for word length studies prior to using Catapult C.
3. We only rarely simulate Catapult's RTL output, but we could verify
Catapult RTL using a C++ testbench and running ModelSim by clicking the
icon in the solutions folder. For gate level simulations, we create a
VHDL testbench.
4. We have occasionally verified Catapult's synthesis results after running
the generated RTL through Design Compiler in order to obtain power
estimates, and we have observed that the results are correct after logic
synthesis also. I was able to find the area and timing estimates for
one block.
    Clock frequency                     100 MHz
    Area after Catapult             174,960 sq um
    Area after Design Compiler      155,570 sq um
    Slack after Catapult                1.6 nsec
    Slack after Design Compiler         0.3 nsec
    Total power after PrimeTime          31 mW
CatapultC has improved in the last 4 years. The tool has become more
efficient, i.e., architectural choices with strict latency requirements are
more feasible now and Catapult's run-time has shortened. I also know of
some additions which we do not have licenses for, for example clock gating.
The user interface has remained fairly consistent.
It only took us 2-3 hours to set up Catapult C and it only takes us 5
minutes to update to a new version. Even though the tool is easy to
use, it still took us some time to learn how to write the C code in an
optimal way for the tool and to learn how to optimize our design. I would
say it takes months to learn well; it's a process and it varies with each
algorithm. The first designs I implemented with Catapult I had already
implemented by hand in VHDL, which helped me learn CatC.
Where Catapult needs work is in runtime. On a large design, Catapult can
take a while to run.
I've also suggested to Mentor R&D that they should add a graphical tool
to make scheduling easier to control.
- Johanna Ketonen
University of Oulu Oulu, Finland