( ESNUG 483 Item 4 ) -------------------------------------------- [11/30/09]

From: Tom Wilson <twilson=user domain=wavesat not balm>
Subject: A User's 5 week block design using Synfora PICO C synthesis

Hi, John,

We bought Synfora PICO C synthesis and used it to synthesize Verilog RTL
from ANSI C for a block of a production chip last May.  We had originally
planned to implement the C algorithm in SW, but saw an opportunity to
convert it to a standard hardware accelerator block to:

    1. reduce the resource loading on the on-chip DSPs,
    2. allow for lower chip power, and
    3. test the High Level Synthesis (HLS) approach people are talking about.

We hired EASII-IC, a design services firm in France, to do our Synfora PICO
implementation because we didn't have any extra engineers in house to do it.
This was also a low-risk approach for us because we could always go back and
implement the algorithm in software if needed.

The block was a standard OFDMA PHY algorithm.  It was pretty straightforward
for our internal algorithm specialists to create the C code for it.  It only
took them about 5 days to write ~400 lines of C code.  This particular block
was very well constrained, which reduced the complexity of the design.

EASII-IC spent a week modifying our initial C code to optimize it for PICO,
which resulted in about 1,600 lines of C code.  PICO C is basically standard
ANSI C.  Because PICO lets you design and explore different architectures
easily, there are several ways to describe the same functionality, both in C
and in RTL.  How you write your C code influences the block-level architecture
that PICO will generate in the final RTL... this relationship between C and
the generated RTL is easy to understand after a few PICO runs.
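As a generic illustration of that point (not code from this project, and not
specific to PICO), here are two ANSI C descriptions of the same 8-sample sum.
The names sum_serial, sum_tree and check_styles_agree and the length N are
hypothetical, and the hardware mapping noted in the comments is the typical
HLS outcome, not a guaranteed one.

```c
#include <assert.h>

#define N 8  /* hypothetical vector length */

/* Style 1: one running accumulator in a loop. Without loop unrolling,
 * HLS tools tend to map this to a single adder reused over N cycles. */
long sum_serial(const int x[N])
{
    long acc = 0;
    int i;
    for (i = 0; i < N; i++)
        acc += x[i];
    return acc;
}

/* Style 2: the same sum written as a balanced adder tree. The four
 * first-level additions are independent, so a tool may build them as
 * parallel adders: shorter latency, more area. */
long sum_tree(const int x[N])
{
    long s01 = (long)x[0] + x[1], s23 = (long)x[2] + x[3];
    long s45 = (long)x[4] + x[5], s67 = (long)x[6] + x[7];
    return (s01 + s23) + (s45 + s67);
}

/* Self-check: both descriptions must agree on any input. */
void check_styles_agree(const int x[N])
{
    assert(sum_serial(x) == sum_tree(x));
}
```

Both functions return the same value; the only difference is how much
parallelism the C source exposes to the synthesis tool.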


PICO's TCAB:

PICO automatically optimizes datapath and resources across C loop nests.  In
our case, PICO flattened our C code and did many optimizations.  You can also
isolate part of your code as a function call and generate a separate entity
for it, called a TCAB (Tightly Coupled Accelerator Block).  TCABs can then be
used as operators in PICO's automatic optimization process, letting you
control the granularity of resource sharing on specific blocks and specify
things like the block's latency or a separate stall domain.

EASII-IC rewrote our C code for the project.  Our code was originally written
as a specification, and the two main difficulties were:

    - memory bandwidth

    - manipulation and insertion of symbols of variable bit width
      in memory words.
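The second difficulty, packing symbols of variable bit width into fixed-width
memory words, can be sketched in plain C as below.  This is not EASII-IC's
code: the 32-bit word width, the LSB-first bit order and the packer_t,
packer_init and packer_put names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packer state: symbols are appended LSB-first into
 * 32-bit memory words (both choices are assumptions). */
typedef struct {
    uint32_t *mem;    /* destination memory words */
    size_t nwords;    /* capacity of mem, in words */
    size_t word;      /* index of the word currently being filled */
    unsigned bit;     /* next free bit position in that word (0..31) */
} packer_t;

void packer_init(packer_t *p, uint32_t *mem, size_t nwords)
{
    size_t i;
    for (i = 0; i < nwords; i++)
        mem[i] = 0;                 /* start from a cleared memory */
    p->mem = mem;
    p->nwords = nwords;
    p->word = 0;
    p->bit = 0;
}

/* Append the low `width` bits of `value` (1 <= width <= 32). */
void packer_put(packer_t *p, uint32_t value, unsigned width)
{
    unsigned room;
    assert(width >= 1 && width <= 32);
    assert(p->word < p->nwords);
    if (width < 32)
        value &= (uint32_t)((1UL << width) - 1UL);   /* drop unused high bits */
    room = 32u - p->bit;                             /* free bits in this word */
    p->mem[p->word] |= (uint32_t)(value << p->bit);  /* low part of the symbol */
    if (width < room) {
        p->bit += width;            /* symbol fit with space to spare */
    } else {
        p->word++;                  /* current word is now full */
        p->bit = width - room;      /* number of bits that did not fit */
        if (p->bit > 0) {
            assert(p->word < p->nwords);
            p->mem[p->word] |= value >> room;  /* high part starts next word */
        }
    }
}
```

A symbol that straddles a word boundary is split: its low bits finish the
current word and its high bits start the next one.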

EASII-IC dealt with the memory bandwidth and its latency, and used a TCAB to
insert LTE symbols smoothly into the pipeline.  This functionality is very
powerful for handling exceptions in the datapath: the exception handling can
take extra cycles while the main datapath stays fully optimized, with no
performance penalty.

The code structure was very simple:

  - 1 Processing Array (PA) for initialization of variables.

  - 1 PA to load all input data and insert symbols on-the-fly.
    (This was the most critical and complex task.)

  - 1 PA to flush the internal memory and send the result
    back to the DSP.
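The three-PA structure might look roughly like this in C, with each
Processing Array written as its own loop.  Everything here is hypothetical:
the buffer sizes, the rule of inserting one MARKER symbol after every K input
samples, and the function names stand in for the real algorithm, which the
letter does not describe.

```c
#include <assert.h>

#define N_IN   12               /* hypothetical number of input samples */
#define K      4                /* hypothetical: one symbol after every K samples */
#define N_OUT  (N_IN + N_IN / K)
#define MARKER (-1)             /* hypothetical value of the inserted symbol */

static int buf[N_OUT];          /* stands in for the block's internal memory */

/* PA 1: initialize internal state. */
static void pa_init(void)
{
    int i;
    for (i = 0; i < N_OUT; i++)
        buf[i] = 0;
}

/* PA 2: load input data and insert a symbol on the fly after every K
 * samples, using a counter rather than a modulo operation. */
static void pa_load(const int in[N_IN])
{
    int i, w = 0, count = 0;
    for (i = 0; i < N_IN; i++) {
        buf[w++] = in[i];
        if (++count == K) {
            buf[w++] = MARKER;
            count = 0;
        }
    }
    assert(w == N_OUT);
}

/* PA 3: flush internal memory to the output (back to the DSP). */
static void pa_flush(int out[N_OUT])
{
    int i;
    for (i = 0; i < N_OUT; i++)
        out[i] = buf[i];
}

void run_block(const int in[N_IN], int out[N_OUT])
{
    pa_init();
    pa_load(in);
    pa_flush(out);
}
```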

The C code is simple to write.  The one thing to keep in mind is not to
create nodes of unnecessary complexity when there's a way to describe things
simply and efficiently.  PICO handles the rest of the implementation.
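One common illustration of this advice, offered as a generic HLS example
rather than anything from this project: a circular-buffer index written with
the % operator versus the same update written as a compare-and-reset.  The
buffer length and function names are hypothetical, and how much logic each
version actually costs depends on the tool.

```c
#include <assert.h>

#define BUF_LEN 48  /* hypothetical buffer length, not a power of two */

/* Index update written with a general modulo, which can imply a divider
 * (or at least more logic than the job needs) in hardware. */
unsigned next_index_mod(unsigned idx)
{
    return (idx + 1u) % BUF_LEN;
}

/* The same update as a compare-and-reset, which typically maps to an
 * incrementer plus a comparator. Valid only for idx < BUF_LEN. */
unsigned next_index_wrap(unsigned idx)
{
    assert(idx < BUF_LEN);
    return (idx == BUF_LEN - 1u) ? 0u : idx + 1u;
}
```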


Total C-to-RTL Project Time: 5 weeks

Our final C code input to Synfora PICO was 1,600 lines, and PICO's Verilog
RTL output was 131,000 lines (across about 120 separate Verilog text files).
Some other key metrics: the block's final die area was 146,520 square microns,
and the block size was 95 K gates plus 48 Kbits of RAM.
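For scale, these figures work out to roughly 82 lines of generated Verilog
per line of input C, and a die area of about 0.147 square millimetres.  The
small helpers below just reproduce that arithmetic; ratio and um2_to_mm2 are
illustrative names, not part of any tool.

```c
#include <assert.h>

/* Ratio of two quantities, e.g. generated RTL lines per input C line. */
double ratio(double num, double den)
{
    assert(den > 0.0);
    return num / den;
}

/* Convert square microns to square millimetres (1 mm^2 = 1,000,000 um^2). */
double um2_to_mm2(double um2)
{
    return um2 / 1.0e6;
}
```

With the numbers above, ratio(131000.0, 1600.0) gives 81.875 and
um2_to_mm2(146520.0) gives about 0.14652.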

The total project time, from when we first started writing the ANSI C code
until we got fully verified RTL back from EASII-IC, was 5 weeks:

  Week 1: Our engineers created original ANSI C code and turned it
          over to EASII-IC.

  Week 2: EASII-IC optimized the C code for PICO synthesis.

  Weeks 3 & 4: Created a bus "wrapper" to interface the hardware
          accelerator to the internal bus.

  Week 5: Tested and verified the entire integrated accelerator plus
          bus "wrapper" block.

This bus wrapper block will be reusable in our future designs, and so a
similar design will take only 3 weeks:

  Week 1: Coding C

  Week 2: Optimizing C for PICO

  Week 3: Verification

EASII-IC used PICO to verify the Verilog RTL output against the same
testbench that was written for the C code, validating the RTL for both
function and performance.  We inserted the RTL module into the rest of our
ASIC and connected it through ports in the usual way -- it plugged into our
design with no issues.
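A minimal sketch of a self-checking C testbench of the kind described above,
assuming a hypothetical function under test (dut) and a golden reference
model (golden).  The letter only says that PICO reran the C testbench against
the generated RTL; how PICO does that is not shown here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical function under test: the C description that gets synthesized. */
int dut(int x)
{
    return 3 * x + 1;
}

/* Hypothetical golden reference, e.g. the original algorithm specification. */
int golden(int x)
{
    return x + x + x + 1;
}

/* Drive every test vector through the DUT, compare against the golden
 * model, report each mismatch, and return the number of errors. */
int run_testbench(const int vec[], int n)
{
    int i, errors = 0;
    assert(n > 0);
    for (i = 0; i < n; i++) {
        int got = dut(vec[i]);
        int want = golden(vec[i]);
        if (got != want) {
            printf("mismatch at vector %d: got %d, expected %d\n", i, got, want);
            errors++;
        }
    }
    return errors;
}
```

Because the expected results come from the C model, the same vectors can
serve as the pass/fail reference for the RTL as well.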

Doing this same ANSI C-to-RTL conversion manually might have taken us the
same amount of time or a little longer.  However, a manual approach has
the human factor: the algorithm designer must hand off the design to the
ASIC RTL designer, who must re-code it in Verilog.

For a functionally constrained block like the one discussed here, that hand
off might have incurred minimal risk of human error -- but we're interested
in PICO for use with many more complex blocks in the future.  So our time
savings in this case were not as important as the overall fact that PICO
was a deterministic, automated flow which removed that human error.  I'll
take that every time!


Same PICO C code for FPGA/ASIC implementations:

Another aspect of PICO that we weren't able to test this time around is using
PICO to convert the same C code to either an ASIC or an FPGA with the same
behavior.  This is attractive to us because we have an FPGA-based emulation
environment (an FPGA board we built internally) that already holds other
hardware accelerator blocks.  The PICO C Synthesis tool outputs Verilog code
optimized for an FPGA that is logically consistent with the code optimized
for the ASIC library.  Without PICO, we could still use our Verilog code in an
FPGA -- but it would not be optimized for the FPGA; we would instead need to
hand-code and optimize it to let the FPGA run at speed.


PICO Gotchas:

PICO's RTL output is a sea of small, independent Verilog files rather than a
single human-readable Verilog file.  Synfora should organize and link the
code to output a more consolidated Verilog file.

    - Tom Wilson
      Wavesat, Inc.                              Montreal, Quebec
Copyright 1991-2024 John Cooley.  All Rights Reserved.