( ESNUG 482 Item 5 ) -------------------------------------------- [06/30/09]
Subject: ( ESNUG 480 #2 ) One hands-on user's first look at Synfora PICO
> Just to clarify, our Synfora Pico Extreme competes directly with Mentor's
> CatapultC, Forte's Cynthesizer, and Cadence's C-to-Silicon tools. For us,
> "synthesis" is taking ANSI C and converting it to Verilog RTL. (To get to
> gates, you must run our RTL through Design Compiler, RTL Compiler, or
> BlastCreate.) Our claim to fame is PICO can do some amazing architectural
> optimizations at the C level on complex, multi-block designs.
>
> - Vinod Kathail
> Synfora, Inc. Mountain View, CA
From: [ I am the Walrus ]
Hi John,
I am a Synfora PICO user. After reading ESNUG 480 #2, I would like to share
my experiences with your readers.
In 2008, I used PICO for two modules in our video processing chip. Two
different design groups at our company had already been using PICO, starting
in 2007. I went through PICO training while learning about the algorithm
our R&D team developed. I was able to implement my micro-architecture in
2 weeks with the help of the Synfora AE; he gave me a training class and
also helped me to build a template/structure of my design.
It was relatively easy to learn the C synthesis tool and start the first
design. If a user is good at C/C++ and has some experience with RTL coding,
it will probably take them only a few days to become proficient. A user
without a C/C++ background might take a few weeks.
Initially bad results with PICO:
Exploring our micro-architecture was much faster in PICO compared to manual
Verilog RTL design. For our 100K gate 45 nm design it took us 1/2 day to
create each new architecture option with PICO, compared to 3 days to
do it manually.
PICO synthesis initially showed really bad timing and area results, which
Synopsys Design Compiler confirmed. I worked with our R&D team to explore
possible enhancements in their algorithm for hardware implementation. It
turns out that our C code was poorly suited to our design constraints. We
brainstormed many different approaches and tried them in PICO, which took
us 3 days. (It would have taken 3 weeks if I had to do this by normal RTL
coding.) At the end, the result was satisfactory - we reduced our gate
count from 200 K to 80 K, and met our clock requirement of 250 MHz.
Design Verification:
The total time we spent on verification using ANSI C and PICO was 3 months,
compared to 6 months we would expect to spend if we had done the design
exclusively at the Verilog RTL level.
Design verification at the C level has a lot of advantages over verification
at the RTL level:
1. It's easy to set up verification in C with print statements and file
comparisons. In an RTL design verification environment, we would have
to set up different test benches at each level.
2. C simulation has a shorter run time - this is the biggest verification
time savings. For a 720x480 frame, our VCS RTL simulation would take
more than 4 hours, while our C simulation in the PICO environment
takes less than 10 minutes. This lets us run 24x more tests/regressions
at the C level than at the Verilog level.
3. With C, we can run design verification at different levels (core/block)
for better coverage and regression. For example, we ported some
selected test cases to the module level for interface verification.
Our verification plan was different at the different hierarchy levels
of the design. Our design had two modules, A and B. Each of module
A's functional blocks had an inverse function in module B. So we
verified at three levels:
- the functional block level
- the loop back of each functional block and its inverse function
- the PICO top level design with all functional blocks
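As a rough illustration of the loop-back level described above, here is a
minimal ANSI C sketch. The block names (delta_encode, delta_decode), the
buffer size, and the helper functions are all hypothetical stand-ins; the
actual functional blocks in this design are not described in this letter.
The sketch only shows the general shape: run a block, run its inverse, and
use print statements plus a buffer comparison to flag mismatches.

```c
#include <assert.h>
#include <stdio.h>

#define N_SAMPLES 8   /* assumed buffer size, for illustration only */

/* Hypothetical "module A" block: delta-encode a line of samples. */
static void delta_encode(const int in[], int out[], int n)
{
    int i;
    int prev = 0;
    for (i = 0; i < n; i++) {
        out[i] = in[i] - prev;
        prev = in[i];
    }
}

/* Hypothetical "module B" block: the inverse function (running sum). */
static void delta_decode(const int in[], int out[], int n)
{
    int i;
    int acc = 0;
    for (i = 0; i < n; i++) {
        acc += in[i];
        out[i] = acc;
    }
}

/* Print every mismatch between two buffers; return the mismatch count. */
static int compare_buffers(const char *label, const int got[],
                           const int want[], int n)
{
    int i;
    int errors = 0;
    for (i = 0; i < n; i++) {
        if (got[i] != want[i]) {
            printf("%s: mismatch at %d: got %d, expected %d\n",
                   label, i, got[i], want[i]);
            errors++;
        }
    }
    return errors;
}

/* Loop-back check: decode(encode(x)) must reproduce x exactly. */
static int loopback_test(const int stimulus[], int n)
{
    int encoded[N_SAMPLES];
    int decoded[N_SAMPLES];
    assert(n <= N_SAMPLES);
    delta_encode(stimulus, encoded, n);
    delta_decode(encoded, decoded, n);
    return compare_buffers("loopback", decoded, stimulus, n);
}
```

The same compare_buffers helper can serve the block-level check, by
comparing a single block's output against a golden reference buffer, so
one set of stimulus vectors covers both levels.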
PICO also provides a simulation environment for RTL simulation at its core
level. It is self-contained in that we only needed to write a script to run
simulation in the PICO environment for our generated RTL code. We can run
all the C level test cases in the PICO environment for RTL simulation, so it
can be part of our regression.
Of course, PICO has some drawbacks:
1. Interfaces at the module level require a Verilog RTL wrapper. Our
design has special data bus protocols. The PICO interface does not
meet our protocol and so we must add a Verilog RTL wrapper, which
requires additional RTL coding and design verification. We'd like
to see Synfora add a more flexible interface.
2. Lint/Formal Verification issues. Our company has a SpyGlass flow for
linting and Cadence Conformal for EC on all designs. Code generated
from C always raises flags in our flow. It typically takes 2-3 weeks
to go through the flags to ensure we have a quality design. Formal
verification is especially tough due to the dead logic in the generated
RTL. We're thinking of trying Calypto on this, but don't have it yet.
3. Dead logic issues. PICO's generated Verilog RTL code coverage is
below our company standard, mostly due to the dead logic it generates
in its RTL code. Formal verification between C code and generated RTL
code is an on-going issue at this time. We are hoping to use formal
tools to complete our design verification effort. If this is
successful, coverage in generated RTL will be less of a concern for us.
ANSI C vs. C++
PICO only supports ANSI C, not C++. Most cases of hardware design that I
see are sequential; I am not aware of any object-oriented HW designs.
Therefore, any designs in C++ might have to be sequential designs, which is
not much different from ANSI C coding. Our R&D folks can estimate the gate
count with PICO when they develop their algorithms. I only used the basic
ANSI-C library in PICO design. I will explore more moving forward.
ANSI C coding styles and gate counts
Coding in ANSI C is easier than RTL design. But the designer has to keep
hardware design in his/her mind, and realize their C coding style could
have great impact on the final gate count. For example, in my design the
control decision-making required 36 sets of calculations using the same
logic equations simultaneously (in one cycle). Each calculation set was
about 2 K gates, so that was 72 K gates total for this decision. Instead of
coding each calculation as a function call, I constructed a look-up table
which was about 2 K gates to replace the 36 function calls; that's 2 K gates
versus 72 K gates. Additionally, not all C code can be compiled to hardware
design, and PICO will only provide structures for hardware design.
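To make the two coding styles concrete, here is a minimal ANSI C sketch.
Everything in it is an assumption for illustration: the real logic
equations are not given in this letter, so calc() is a made-up stand-in,
and the 4-bit input width is chosen only so that a small table can cover
every input. The sketch shows that the two styles are functionally
equivalent in C; it says nothing about the gate counts any tool would
actually produce.

```c
#include <assert.h>

#define NUM_SETS   36                  /* 36 calculation sets, per the text */
#define INPUT_BITS 4                   /* assumed input width (hypothetical) */
#define TABLE_SIZE (1u << INPUT_BITS)

/* Hypothetical stand-in for the shared logic equations. */
static unsigned calc(unsigned x)
{
    return ((x * x) + 3u * x + 1u) & 0xFFu;
}

/* Precomputed results of calc() for every possible input value. */
static unsigned char calc_table[TABLE_SIZE];

static void build_table(void)
{
    unsigned x;
    for (x = 0; x < TABLE_SIZE; x++)
        calc_table[x] = (unsigned char)calc(x);
}

/* Style 1: one function call per set. If all 36 results are needed in
 * one cycle, a tool may have to build 36 copies of the logic. */
static void decide_by_calls(const unsigned char in[NUM_SETS],
                            unsigned char out[NUM_SETS])
{
    int i;
    for (i = 0; i < NUM_SETS; i++)
        out[i] = (unsigned char)calc(in[i] & (TABLE_SIZE - 1u));
}

/* Style 2: replace each call with a table look-up. */
static void decide_by_table(const unsigned char in[NUM_SETS],
                            unsigned char out[NUM_SETS])
{
    int i;
    for (i = 0; i < NUM_SETS; i++)
        out[i] = calc_table[in[i] & (TABLE_SIZE - 1u)];
}
```

In a synthesizable version the table would more likely be a constant
initialized array than one filled at run time; build_table() is used here
only to keep the sketch short and to guarantee the table matches calc().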
Hierarchy
Synfora's TCAB (Tightly Coupled Accelerator Blocks) is a great structure;
it allows our users to design hierarchically and save compilation time. It
is similar to a bottom-up compilation in Synopsys Design Compiler. For a
large design, you can compile small/critical blocks first and then move up
level by level to the top-level design. It is also great for reusability
if you use the same block in different designs.
In my design, one FIFO level has a huge impact on throughput speed. I could
double the FIFO size to shorten the throughput delay. There was also a
tradeoff between registers and memory; with a few line changes in C code and
scripts, the PICO compilation shows the gate count/performance tradeoffs
for different approaches.
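The FIFO depth effect can be sketched with a toy C model. This is not the
PICO FIFO or the traffic pattern of this design; the function name, the
phase length, and the strictly alternating producer/consumer schedule are
all assumptions picked to make the effect visible. The producer may push
one item per cycle during the first half of each period, and the consumer
may pop one item per cycle during the second half, so a FIFO that is too
shallow stalls the producer.

```c
#include <assert.h>

#define PHASE_LEN 4   /* assumed: cycles per producer or consumer phase */

/* Toy model: return the number of cycles needed to move n_items through
 * a FIFO of the given depth under the alternating schedule above. */
static int cycles_to_drain(int depth, int n_items)
{
    int cycle = 0;
    int occupancy = 0;
    int produced = 0;
    int consumed = 0;

    assert(depth > 0);
    while (consumed < n_items) {
        int phase = cycle % (2 * PHASE_LEN);
        if (phase < PHASE_LEN) {
            /* Producer phase: push unless the FIFO is full. */
            if (occupancy < depth && produced < n_items) {
                occupancy++;
                produced++;
            }
        } else {
            /* Consumer phase: pop unless the FIFO is empty. */
            if (occupancy > 0) {
                occupancy--;
                consumed++;
            }
        }
        cycle++;
    }
    return cycle;
}
```

In this model, moving 16 items takes 62 cycles at depth 2 and 32 cycles at
depth 4, and going deeper than PHASE_LEN buys nothing further. Changing
one constant and re-running is the kind of quick experiment described
above, although real results depend on the actual traffic pattern.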
Synfora PICO lets our R&D people implement their algorithms using ANSI C and
the PICO library for hardware design. Afterwards, getting to verified gates
is a fairly quick process for our RTL designers. Reducing our gate count
from 200 K to 80 K (at 250 MHz) plus a 24x sim speed-up sold us on PICO.
- [ I am the Walrus ]