( ESNUG 487 Item 9 ) -------------------------------------------- [12/12/10]
From: Ravin Sachdeva <ravin.sachdeva=user domain=st got calm>
Subject: A user review of Synopsys Synphony C compiler (aka Synfora PICO)
Hi, John,
Two of our software algorithm engineers used PICO Extreme to help us design
a complex digital video hardware IP. Our designers did modeling in C, so
they were strong in writing C algorithms; however, they had very limited
exposure to hardware design and no prior experience with RTL. Even so, it
only took them 9 months to complete the design, from April 2009 to December
2009. We do have a very deep understanding of digital video technology, as
we are all part of a group dedicated to multimedia embedded software/hardware,
which has existed for over a decade within STMicroelectronics.
Our hardware IP was for one of the Entropy Decoding schemes in H.264, named
CABAC (Context Adaptive Binary Arithmetic Coding). CABAC decoding consists
of three main stages:
1) Context Modeling
2) Binary Arithmetic Decoding
3) Debinarization
There is high internal data dependency between the three stages, making
CABAC difficult to pipeline or parallelize, hence preventing the design
from reaching a high throughput. The architecture contains 15 main blocks.
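To make the dependency concrete, here is a minimal toy sketch of one binary
arithmetic decoding step in C. It is not the code from this design and it
does not use the H.264 tables: the names dec_t, ctx_t, dec_init and
decode_bin are hypothetical, and the LPS range and context update are
simplified stand-ins for the standard's lookup tables. What it shows is
that the range, offset and context state written by one bin are read by
the next, so bin N+1 cannot start before bin N finishes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoder state carried from one bin to the next (the recurrence). */
typedef struct {
    uint32_t range;       /* interval width, kept in [256, 510] */
    uint32_t offset;      /* position of the bitstream inside the interval */
    const uint8_t *buf;   /* input bitstream */
    size_t nbits, pos;    /* total bits and next bit index */
} dec_t;

/* Toy context model: LPS probability on a 0..256 scale, plus MPS value. */
typedef struct { uint8_t p_lps; uint8_t mps; } ctx_t;

static uint32_t next_bit(dec_t *d) {
    if (d->pos >= d->nbits) return 0;            /* pad with zeros at the end */
    uint32_t b = (d->buf[d->pos >> 3] >> (7 - (d->pos & 7))) & 1u;
    d->pos++;
    return b;
}

void dec_init(dec_t *d, const uint8_t *buf, size_t nbytes) {
    d->buf = buf; d->nbits = nbytes * 8; d->pos = 0;
    d->range = 510;
    d->offset = 0;
    for (int i = 0; i < 9; i++)                  /* preload 9 bits */
        d->offset = (d->offset << 1) | next_bit(d);
}

int decode_bin(dec_t *d, ctx_t *c) {
    assert(d->range >= 256 && d->range <= 510);
    /* ASSUMPTION: a multiply replaces the standard's LPS range table. */
    uint32_t lps = (d->range * c->p_lps) >> 8;
    int bin;
    d->range -= lps;
    if (d->offset >= d->range) {                 /* least probable symbol */
        bin = !c->mps;
        d->offset -= d->range;
        d->range = lps;
        /* ASSUMPTION: simplified adaptation, not the standard's tables. */
        if (c->p_lps >= 112) c->mps ^= 1;
        else c->p_lps += 16;
    } else {                                     /* most probable symbol */
        bin = c->mps;
        if (c->p_lps > 8) c->p_lps -= 4;
    }
    while (d->range < 256) {                     /* renormalize */
        d->range <<= 1;
        d->offset = (d->offset << 1) | next_bit(d);
    }
    return bin;
}
```

Calling decode_bin() in a loop yields one bin per call. Every statement in
its body reads state that the previous call wrote, which is why reaching
one bin per cycle means fitting this whole chain into a single clock cycle
rather than spreading it over pipeline stages.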
Target: use 45 nm CMOS to achieve real-time video for HDTV systems.
Result: 1 bin per cycle processing from the Binary Arithmetic Decoder
block. (Any CABAC encoded video bitstream consists of a
sequence of bins or binary symbols. To obtain actual video
data, each of these bins has to undergo the CABAC decoding.
One bin/cycle essentially means every bin has to complete in
1 clock cycle to guarantee real time performance.)
Target: max gate count under 200 K gates
Result: 184 K gates
Target: 200 MHz
Result: 222 MHz
Our group's "customers" are other divisions of STMicroelectronics, who do
most of the IP verification and integrate the blocks into the rest of their
design. We were able to meet these customer divisions' QoR expectations
for the specification, based on their past RTL experience. We verified our
design results using Synopsys Design Compiler, and the RTL is currently
being verified in Cadence NC-Verilog by our customer divisions. Our CABAC
module is being integrated into their larger IP; our customers took on the
integration and verification themselves. We don't have any results from
their side yet.
We chose PICO for the following reasons:
- C/C++ language coverage. We had extensive experience with C, hence
it was the obvious choice for our HLS input language. Synfora's C
support is extensive and covers almost all of the C syntax and
semantics we used.
- A C-based flow that allowed us to specify and explore different
architectures for hardware implementation. We can directly modify
our algorithm to accommodate specification changes -- the cost is
lower than modifying the RTL. Plus we will be able to do future
revisions and spec changes efficiently.
- We wanted to align with our customers' design flow practices and ease
of integration into overall architecture. Our customers have been
using C-to-RTL with PICO as their standard design flow practice for
quite some time now. We used to deliver optimized algorithms to them
and they used to convert them to PICO C; it was a natural progression
for us to start writing and developing with PICO ourselves, saving
resources for our customers.
PICO's runtime for our 200 K gate runs was around 6-8 hours. Because we are
new to HW design, we have no data comparing PICO against hand-written
RTL. However, we see PICO's main capabilities as:
- An integrated design and verification flow with early recognition of
errors/bugs in design. PICO provides various levels of verification
steps, which include Lint Simulation, SystemC simulations and RTL
simulations. We had limited experience with RTL simulation, but we
used the others extensively to catch any bugs that might lead to a
bad RTL generation.
- PICO Extreme has unlimited hierarchy via 'TCABs', which let us create
blocks and use them throughout our design. The PICO compiler then does
resource sharing during optimization, for example with multipliers.
We used 2-3 levels of TCAB hierarchy for our chip's 15 blocks, which
were made up of 17 PICO TCABs. We used TCABs mostly in their "used in
many places, but instantiated once and shared" form.
We were able to meet our performance goal ONLY because we were able
to optimize the Arithmetic Decoder (our design core) using the TCAB
feature and share this hardware across many blocks.
- Untimed code input. All the timing and clock generation was left to
the PICO compiler. We were able to hit our performance goals with a
reasonable QoR, although we slipped on schedule. (Our design was
delivered 3 months past our schedule deadline.)
- PICO takes sequential C programs and exploits the inherent parallelism
in the design. It does extensive parallelization/pipelining analysis
and optimization, such as instruction level parallelism and iteration
level parallelism, within each block. Additionally, PICO identifies,
exploits and combines loop level parallelism and task level parallelism
within each block.
What PICO Extreme could improve:
- Efficient and easily understandable error reporting for failed
synthesis flows. Sometimes the developer is not able to easily
decipher (even from the reports) why the synthesis failed.
- The tool is still evolving somewhat; hence we still find software bugs,
such as segmentation faults. But Synfora gave us timely releases on
latest bug fixes. Some bug fixes took a couple of days, some longer.
We did like that PICO explores multiple architectures for the same design,
and we liked its reports and analysis for finding the best tradeoffs. We can create
different implementations for the same design, keep different design specs,
and find the optimal performance-power-area solution.
Their technical support on any issue was just a phone call away! From
training on the tool through to delivery of the IP, Synfora was
committed to our design and helped us climb the steep learning curve.
PICO has a simple, easy to use interface and detailed documentation. We used
the visualization features in the GUI extensively, especially the Power,
Performance and Area graph, Resource Browser, and Recurrence Viewer.
LEARNING CURVE:
As I mentioned, our team was highly experienced in writing optimized C for
embedded platforms, but had very limited exposure to writing C for
hardware design - this was a steep learning curve for us. Synfora educated
us on the various architectural implications of writing in C, such as:
- how arrays in C get converted into memories in hardware
- how arbitration of memory ports is done
- how to design Finite State Machines for efficient hardware
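The first and third points can be sketched in plain C. This is a generic
illustration, not Synfora training material and not code from this design;
the function names are hypothetical and no PICO-specific pragmas or
settings are shown. The assumption is the usual HLS one: an array becomes
a memory with a limited number of ports, so the number of array accesses
per loop iteration matters, and an FSM is written as an enum plus a switch
that advances one state per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 4-tap sliding sum, naive form: four reads of in[] per iteration.
 * If in[] maps to a memory with fewer ports, those reads must share
 * (arbitrate for) the ports and the loop takes more cycles. */
void window_sum_naive(const uint8_t *in, size_t n, uint16_t *out) {
    for (size_t i = 3; i < n; i++)
        out[i] = (uint16_t)(in[i] + in[i - 1] + in[i - 2] + in[i - 3]);
}

/* Same result with one read of in[] per iteration: the three previous
 * samples are kept in scalars, which a tool can hold in registers. */
void window_sum_regs(const uint8_t *in, size_t n, uint16_t *out) {
    uint8_t r1 = 0, r2 = 0, r3 = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t x = in[i];                        /* the only memory read */
        if (i >= 3)
            out[i] = (uint16_t)(x + r1 + r2 + r3);
        r3 = r2; r2 = r1; r1 = x;                 /* shift the window */
    }
}

/* Small FSM that turns a unary-coded bin string (1...10) into a value,
 * consuming one bin per call. */
typedef enum { ST_PREFIX, ST_DONE } state_t;
typedef struct { state_t st; uint32_t value; } unary_fsm_t;

void fsm_reset(unary_fsm_t *f) { f->st = ST_PREFIX; f->value = 0; }

/* Returns 1 once the terminating 0 bin has been seen. */
int fsm_step(unary_fsm_t *f, int bin) {
    assert(bin == 0 || bin == 1);
    switch (f->st) {
    case ST_PREFIX:
        if (bin) f->value++;                      /* count leading 1s */
        else     f->st = ST_DONE;                 /* 0 terminates the code */
        break;
    case ST_DONE:
        break;                                    /* hold until fsm_reset() */
    }
    return f->st == ST_DONE;
}
```

Both window functions write the same values to out[3] onward; the second
simply trades memory reads for three registers. Feeding the bins 1, 1, 1, 0
to fsm_step() after fsm_reset() leaves value at 3.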
We ran our design through PICO multiple times with different performance
parameters, and used Synfora's reports to check out multiple architectures.
We used that data to analyze the portions of hardware that took up greater
area and timing than expected, and modify them.
Since this was our algorithm designers' first time debugging HW vs. SW,
there was a learning curve involved. For example, we initially started
with PICO's GUI. As we increased our number of implementations, we had to
learn to use TCL scripts for efficient design management. Additionally,
our software compilation-and-run usually ran for a few minutes, as
compared to the 6-8 hours mentioned above that it took to synthesize the
full chip in PICO.
Our first project was to design an on-the-fly co-processor/accelerator to
decode real time macroblock data coded using CABAC for the HDTV video.
We have already begun a new project, using PICO to design a standalone
processor for Rempeg -- a memory compression algorithm for video
systems. Now that we have some experience in designing hardware with C, we
expect this next project to go faster. As a rough figure, total development
time for our next processor is expected to be 6 months, but it is also a
much smaller and less complex IP than CABAC.
I would like to recognize the other team members that were involved in this
project: Sumit Johar and Daniele Alfonso, who was the Project Manager.
- Ravin Sachdeva
STMicroelectronics Pvt. Ltd Noida, India