( ESNUG 479 Item 4 ) -------------------------------------------- [02/05/09]
From: Shawn McCloud <shawn_mccloud=user domain=mentor not calm>
Subject: "What are the 2 biggest reasons to use High Level Synthesis?"
Hi John,
We recently ran a blind worldwide survey on what companies see as their
primary reasons for using ANSI-C/C++/SystemC-to-RTL synthesis. A total of
820 engineers responded, and below is what they told us. Since they could
pick 2 answers, the total is ~200% rather than 100%.
"What are the 2 biggest reasons to use High Level Synthesis?"
Faster time to RTL : ######################## 64%
Faster verification time : ################## 49%
Fewer engineering resources : ############ 31%
Fewer Bugs : ####### 19%
RTL better than hand-coded : ##### 14%
Faster ECO implementation : ### 8%
Better product differentiation : ### 7%
Other : ## 4%
The results? Almost 2/3rds (64%) recognized that high level synthesis will
get to RTL code faster. A US-based CatapultC customer gave some hard data
on this in ESNUG 477 #3. He commented that with CatC synthesis, he was able
to get to fully verified RTL in only half the time it took him to get
there manually.
> We do a lot of architectural exploration and optimization with CatapultC,
> typically about 3 months worth. Once our C/C++ coding is done and we are
> happy with the architectural exploration, it takes us 2 hours to go from
> C/C++ to RTL for a 300K gate design with CatC. For us to do a manual
> Verilog RTL implementation of those same 300K gates would take at least
> 6 months.
What some of your readers might find surprising is that virtually half
(49%) recognized that high level synthesis makes it faster for engineers to
verify their designs when compared to a more error-prone manual RTL
implementation process. The total verification benefits of high level
synthesis are as important to them as the faster time to RTL.
Faster time to RTL : ######################## 64%
- vs. -
Faster verification time : ################## 49%
Fewer Bugs : ####### 19%
Since verification can dominate the design cycle, I wanted to share with
your readers some verification guiding principles which will allow high
level synthesis to substantially reduce verification time.
HIGH LEVEL SYNTHESIS RULES FOR MINIMIZING VERIFICATION TIME
1. Your high level synthesis tool is only as good as its ability to
verify your RTL design implementation.
Any high level synthesis (HLS) benchmark must evaluate the flow through
verification, not just area and performance. An example of this is one
of our first CatapultC evals with a European wireless semiconductor
manufacturer in early 2002. The design was a 256 tap, radix 4 FFT, 70 MHz
design in a 90 nm ASIC technology. The company had previously implemented
the design using hand coded RTL, then used Catapult to implement it starting
from their original pure ANSI C++ algorithm leveraging SystemC data types.
Manual RTL implementation : 20 days
Catapult C RTL implementation : 1 day
But CatC's verification methodology was not mature back then. They spent
the next month verifying that the design was correct, so the time saved
by rapid RTL generation was quickly eaten up by verification.
Based on this experience, we developed a completely automated verification
framework called "SCVerify" (SystemC Verification).
CatC now generates not only the RTL model, but also SystemC transactors
along with SystemC wrappers to synchronize the RTL behavior relative to the
original ANSI C++ algorithm. Designers can now verify the RTL relative to
the C++ source description by reusing the original C++ testbench in any
industry standard simulator (e.g., QuestaSim, NC-Sim, VCS).
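The reuse-the-C-testbench idea above can be sketched in plain C++. This is an illustrative stand-in, not the actual SCVerify flow: `golden_sat_add`, `rtl_sat_add`, and `verify` are hypothetical names, and the "RTL" here is just a placeholder function where a real run would drive the HDL simulator through a SystemC transactor.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Golden ANSI C++ algorithm: a hypothetical saturating 32-bit adder
// (illustrative only -- not from the article).
int32_t golden_sat_add(int32_t a, int32_t b) {
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

// Stand-in for the synthesized RTL behind a SystemC transactor; in a
// real SCVerify run this call would invoke the HDL simulation instead.
int32_t rtl_sat_add(int32_t a, int32_t b) {
    return golden_sat_add(a, b);   // placeholder assuming RTL matches
}

// The original C++ testbench, reused unchanged to compare the RTL
// against the algorithm on the same stimulus.
bool verify(const std::vector<std::pair<int32_t, int32_t>>& stimulus) {
    for (const auto& p : stimulus)
        if (golden_sat_add(p.first, p.second) != rtl_sat_add(p.first, p.second))
            return false;
    return true;
}
```

The point is that the checker itself never changes: only the function behind `rtl_sat_add` is swapped from C model to RTL simulation.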
2. High Level Synthesis input source impacts verification.
We chose ANSI C/C++ as the primary input language for CatC synthesis. Pure
ANSI C/C++ has no concurrency -- it's a simple sequential description, and
Catapult C automates the process of creating concurrency in hardware
implementation.
In contrast, hardware C languages such as SystemC explicitly define
concurrency in the source. It's not the same level of concurrency as RTL,
but is still hardcoded concurrency.
This ties directly into verification: If you hardcode concurrency in your
source, any time you change that concurrency you must re-verify the source.
For example, suppose you specify an FFT algorithm that operates on a single
memory (in-place rather than systolic). One common approach to running the
FFT at higher performance is to map two array words side-by-side in memory,
effectively doubling the bandwidth of the algorithm.
- With SystemC, you specify the RAM interface directly, and you must
explicitly split the FFT algorithm to leverage the two concurrent data
ports. If you later change to a different memory organization, you must
rewrite a significant percentage of the algorithm, risking new hand-coded
errors and more bugs.
- If you use algorithmic synthesis from a pure sequential ANSI C/C++
implementation, you can change from one memory organization to another
without modifying your source. CatC has the freedom to reorganize the
memory and partially unroll loops to build the design you want. You
thus have less chance for errors and more time exploring design space
concurrency.
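To make the second bullet concrete, here is a minimal sketch of what a purely sequential source looks like. The function name and the loop are hypothetical; the point is that nothing in the C++ commits to a memory organization, so a tool could map `buf` to one RAM, two banks, or paired words and partially unroll the loop without the source changing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Purely sequential ANSI C++: element-wise scale of a working buffer.
// No memory organization or concurrency is expressed here; an HLS tool
// is free to bank the array and unroll the loop x2 to double bandwidth,
// all without touching this source.
template <std::size_t N>
void scale_buffer(std::array<int, N>& buf, int k) {
    for (std::size_t i = 0; i < N; ++i)   // candidate loop for unrolling
        buf[i] *= k;
}
```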
3. Datatype selection can speed or slow C-level simulation.
Catapult fully supports SystemC integer and fixed point datatypes. However,
some aspects of the SystemC data types cannot be matched exactly in
hardware: the C simulation does one thing while the hardware after high
level synthesis does another. This occurs with signed shift operations
and division.
In both cases, the SystemC simulation uses a precision beyond the specified
hardware precision and then truncates, while the hardware computes using the
actual number of hardware bits throughout the computation. (This makes it
difficult to verify that the RTL matches the C because now the C simulation
has a greater precision than the RTL.)
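The precision mismatch described above can be reproduced in plain C++. This is a hedged illustration of the general effect, not SystemC's exact semantics: one function computes at wide precision and truncates at the end (the C-simulation style), the other truncates to the hardware width at every step (the datapath style), and the two can disagree.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit datapath illustrating the mismatch in the text.

// "C-simulation style": compute the product at extended precision,
// shift, then truncate to 8 bits only at the very end.
int8_t wide_then_truncate(int8_t a, int8_t b, int sh) {
    int32_t wide = (int32_t)a * (int32_t)b;   // extended precision
    return (int8_t)(wide >> sh);              // truncate last
}

// "Hardware style": keep only 8 bits after every operation, as a
// datapath built from 8-bit registers would.
int8_t truncate_throughout(int8_t a, int8_t b, int sh) {
    int8_t prod = (int8_t)(a * b);            // product truncated to 8 bits
    return (int8_t)(prod >> sh);
}
```

For a = b = 16 and a shift of 4, the wide computation keeps the product 256 and returns 16, while the 8-bit datapath truncates 256 to 0 first and returns 0 -- exactly the kind of C-vs-RTL difference that complicates verification.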
In 2005 Mentor announced an open integer and fixed point datatype called
the Algorithmic C Datatype or "ac_datatype" for short. The ac_datatypes
were developed from the ground up taking both synthesis and verification
into consideration, so they are semantically consistent across any precision
and between C simulation and HW implementation. They have the added benefit
of being up to 200x faster than SystemC with much faster compile times as
well.
For example, a simple benchmark design that performs various 16 bit fixed
point arithmetic operations 9 million times takes 0.29 seconds with AC
datatypes and 67 seconds with sc_fixed. The datatypes are freely available
on the Mentor website.
4. Forcing block level concurrency in the C/C++ source can push your
verification to RTL-level rather than C-level.
You can hardcode process level concurrency using SystemC through something
called "sc_cthread". Doing so makes it easy to specify block-level
concurrency in the source. However, with two parallel processes you are
then unable to verify that your algorithm will work correctly in hardware
until after ESL synthesis, pushing your verification headache onto RTL, or
at best some cycle-accurate/timed netlist.
To explain further, two processes operating in parallel must have a notion
of latency associated with the block to create a deterministic simulation.
The very nature of high level synthesis is to offer the user a range of
possible designs with differing latencies, meaning you must run HLS to have
a known latency. This is a chicken-and-egg situation: without a known
latency you can't verify your algorithm. Forcing algorithm verification to
be done after HLS and in a slower timed netlist both slows development time
and complicates verification, especially when you multiply the complexity
by changing the process level concurrency.
Catapult C supports a standard ANSI-C/C++ modeling method based on Kahn
Process Networks (KPN). The advantage of this approach versus explicit
concurrency through sc_cthread is that the source remains sequential and
therefore deterministic. The design is described as a series of sequential
blocks connected with FIFOs, and Catapult then makes the hardware blocks run
concurrently, leveraging data reordering where possible. Users can still
functionally verify the RTL design generated against the original algorithm,
because the C source still has a deterministic behavior, and the majority of
verification can be done on the pure ANSI C++ algorithm which simulates
upwards of 100,000x faster than RTL.
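The KPN structure described above can be sketched in a few lines of plain C++. This is a conceptual model only -- the `Fifo`, `producer`, and `consumer` names are illustrative, not Catapult's API. In C the two blocks run one after the other and the result is fully deterministic; an HLS tool could run them concurrently with a real hardware FIFO between them and produce the same values.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Two sequential "blocks" connected by a FIFO, in KPN style.
// Executed in C, this is sequential and deterministic; in hardware
// the blocks could run concurrently without changing the results.
using Fifo = std::deque<int>;

void producer(const std::vector<int>& in, Fifo& out) {
    for (int v : in) out.push_back(v * 2);   // block 1: scale by 2
}

void consumer(Fifo& in, std::vector<int>& out) {
    while (!in.empty()) {                    // block 2: add offset
        out.push_back(in.front() + 1);
        in.pop_front();
    }
}
```

Because the FIFO fully decouples the blocks, the same token stream is produced regardless of how the two sides are scheduled -- which is why the algorithm can be verified at the C level before HLS runs.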
Some specialized blocks, such as arbiters and bus interfaces, need to react
to multiple ports with the same priority. These reactive blocks may indeed
be more easily specified by hard-coding concurrency and timing into the
source.
5. Do not artificially insert functional differences during high
level synthesis.
a. Avoid uninitialized variables.
ANSI C/C++ does not define the value of uninitialized data elements, so if
an HLS tool blindly initializes the variable to a known state, it would
result in inconsistencies across C compilers and add potentially undesirable
startup cycles to your design as it performs this initialization.
CatapultC lets the C code simulate as defined. When designers compare the
algorithm simulation against their RTL code, these uninitialized variables
show up as differences versus the RTL. This gives the designer the
opportunity to clean up the original algorithm by manually initializing
variables which have a true impact on hardware.
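As a minimal illustration of that cleanup (a hypothetical accumulator, not taken from the article): in ANSI C/C++ an uninitialized accumulator is undefined behavior, so an SCVerify-style C-vs-RTL comparison would flag it as a mismatch; the fix belongs in the algorithm source, not in the tool.

```cpp
#include <cassert>
#include <vector>

// Explicitly initialized accumulator: the C simulation and the hardware
// now start from the same known state, so C-vs-RTL comparison is clean.
// Leaving 'acc' uninitialized would be undefined in ANSI C/C++ and would
// show up as a simulation difference rather than being silently "fixed".
int sum(const std::vector<int>& v) {
    int acc = 0;                 // explicit init, a true hardware reset
    for (int x : v) acc += x;
    return acc;
}
```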
b. Avoid out-of-bounds accesses.
CatC's verification framework integrates industry standard C/C++ analysis
tools such as Valgrind, Purify and Visual C++. This makes it easy to use
these tools to find more algorithm errors; such errors will also show up as
differences against the RTL design. Once again, an HLS tool should never
just blindly fix algorithm errors, since doing so can mask a real
algorithm problem.
The approach Catapult C takes amounts to linting your ANSI-C/C++ source.
c. Use bit-accurate sources.
ANSI C/C++ native datatypes only model 8-, 16- and 32-bit data widths. As
mentioned earlier, to model bit-accurate behavior (e.g., 2, 3, or 1027
bits), SystemC or the ac_datatypes can be used to model true bit-accurate
hardware -- critical since every bit counts for area, performance, and power.
Having a bit accurate source means you can verify the bit-accurate behavior
at the ANSI C/C++ abstraction. You would never want to change the bit
widths as part of the synthesis process since doing so would push the bit-
accurate verification problem to RTL or some lower level netlist.
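To show what "bit-accurate at the C level" means, here is a tiny sketch in the spirit of a bit-accurate integer type -- emphatically NOT the real ac_int or sc_int API, just a hypothetical wrapper: an unsigned value that wraps at an arbitrary width W, so the C simulation behaves exactly like a W-bit hardware register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit-accurate unsigned integer of width W (W <= 32).
// Every operation masks back to W bits, matching a W-bit register,
// so overflow behavior is identical in C simulation and hardware.
template <unsigned W>
struct UBits {
    static_assert(W >= 1 && W <= 32, "sketch limited to 1..32 bits");
    uint32_t v;
    explicit UBits(uint32_t x) : v(x & mask()) {}
    static uint32_t mask() {
        return (W == 32) ? 0xFFFFFFFFu : ((1u << W) - 1u);
    }
    UBits operator+(UBits o) const { return UBits(v + o.v); }  // wraps at W
};
```

With a 3-bit instance, 6 + 5 wraps to 3 in C just as it would in a 3-bit adder -- so bit-width bugs are caught at C-simulation speed instead of at RTL.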
Here are some specific verification features we've added to CatC to take
advantage of the 5 rules above:
1. C-to-RTL Formal Verification.
Mentor has been working closely with Calypto for the past 3 years to
integrate their Sequential Logic Equivalence Checker (SLEC) with CatC.
The process of high level synthesis transforms C++ variables into
registers; identifying the mapping between these variables and registers
is key to giving formal verification reasonable runtimes.
CatC's integration with SLEC provides a complete push-button environment
as well as Flop Maps to make a full proof possible. Calypto discussed this
specific methodology in detail in ESNUG 478 #9.
2. Synchronized communication.
CatapultC's multi-block communication is synchronized by default (i.e.
handshake). The advantage of this approach is that verification has clear
signals to monitor during simulation. In contrast, an unsynchronized
communication approach would be much harder to verify because the protocol
is not visible in the waveform, meaning some other method must be used to
describe and check the protocol (e.g., assertions).
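A handshake of the kind described above can be modeled in a few lines. This is a generic valid/ready sketch with illustrative names, not Catapult's generated interface: each completed transfer is an explicit event, which is exactly what makes it easy for a monitor (or a waveform viewer) to observe.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Generic valid/ready handshake channel, one clock per call.
// A transfer happens only when both sides agree, and each transfer
// is a discrete, observable event -- easy to log and check.
struct Handshake {
    bool valid = false, ready = false;
    int data = 0;
    std::vector<int> log;                    // what a monitor would record
    std::optional<int> cycle() {             // simulate one clock edge
        if (valid && ready) { log.push_back(data); return data; }
        return std::nullopt;                 // no transfer this cycle
    }
};
```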
3. System-Level Deadlock detection.
CatC automates deadlock detection through the integrated SCVerify flow.
This verification environment will stop simulation and report an error if
the design deadlocks.
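The condition such an environment can check is simple to state. Here is a hedged sketch of one such test -- a hypothetical model of FIFO-connected blocks, not the actual Catapult flow: if a consumer is waiting on an empty FIFO and its producer has already finished, no further progress is possible, so the simulation should stop with an error rather than hang.

```cpp
#include <cassert>
#include <deque>

// Deadlock check for a producer -> FIFO -> consumer topology:
// the consumer needs data, the FIFO is empty, and the producer will
// never push again. Report this instead of hanging the simulation.
bool is_deadlocked(const std::deque<int>& fifo,
                   bool consumer_waiting, bool producer_done) {
    return consumer_waiting && fifo.empty() && producer_done;
}
```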
The bottom line is that CatC is not just about getting you to RTL fast,
it's about getting you to functionally verified RTL fast.
- Shawn McCloud
Mentor Graphics Wilsonville, OR
---- ---- ---- ---- ---- ---- ----
From: John Cooley <jcooley=user domain=zeroskew not calm>
To: Shawn McCloud <shawn_mccloud=user domain=mentor not calm>
Hi, Shawn,
In your letter you mentioned "an open integer and fixed point datatype"
called "ac_datatype". Is this REALLY an open datatype that anyone can use?
Or am I going to see later letters from the Cadence or Synopsys folks
complaining that your "open" datatype has trick clauses in it that make it
really not open? As we both know, U.S. EDA vendors pull games like this on
each other all the time.
Can your worst EDA rival freely make profitable tools that steal business
directly away from Mentor using this ac_datatype with zero, nada, NO legal
or monetary requirements from Mentor or any other organization?
- John Cooley
DeepChip.com Holliston, MA
---- ---- ---- ---- ---- ---- ----
From: Shawn McCloud <shawn_mccloud=user domain=mentor not calm>
To: John Cooley <jcooley=user domain=zeroskew not calm>
Hi, John,
The ac_datatypes license agreement is based on a GNU license model.
Anyone can download, use, and even distribute the ac_datatypes. Users can
also modify the datatypes for their own use but cannot distribute a modified
version. The reason is that we do not want multiple versions of the datatypes
floating around.
The license agreement makes no distinction between customers and EDA
companies, so yes, even one of our evil competitors could build tools which
profit from these ac_datatypes.
We invested 2 man-years developing these ac_datatypes and after much
discussion decided the ESL industry as a whole would be better served
making them freely available. Since announcing the ac_datatypes in June '06
we have had several thousand downloads. These datatypes not only help
synthesis but the more general problem of modeling bit accurate behavior
in ANSI-C/C++ algorithms.
- Shawn McCloud
Mentor Graphics Wilsonville, OR
[ Editor's Note: For a paper detailing ac_datatypes for verification
and synthesis, check out #62 in the DeepChip Downloads. - John ]