( ESNUG 454 Item 2 ) -------------------------------------------- [04/28/06]

Subject: ( ESNUG 448 #6 ) A boatload of users critique Bluespec Design

> In one of your recent DAC reports there was a mention of Bluespec System
> Verilog, a new behavioral synthesis toolset that promises to "Reinvent
> Hardware Design".  The company, based in Waltham MA, has some fairly
> impressive ideas behind its technology, but little in the way of validation
> in terms of real world industry stories.  It would be interesting to see
> whether there are, among your subscribers, any early adopters or perhaps
> even somebody who has taped out a chip using this tool set.  What was
> their experience using Bluespec?
>
>     - Russ Mestechkin
>       Analog Devices, Inc.                       Norwood, MA


From: [ Elvis Lives ]

Hi John,

Please keep me and our company anonymous.

Our company has been evaluating Bluespec for some time now.  First there was
a 5-day training course before the evaluation period started.  Currently
we use Bluespec version 3.8.65.

Basically we've been using the Bluespec compiler (BSC) to generate RTL
Verilog, and then we've simulated the generated design with Modelsim and a
VHDL testbench.

We haven't been able to compile a Bsim (Bluespec's simulator) executable so
far, due to OS reasons, so the support for different Linux/Unix versions
could be better.

First impressions of the Bluespec System Verilog (BSV) language:

For each BSV design unit there is a module declaration and an interface
declaration.  Usually BSV uses default clocking and reset, which are not
shown in the module declaration.  The interface declaration is used to
declare the methods (and further interfaces) that are used to access the BSV
code in the module.  Multi-clock (or multi-reset) designs can be made by
adding more clocks (or resets) to the module definition, not to the
interface definition.

The generated RTL will have the clocks defined in the module declaration and
the other IO defined in the interface declaration.

Your functionality is described with rules and methods, and for a beginner
it is not always clear how rules and methods block each other, especially
if the blocking rule is somewhere down in the hierarchy.  However, after
some practice it is possible to write BSV code that produces the intended
logic and structure in RTL.  A change in thinking is required anyway, since
BSV is quite different from VHDL.

About the BSC compiler:

Sometimes the compiler's error messages are cryptic.  However, with newer
versions the messages have become a bit more understandable.

There are also some switches that make the compiler print information
about a design, such as rule and method scheduling and blocking rules.
Some switches are used to control the RTL compilation, too.

For multi clock/reset designs the compiler adds clk_ and rstn_ prefixes
to each clock and reset signal, which is a bit annoying.

The compiler runs quite fast.  The generated RTL can be compiled and
simulated (in Modelsim) and synthesized (Quartus, Precision) without
errors.  In fact, if there aren't errors in the BSC compilation, your
later RTL simulation & synthesis will work.

What is needed is a better way to simulate BSV code at the source code
level.

The documentation needs to be improved; some things are described very
briefly.

Bluespec's support for the evaluation has been very good.

    - [ Elvis Lives ]

         ----    ----    ----    ----    ----    ----   ----

From: [ Chicken Man ]

Hi John,

Keep me anonymous.

No Bluespec designs yet, but we are planning an eval in the coming months.
I have yet to see any synthesis results, so I cannot comment on the impact
of the layer of control logic inserted by the tool.

Anyone who has written a few blocks in Bluespec can attest to the benefits
of the increased structural and behavioral abstraction.  But speaking from
a CAD group's perspective, Bluespec is attractive for its Hindley-Milner
type system and how it protects clocks and interfaces.  Approximately 90%
of the problems in our front-end flow stem from a designer's misuse of
Verilog, particularly when it comes to clocking.  Unlike linting, the
Bluespec type system prevents nearly all of these issues.

    - [ Chicken Man ]

         ----    ----    ----    ----    ----    ----   ----

From: Renaud Ayrignac <renaud.ayrignac=user domain=st spot gone>

Hi, John,

I'm currently investigating the Bluespec compiler efficiency, designing a
DMA controller.  The goal of this investigation is to be able to compare
the BSV result to the original RTL design in:

  - synthesized size
  - maximum frequency
  - development time (design + verification)
  - debug facilities
  - current flow integration

The first contact with the tool was a little bit hard.

The new concepts introduced by the tool (atomic rules and methods scheduled
by the tool) are not so easy to handle for someone who has been coding in
RTL for years.  In addition, the tool provides a lot of embedded features
which are helpful when correctly managed, but which make things even more
difficult.

I started with the simplest parts of my DMA design, and after 2 or 3 weeks
of practice I was sure I was still more efficient with a standard RTL
language.  When I had to design the complex arbitrations and the data
streaming, coupled with some weeks of practice, things became easier.  I
designed the more complex parts in a couple of weeks, with compact code
(compared to RTL), which seems easy to configure and to maintain.  In
addition, because the tool manages the scheduling of atomic parts, it's easy
to build mechanisms which interact with each other with maximum efficiency.

It's not so simple to explain, but as soon as you can split the design into
several independent functions, the global behavior will be optimised, taking
into account that some actions are sometimes allowed, sometimes not.  And
the scheduler will generate a design where all the conflicts between these
functions are resolved.
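
A toy Python sketch may help picture what "the scheduler resolves the
conflicts" means.  It is only an illustration under stated assumptions: the
rule names, the "writes" sets and the greedy priority policy are all
hypothetical, and the real Bluespec compiler derives its schedule statically
in its own way.

```python
# Toy model only: the rule names, the "writes" sets, and the greedy policy
# are assumptions for illustration, not how the Bluespec compiler works.

def schedule(rules, state):
    """Pick the rules that fire this cycle.

    rules: list of (name, guard, writes) in priority order, where guard is
    a function of the state and writes is the set of resources the rule
    updates.  A rule fires if its guard is true and it does not write a
    resource already claimed by a higher-priority rule.
    """
    fired, claimed = [], set()
    for name, guard, writes in rules:
        if guard(state) and not (writes & claimed):
            fired.append(name)
            claimed |= writes
    return fired


# Hypothetical DMA-like rules: two transfers compete for one bus, while a
# counter rule is independent of both.
RULES = [
    ("read_xfer",  lambda s: s["rd_pending"], {"bus"}),
    ("write_xfer", lambda s: s["wr_pending"], {"bus"}),
    ("count",      lambda s: True,            {"counter"}),
]

both = schedule(RULES, {"rd_pending": True, "wr_pending": True})
only_write = schedule(RULES, {"rd_pending": False, "wr_pending": True})
print(both)        # ['read_xfer', 'count']
print(only_write)  # ['write_xfer', 'count']
```

The independent rule fires every cycle, while the two conflicting rules
are "sometimes allowed, sometimes not" depending on what else is enabled.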

My opinion is that this Bluespec tool is interesting in complex control
structures (processor cores, cache controllers).  The interest is limited
in case of datapath design.

In addition many embedded features are very useful.  For example I've used

    - a lot of BSV FIFOs with their associated methods
    - the multi clocks domains management features

In both cases, this speeds up the design time, with limited risk to the
design behavior.

Anyway, I've not explored 30% of the BSV features.

On the verification side, the Bluesim simulator allows a speed-up of the
verification time using block-level tests.  Usually I found 50% of my bugs
with this tool.  However this tool doesn't support multiple clocks.

The overall debug, performed at the generated RTL level, has been a little
bit more complex than with hand-written RTL (where of course 100% of
resources are easy to identify), but all the method interfaces are visible
and all the registers are also visible and clearly identified.  Only some
internal variables are missing, and this is improving with the latest
releases of the tool.  The average debug time, from a problem occurring to
its correction (including identification of the problem on the simulation
waves), was between 30 and 60 minutes.

Before concluding I would like to add some notes:

  - the Bluespec user must take care of the tool-generated scheduling
    (visible through reports), in order to avoid unexpected behaviors.
    This is because when coding, we expect certain actions from the
    scheduler, while it can choose another, unexpected schedule.
  - the level of abstraction can be the same as the RTL one (even if
    that is useless, of course)

The area and the speed I obtained from using Bluespec were roughly equivalent
to the original RTL.  Our area is a little bit bigger (around 20% more), but
the scheduling of tasks is better, and the DMA controller is more efficient.
On design parts where the function is exactly the same, the area difference
was within a 5% range.

We were, of course, faster than with RTL (let's say 2 times) for the overall
design + verification time.

We had no specific problems using our usual flow with the Bluespec generated
Verilog RTL.

In conclusion I think that this Bluespec tool can be useful and efficient for
complex control design, but not datapath design.  If I had to design such a
new IP, I would prefer to start the design with Bluespec rather than with RTL.

    - Renaud Ayrignac
      STMicroelectronics                         Grenoble, France

         ----    ----    ----    ----    ----    ----   ----

From: Kattamuri Ekanadham <eknath=user domain=us.ibm spot gone>

Hi John,
 
Recently there was a query about any hands-on experiences with Bluespec.
We have some experience with Bluespec and thought I would share it with 
you.

We have been investigating Bluespec from the beginning of this year.  As an
experiment, we took the designs of a branch predictor and the load store unit
in one of our high-end processors - that is, we took the planned breakdown
of functionality across cycle boundaries and used Bluespec to specify
all the logic.  The Bluespec code compiled successfully and passed
through our IBM synthesis tools, which confirms that the code is really
synthesizable with no readjustment.  We have not evaluated the timing aspects
yet (we are in the process of doing this).

The following are our observations from this experience:

The language provides a very powerful means of expressing complex logic in
a very concise manner.  It relieves the designer from worrying about
generating and checking handshaking signals between modules.  It does a
thorough analysis of dependences and numerous errors are pointed out at a
very early stage.  It figures out all the possible conflicts between
concurrent actions and provides automatic sequencing of the actions. 
One can also over-ride those decisions.

The conciseness comes from the rich abstraction facilities provided by the 
Bluespec language and hence we were able to specify at a level closer to my
vision of the design.  For instance, the branch predictor maintains a
structure to keep information about the decisions it took.  This structure
is concurrently modified by the fetch logic as well as the branch completion
logic.  Specifying this logic and the constraints on their synchronization
was very easy.  The Bluespec code for branch prediction was a few hundred
lines and it generated a few thousand lines of Verilog.  Equivalent
hand coded designs tend to be several times larger than this.

Their compiler comes with a cycle-accurate simulator (in C), which we used 
extensively to examine the correctness of the logic on some input samples.
Debugging at this high-level is easy and improves the productivity quite a
bit.  A side benefit resulting from conciseness is that the Bluespec code
serves as good documentation.  Anyone working with large designs made in a
low-level language can imagine the perils of keeping documentation
consistent and correct.

Another aspect that we really liked was the power of abstract typing and 
parameterization.  For instance, we were able to write the code for the
various stages of the load-store pipe and when we wanted to experiment with
2 pipes and/or two threads it was an easy change.  The type abstractions
also enabled me to try out different versions of a table layout, without
changing significant portions of the logic.  If the timing aspects of the
logic generated are also comparable to our existing designs, we will be
confident to say that Bluespec is an excellent vehicle to improve 
productivity in the design process.

    - Kattamuri Ekanadham
      IBM T.J. Watson Research Center            Yorktown Heights, NY

         ----    ----    ----    ----    ----    ----   ----

From: [ Painful Eliminations ]

Hi, John,

Please eliminate my name and company if you publish this.  Thanks.

Coding with Bluespec "rules and methods" is quite different from writing RTL.
Ultimately, this is much more powerful, but it requires going through the
learning curve.  The interesting point is that it is much faster to evaluate
the impact of a change in micro-architecture here than with RTL coding.
Also, when integrating components, Bluespec takes care of the arbitration
management, which is usually where most of the design errors lie.

Quality of Bluespec generated RTL is good with respect to the way it
synthesizes (area, timing), but readability can still be improved.  This
relates to the debugging of the generated code.  Even if most of the
problems should be caught at the BSV level, some may pop up during RTL
simulations, and tracing them back to the original BSV code is sometimes
not obvious.

We are not yet at the point to determine how much we gain with this design
approach, but the impression so far is positive.

    - [ Painful Eliminations ]

         ----    ----    ----    ----    ----    ----   ----

From: Michael Pellauer <pellauer=user domain=csail.mit.edu>

Hi, John,

I've heard that you're soliciting stories from users of Bluespec System
Verilog (BSV).  I'm currently a PhD student at MIT working with Professor
Arvind.  Arvind, along with his former student (and now Carnegie Mellon
professor) James Hoe did the initial research that became Bluespec System
Verilog.  I thought some people out there might appreciate a university
user's perspective.

My research group has a close relationship with Bluespec, Inc. and uses
their tools daily as our primary hardware description language.
Currently we are using BSV to model PowerPC processors.  University users
generally have limited resources, and thus our group emphasizes IP
reuse.  BSV's high-level synthesis features allow us to make processor
components generalized and swappable.  Thus our framework encourages a
"tinker-toy" approach to design whereby the architect combines
predefined library elements to model the target processor.  For example,
a model of a PowerPC 604 embedded processor might combine 32-bit Fetch, 
Decoder, and Register File modules with an Issue Queue and a 
custom-written MAC path.  A PowerPC 970 model on the other hand would use 
64-bit Fetch and Decode combined with an out-of-order ROB and duplicated 
functional units for superscalar issue.

The price the designer can pay for this improved generality can come in 
the form of extended compile times.  When the compiler must instantiate
a full microprocessor from extremely generalized polymorphic library 
elements into specific hardware structures, the static elaboration phase 
of the compiler can easily take 20 minutes.  However, to designers used 
to working with DC, another 20 minutes seems like a small price to pay. 
(Full disclosure: Last summer I did an internship at Bluespec, Inc., where
I got to play around with the internals of the compiler. I was able to 
improve the performance of the array-handling code of the static 
elaborator by multiple orders of magnitude.)

Beyond research, we used BSV to teach a course at MIT in Complex Digital 
Systems. As part of the course we taught students first Verilog and then 
Bluespec SystemVerilog. For their final project students worked in 6 
teams of 2-3 people each for approximately six weeks.  At the end of the 
course teams were able to implement the following projects:

  - Out-of-Order SMIPS Processor Using Tomasulo's Algorithm
  - Memory Access Scheduler
  - Cache-Coherent Memory System Using a Ring Network
  - High-Performance SMIPS Processor
  - Hardware Implementation of an 802.11a Transmitter
  - Out-of-Order SMIPS Processor using a Reorder Buffer

ESNUG readers interested in the details can check out the lecture slides
and final project presentations at: http://csg.csail.mit.edu/6.884/

    - Michael Pellauer
      M.I.T.                                     Cambridge, MA

         ----    ----    ----    ----    ----    ----   ----

From: Nirav Dave <ndave=user domain=csail.mit.edu>

Hi, John,

I saw an ESNUG post asking about people's personal experience with BSV and I 
thought my experience would be enlightening. 

I've been using Bluespec System Verilog (BSV) consistently for a few years
now for academic work, my personal research, and for work this past
summer at a large corporation.  I have designed a variety of medium to
large designs, from cache-coherence-protocol engines to out-of-order
processor designs.  Before using BSV, I had done all of my larger
designs in Verilog, designing, among other things, large chunks of
microprocessors and taking them through place and route.  I can say that
for the task of design I would choose BSV over Verilog any day.

The greatest thing BSV brings is a very precise (and compiler-enforced)
notion of parallelism in the system via its scheduling organization.
Bluespec does this by using guarded atomic actions.  By atomic I mean that
every sub-action must happen together and if two different actions happen
in the same cycle it will appear as if one was executed first in entirety
and then the other one happened.  The guard provides a compact way to ensure
that the action-level conditions are propagated to all of the sub-actions
(e.g. if one action is to increment a sent-packet counter and enqueue a
packet into a queue, you only increment the counter (and enqueue the packet)
when it's safe to do the enqueue (the queue is not full)). 
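
The counter-and-queue example can be mimicked in a few lines of Python.
This is a minimal sketch, not BSV: the names State, enq, send_packet and
fire are made up for illustration, and atomicity is modelled by running the
action on a copy of the state and discarding the copy when a guard fails.

```python
import copy


class GuardFailed(Exception):
    """Signals that a guard was false, so the whole action must not fire."""


class State:
    def __init__(self, depth):
        self.depth = depth   # queue capacity
        self.queue = []      # packets waiting to be sent
        self.sent = 0        # sent-packet counter


def enq(state, pkt):
    # Guard of the sub-action: enqueue is only allowed when not full.
    if len(state.queue) >= state.depth:
        raise GuardFailed("queue full")
    state.queue.append(pkt)


def send_packet(state, pkt):
    # One action made of two sub-actions.  The guard of enq becomes the
    # guard of the whole action.
    state.sent += 1
    enq(state, pkt)


def fire(state, action, *args):
    """Run action atomically: every sub-action takes effect, or none does."""
    trial = copy.deepcopy(state)
    try:
        action(trial, *args)
    except GuardFailed:
        return state, False   # discard the partial updates
    return trial, True


s = State(depth=2)
results = []
for pkt in ["a", "b", "c"]:
    s, fired = fire(s, send_packet, pkt)
    results.append(fired)
print(results, s.sent, s.queue)  # [True, True, False] 2 ['a', 'b']
```

On the third packet the counter increment is thrown away together with the
failed enqueue, so the counter never drifts away from the queue contents.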

These guarded atomic actions auto-magically force the user to obey the 
assumptions made by the designers of all the sub-modules he's using by 
preventing different conflicting atomic actions from happening in the 
same cycle.  When I first started working with BSV these conflicts used 
to drive me crazy.  I knew exactly what cycle-level timings I wanted in 
my design and this tool was telling me that it wasn't possible.  Then I 
realized what it was doing was notifying me where it wasn't obvious what 
should happen and making me think about what exactly I wanted to have 
happen and to give more information about what to do.  More often than 
not, the tool was pointing out a subtle problem which I had not 
considered.

A simple instance of this I've run into was the classic ships passing in
the night problem in an issue queue I was making, where the incoming ops
don't receive the notification of the finished ops completing in the
same cycle.  In Verilog designs I was only able to determine the
source of the error after running the testbench on the design and doing
about half an hour of analysis on the VCD output.  In BSV the compiler
derived a conflict and showed exactly what the problem was (the
executing rules had to appear to happen first to make room in the
pipeline, but in the circuit described the "insert new ops" rule only had
access to the previously latched values, which do not include the
writes made that cycle) and I immediately knew what and where the
problem was and how to fix it (adding that bypass).
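
The hazard is easy to reproduce in a small cycle-level model.  The Python
sketch below is an assumption-laden simplification (one completion and one
insertion per cycle, ops tracked only by the tag they wait on, and the
function name step is invented), but it shows why the bypass matters.

```python
def step(ready, waiting, completing_tag, new_op_src, bypass):
    """Simulate one clock cycle of a toy issue queue.

    ready:          set of result tags that were ready at the start of the
                    cycle (the latched values)
    waiting:        source tags of ops already in the queue and not ready
    completing_tag: tag of the op that finishes in this cycle
    new_op_src:     source tag the newly inserted op depends on
    bypass:         if True, the insert logic also sees this cycle's
                    completion
    Returns the (ready, waiting) state for the next cycle.
    """
    # Wakeup: ops already in the queue hear the completion broadcast.
    next_waiting = [t for t in waiting if t != completing_tag]

    # Insert: without a bypass the new op only sees the latched ready set.
    visible = ready | ({completing_tag} if bypass else set())
    if new_op_src not in visible:
        next_waiting.append(new_op_src)

    return ready | {completing_tag}, next_waiting


# The new op depends on tag 7, which completes in the very same cycle.
stuck = step(set(), [], completing_tag=7, new_op_src=7, bypass=False)
fixed = step(set(), [], completing_tag=7, new_op_src=7, bypass=True)
print(stuck)  # ({7}, [7])  the op waits for a broadcast that already passed
print(fixed)  # ({7}, [])   the bypass lets the op see the completion
```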

Another quality of BSV is the amount of flexibility and reuse it allows
for different designs.  In the course of my work in BSV I've made 
a reasonably sized library of components which are parametrized not only 
on size, and bit-width, but also bit-representations, priority 
functions, and a slew of other things.  An example of this was when
I was choosing the bit representation of a decoded instruction.
Instead of picking a representation outright I made an abstract
structure with a field for every possible bit of information I needed
(e.g. instruction type, destination field, number of cycles to execute).
This corresponded to having the decoded instruction latches actually
contain these values explicitly.  After running through synthesis it
became obvious that having a number of these fields stored explicitly
was not worth the additional space penalty (e.g. having the "isBranch"
bit explicitly stored in the decoded instruction instead of computed by
combinational logic on the instruction type).  Using BSV's user-specified
bit representations it was possible to shrink the decoded type as I
wanted, without ever having to touch the decoder or any of the myriad
places which read a decoded instruction.  Doing the same in RTL would
have meant going through the design and explicitly changing every use
by hand.
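
The separation between an abstract type and its bit layout can be sketched
in Python as follows.  Everything here is hypothetical (the field widths,
the BRANCH_TYPES encoding and the pack/unpack function names); it only
illustrates that code reading the abstract type does not care which layout
is stored.

```python
from dataclasses import dataclass

# Hypothetical encoding: which instruction types count as branches.
BRANCH_TYPES = {2, 3}


@dataclass(frozen=True)
class Decoded:
    itype: int  # 3-bit instruction type
    dest: int   # 5-bit destination register

    @property
    def is_branch(self):
        # Readers use this name whether or not the bit is physically stored.
        return self.itype in BRANCH_TYPES


def pack_wide(d):
    """9-bit layout: itype | dest | is_branch stored explicitly."""
    return (d.itype << 6) | (d.dest << 1) | int(d.is_branch)


def unpack_wide(bits):
    return Decoded(itype=(bits >> 6) & 0x7, dest=(bits >> 1) & 0x1F)


def pack_narrow(d):
    """8-bit layout: is_branch is recomputed from itype when needed."""
    return (d.itype << 5) | d.dest


def unpack_narrow(bits):
    return Decoded(itype=(bits >> 5) & 0x7, dest=bits & 0x1F)


def next_pc_policy(d):
    # A consumer of the abstract type; unchanged when the layout changes.
    return "redirect" if d.is_branch else "fallthrough"


d = Decoded(itype=2, dest=17)
wide, narrow = pack_wide(d), pack_narrow(d)
print(wide.bit_length() <= 9, narrow.bit_length() <= 8)  # True True
print(next_pc_policy(unpack_wide(wide)),
      next_pc_policy(unpack_narrow(narrow)))             # redirect redirect
```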

The downside of BSV is that it can be quite a culture shock from other 
RTL design methodologies. While it takes very little time to get to the 
point where one can write a relatively simple design, getting fully 
comfortable with the methodology of resolving conflicts may take a few 
weeks. Other advanced concepts can also take some time to learn, but 
once you've figured them out it's hard to imagine not using them. 

To anyone thinking of trying BSV, I encourage you to give it a full 
shot.  In my experience, those that do strongly prefer BSV's approach 
over RTL. And even if you don't choose to use it, it'll change (in a 
good way) how you reason about hardware design. 

    - Nirav Dave
      M.I.T.                                     Cambridge, MA

         ----    ----    ----    ----    ----    ----   ----

From: [ Mr. Bigglesworth ]

Hi, John,

I am working for a startup in stealth mode.  So please keep me anon.

The three aspects of Bluespec that I liked most were:

  1) Data structures and strong type checking.  Create better data structures
     (specify properties).  All fields are type checked when used.  Tagged
     unions help a lot in getting your first cut representation right.

  2) Concurrency control with atomic blocks.  The code is written in atomic
     blocks called "rules".  This is a better representation of concurrency
     in my opinion.  You detect errors in anticipated behavior at compile
     time rather than in simulation (run time).

  3) Rich libraries.  Bluespec lets you create functions on all objects.
     That gives a better library methodology where you can share many aspects
     of your design (such as rules and interfaces).  Passing parameters is
     also more intuitive.
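
For readers who have not met tagged unions, here is a rough Python analogue
of point 1.  The MemOp type and its three variants are invented for
illustration, and Python only checks the tags and fields at run time,
whereas the letter describes checks done by the Bluespec compiler.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Load:
    addr: int


@dataclass(frozen=True)
class Store:
    addr: int
    data: int


@dataclass(frozen=True)
class Nop:
    pass


# A value of type MemOp is exactly one of the variants above, and each
# variant carries only the fields that make sense for it.
MemOp = Union[Load, Store, Nop]


def describe(op: MemOp) -> str:
    if isinstance(op, Load):
        return f"load [{op.addr:#x}]"
    if isinstance(op, Store):
        return f"store [{op.addr:#x}] <- {op.data}"
    if isinstance(op, Nop):
        return "nop"
    raise TypeError(f"not a MemOp: {op!r}")


ops = [Load(0x20), Store(0x10, 5), Nop()]
print([describe(op) for op in ops])
# ['load [0x20]', 'store [0x10] <- 5', 'nop']
```

A Store cannot be built without its data field, and a Load has no data
field to misuse, which is the kind of first-cut mistake the letter says
tagged unions help to avoid.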

We reduced the area and the code for two blocks by a factor of two (from a
previous Verilog-based design), all in a matter of two weeks.  The area
savings were primarily from the ability to quickly try different micro
architectures.  We were able to preserve the complete functionality of the
blocks (all tests passing) through all these experiments.

    - [ Mr. Bigglesworth ]

         ----    ----    ----    ----    ----    ----   ----

From: Gert-Jan Tromp <g.j.tromp=user domain=dizain-sync spot gone>

Hi John,

We did an evaluation of Bluespec behavioral synthesis some months ago, after
some of my colleagues spoke to them at DAC.  Bluespec has a different approach
to behavioral synthesis: it uses System Verilog constructs like interfaces
and methods, and you use rules to describe the behavior.  The Bluespec
compiler generates executable code for simulation and Verilog for synthesis.
It has a command line interface, no fancy user interfaces (fine for me).

The advantage of the Bluespec approach is that the design is always cycle
accurate and synthesizable.  It does take some time to get used to, however.
It is a good tool for complex IP blocks that have to be re-used in different
configurations, because you can make almost everything parametrizable and
still get good synthesis results (unlike some other behavioral synthesis
tools we tried in the past).  The design we used for evaluation (a multi-port
memory controller) easily met the required clock frequency, with no need for
tweaking or changing the code.

    - Gert-Jan Tromp
      Dizain-Sync                                Markelo, The Netherlands
