( ESNUG 375 Item 13 ) ------------------------------------------- [06/28/01]

Subject: ( ESNUG 374 #3 ) The Emperor Strikes Back On His SystemC Benchmark

> The anonymous author of the "Embarassment" notes that SystemC is 4.5 times
> slower than SuperLog.  SystemC is good at writing behavioral code; bad at
> writing Verilog code.  If you're switching to SystemC as a replacement for
> writing Verilog, you're missing the point, and will be disappointed.
>
> As to the specific example, I made up my own little test bench, since none
> was provided.  I suspect the author didn't compile with optimization of
> the SystemC library.  Furthermore, bit vectors in SystemC are MUCH slower
> than integers.  Either his example should use sc_bit and sc_bv, or bool
> and int. Since the author used bool, I think it's fair to use int's also.
>
>     - Wilson Snyder


From: [ The Emperor Has No Clothes ]

Hello, John.

Anonymous as usual.

My SuperLog implementation of the up/down counter used bit vectors.  These
are arbitrarily long 2-state vectors.  The SystemC sc_bv is also an
arbitrarily long 2-state vector, so I chose it in preference to sc_uint as
a balanced comparison, since the SystemC sc_uint type is limited to 64 bits.
Furthermore, the example performs bit manipulation that was easier using
SystemC sc_bv types than sc_uint types.

The examples that I originally made did contain a lot of redundant code,
both in the SuperLog and the SystemC versions.  Each counter instance had
its own stimulus instance and I also had an additional cycle counter in the
SuperLog example, plus an extra layer of hierarchy.  The stimulus instances
had a reset phase to preload the upDown counters in cycle 1 with a value of
200.

I have, however, analysed Connell's SystemC code in ESNUG 374 #3.  I increased
the upDown instances from 4 to 10 and added a printf to print the results
after the 5 million clock cycles.  I have also updated my SuperLog example
to reflect the simplified testshell hierarchy in that new SystemC example.
The SystemC is now using sc_uint, no longer an arbitrary-length bit vector,
so this is no longer a like-with-like comparison.

The SystemC is also compiled, whereas the SuperLog was interpreted, so the
compile time of the example becomes significant.  Optimisation was -O3.


            SystemC compile              11.0 sec
            SystemC 5M run               71.2 sec
                                        ---------
            SystemC total                82.2 sec

            SuperLog analyse & 5M run    54.2 sec


So even in an unfair comparison, interpreted SuperLog is still faster than
SystemC.  I cannot evaluate how fast compiled SuperLog would be.  Perhaps
the professionals at Co-Design, Inc. could eke out more performance from
this example.
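
To put a number on "still faster", the ratios can be worked out from the
table above.  This is only a sketch of the arithmetic in plain C++; the
constant and function names are made up for illustration, and the only
inputs are the three timings already quoted.  It comes out at roughly 1.5x
with the SystemC compile time included and roughly 1.3x on run time alone.

```cpp
#include <cstdio>

// Timings in seconds, copied from the table above.
constexpr double kSystemCCompile = 11.0;
constexpr double kSystemCRun     = 71.2;
constexpr double kSuperLogTotal  = 54.2;   // analyse + 5M run

// SystemC compile plus run.
constexpr double systemc_total() { return kSystemCCompile + kSystemCRun; }

// How many times longer SystemC takes than SuperLog, compile included.
constexpr double ratio_total() { return systemc_total() / kSuperLogTotal; }

// The same ratio, ignoring the SystemC compile step.
constexpr double ratio_run_only() { return kSystemCRun / kSuperLogTotal; }

// Print both ratios to two decimal places.
void print_ratios() {
    std::printf("SystemC total / SuperLog: %.2f\n", ratio_total());
    std::printf("SystemC run   / SuperLog: %.2f\n", ratio_run_only());
}
```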

The interesting thing is the set of results generated by SystemC for the 10
instances after the 5 million clocks:

     448 1 0 0
     448 0 0 0
     448 0 0 0
     448 0 0 0
     448 0 0 0
     448 0 0 0
     448 0 0 0
     448 0 0 0
     448 0 0 0
     448 0 0 0

The parity output of the first instance is correct; the others are wrong.

In SuperLog, the bit vector allows direct bit manipulation:

    bit[8:0] count_nxt;
    ...
    parity_out <= ^count_nxt;

In the submitted SystemC using sc_uint this needs to be replaced by a rather
more complicated iterated assignment to an intermediate temporary variable:

    sc_uint<9> count_nxt;
    bool  tmp;
      ...
        for (int ii = 0; ii < 10; ii++) tmp ^= count_nxt[ii];
        parity_out.write(tmp);

Two fixes are required for the correct result: tmp must be initialised, and
the loop bound must be limited to the actual width:

    sc_uint<9> count_nxt;
    bool tmp;
      ...
        tmp = 0;
        for (int ii = 0; ii < 9; ii++) tmp ^= count_nxt[ii];
        parity_out.write(tmp);
          // worse could happen in VHD-Hell
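
One way to make both mistakes harder to repeat is to tie the loop bound to
the declared width and keep the accumulator local.  The following is a
minimal sketch in plain C++, not SystemC; the helper name parity and its
WIDTH template parameter are hypothetical and do not come from the enclosed
examples.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of SystemC): XOR-reduce the low WIDTH bits
// of value, which is what ^count_nxt does for a WIDTH-bit vector.
template <unsigned WIDTH>
bool parity(std::uint64_t value) {
    static_assert(WIDTH >= 1 && WIDTH <= 64, "width must fit in 64 bits");
    // Catch callers that pass more bits than the declared width.
    assert(WIDTH == 64 || (value >> WIDTH) == 0);

    bool result = false;                      // fix 1: always initialised
    for (unsigned i = 0; i < WIDTH; ++i) {    // fix 2: bound is the width
        result ^= ((value >> i) & 1u) != 0;
    }
    return result;
}
```

With this, parity<9>(448) evaluates to true, since 448 is 111000000 in
binary and has three bits set.  That agrees with the first instance in the
results listing.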


So as well as being embarrassingly slower than interpreted SuperLog, SystemC
is also embarrassingly more error-prone.


The SystemC sc_uint type does make the simulation faster, but then the
system implementor has had to make system design choices based on the
implementation of the simulator tool.  These choices are then propagated
through the design hierarchy into the test bench, so scalable, generic
design is not possible.
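
As a rough illustration of what width-generic means here, the sketch below
models an up/down counter in plain C++ (neither SystemC nor SuperLog).  The
class name UpDownCounter, its load/tick/parity methods and the use of
std::bitset are all assumptions made for illustration; none of it is taken
from the enclosed examples.  The width is a single template parameter, so
nothing else changes when it goes past 64 bits.  The bit-by-bit ripple loop
also hints at why a bit vector costs more than a native integer.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical width-generic up/down counter, for illustration only.
template <std::size_t W>
class UpDownCounter {
public:
    // Preload the counter, as a reset phase might do.
    void load(unsigned long long v) { count_ = std::bitset<W>(v); }

    // One clock: count up when up is true, otherwise count down.
    // The value wraps around modulo 2^W.
    void tick(bool up) {
        // Ripple from the LSB.  Incrementing flips bits up to and
        // including the first 0; decrementing flips bits up to and
        // including the first 1.
        for (std::size_t i = 0; i < W; ++i) {
            const bool was_set = count_.test(i);
            count_.flip(i);
            if (was_set != up) break;
        }
    }

    // XOR reduction of all W bits.
    bool parity() const { return count_.count() % 2 != 0; }

    const std::bitset<W>& value() const { return count_; }

private:
    std::bitset<W> count_;
};
```

A 9-bit instance would be declared as UpDownCounter<9> and a 100-bit one
as UpDownCounter<100>, with no other change to the code that uses them.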

I am not a SuperLog expert.  I am certainly no expert on SystemC.  But if
efficient system design requires mucking around with the innards of the
simulator then the future of system design looks bleak.

For simulation speed of RTL, compiled simulators are certainly faster.  The
NC Verilog simulator is 3.1x faster than the SystemC, even without the global
optimisation to 2-state values that was applied to VCS.

My updated SystemC & SuperLog examples, based upon the Jon Connell example
and extended to 10 instances, are enclosed, along with a Verilog testbench.

    - [ The Emperor Has No Clothes ]


 Editor's Note: These files are in the DeepChip.com "Downloads".  - John



Copyright 1991-2024 John Cooley.  All Rights Reserved.