( ESNUG 374 Item 3 ) -------------------------------------------- [06/14/01]

Subject: ( ESNUG 373 #2 ) Users Kick Holes In SystemC/SuperLog Benchmark

> Around these design units I prepared a testbench in SuperLog and SystemC,
> for 1 and for 10 instances of the counter, running for 1 million clock
> cycles.  What I obtained on a Sun at 750MHz (in cpu sec):
>
>               counter
>             instances:            1x           10x
>             ----------           --------     ---------
>               SystemC            13.3 sec     114.4 sec
>              SuperLog             2.9          12.2
>
> The SystemC results are abysmal.  Worse, they scale badly as the number
> of instances linearly increased. ...
>
> I think SystemC is the free bait to hook users into an expensive toolset.
>
>     - [ The Emperor Has No Clothes ]


From: Jon Connell <jon.connell@arm.com>

Hi, John,

This benchmark is not a balanced comparison of the two languages, SystemC
and SuperLog.  In particular, the example includes redundant code and uses
the sc_bv<> type despite clear recommendations in the SystemC documentation
to use sc_uint<> for optimal simulation of fixed-precision data types.
Applying these changes to the original submission provides a fairer
comparison.  The counter model and a testbench that instantiates four
instances are included below.

  #include "systemc.h"

  SC_MODULE(upDown)
  {
    sc_in<bool> clk;
    sc_in<bool> up;
    sc_in<bool> down;
    sc_in<sc_uint<9> > data_in;
    sc_out<bool>  parity_out;
    sc_out<bool>  carry_out;
    sc_out<bool>  borrow_out;
    sc_out<sc_uint<9> >  count_out;

    sc_uint<9> save_count_out;
    sc_uint<10> cnt_up, cnt_dn;
    sc_uint<9> count_nxt;
    bool load, tmp;

    SC_CTOR(upDown) {
        SC_METHOD(entry);
        sensitive_pos(clk);
    }
    void entry() {
      // 10-bit results keep the carry/borrow in bit 9.
      cnt_dn = save_count_out - 5;
      cnt_up = save_count_out + 3;
      load = 1;

      // mode: bit 1 = up, bit 0 = down.
      int mode = (up.read() << 1) | down.read();
      switch (mode) {
        case 0: count_nxt = data_in.read(); break;
        case 1: count_nxt = cnt_dn; break;
        case 2: count_nxt = cnt_up; break;
        default: load = 0; break;  // up and down together: hold
      }

      if (load) {
        // Recompute parity from zero over the 9 bits of count_nxt.
        tmp = 0;
        for (int ii = 0; ii < 9; ii++) tmp ^= count_nxt[ii];
        parity_out.write(tmp);
        carry_out.write(up.read() & cnt_up[9]);
        borrow_out.write(down.read() & cnt_dn[9]);

        save_count_out = count_nxt;
        count_out.write(save_count_out);
      }
    }
  };

  int sc_main (int argc , char *argv[]) {

    sc_clock clk("clk", 1, 0.5, 0.0);

    sc_signal<sc_uint<9> > data0("data0"), data1("data1"),
      data2("data2"), data3("data3");

    sc_signal<bool> one("one"), zero("zero"),
      carry0("carry0"), borrow0("borrow0"),
      carry1("carry1"), borrow1("borrow1"),
      carry2("carry2"), borrow2("borrow2"),
      carry3("carry3"), borrow3("borrow3");

    sc_signal<bool> parity0("parity0"), parity1("parity1"),
      parity2("parity2"), parity3("parity3");

    one.write(1);
    zero.write(0);

    upDown u0("u0");
    u0.clk(clk);
    u0.up(one);
    u0.down(zero);
    u0.data_in(data0);
    u0.parity_out(parity0);
    u0.carry_out(carry0);
    u0.borrow_out(borrow0);
    u0.count_out(data0);

    upDown u1("u1");
    u1.clk(clk);
    u1.up(one);
    u1.down(zero);
    u1.data_in(data1);
    u1.parity_out(parity1);
    u1.carry_out(carry1);
    u1.borrow_out(borrow1);
    u1.count_out(data1);

    upDown u2("u2");
    u2.clk(clk);
    u2.up(one);
    u2.down(zero);
    u2.data_in(data2);
    u2.parity_out(parity2);
    u2.carry_out(carry2);
    u2.borrow_out(borrow2);
    u2.count_out(data2);

    upDown u3("u3");
    u3.clk(clk);
    u3.up(one);
    u3.down(zero);
    u3.data_in(data3);
    u3.parity_out(parity3);
    u3.carry_out(carry3);
    u3.borrow_out(borrow3);
    u3.count_out(data3);

    sc_start(clk, 5000000);
    return 0;
  }
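
For readers without a SystemC installation, the clocked logic of the
counter above can be restated in plain C++.  This is only a sketch: the
names UpDownModel and step(), and the use of masked 16-bit integers in
place of sc_uint<9> and sc_uint<10>, are assumptions made for illustration
and are not part of the submitted benchmark.  Parity is computed from zero
over the 9 bits of the count on every step.

```cpp
#include <cstdint>

// Hypothetical plain-C++ restatement of the upDown counter's clocked logic.
struct UpDownModel {
    std::uint16_t count = 0;   // 9-bit stored count (save_count_out)
    bool parity = false;
    bool carry = false;
    bool borrow = false;

    // One rising clock edge. data_in is only used when up and down are both false.
    void step(bool up, bool down, std::uint16_t data_in) {
        const std::uint16_t cnt_dn = (count - 5) & 0x3FF;  // 10-bit result
        const std::uint16_t cnt_up = (count + 3) & 0x3FF;  // 10-bit result
        std::uint16_t next = 0;

        const int mode = (up ? 2 : 0) | (down ? 1 : 0);
        switch (mode) {
            case 0: next = data_in & 0x1FF; break;
            case 1: next = cnt_dn & 0x1FF; break;
            case 2: next = cnt_up & 0x1FF; break;
            default: return;  // up and down both set: hold all outputs
        }

        // XOR of the 9 count bits.
        bool p = false;
        for (int i = 0; i < 9; ++i) p ^= ((next >> i) & 1) != 0;

        parity = p;
        carry = up && ((cnt_up >> 9) & 1);     // bit 9 of the up result
        borrow = down && ((cnt_dn >> 9) & 1);  // bit 9 of the down result
        count = next;
    }
};
```

Calling step(true, false, 0) on a fresh model mirrors the testbench wiring
above, where "up" is tied to one and "down" to zero.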

Here are the results of our experiments with this benchmark, including a
simulation compiled without compiler optimization:

  ESNUG 373 SystemC code [1] : ############## 142.4 sec
  ESNUG 373 SystemC code [2] : ###################### 224.9 sec
  ARM SystemC code [1]       : ### 36.6 sec
  Verilog equivalent [3]     : ######################## 238.0 sec

  [1] SystemC 1.2; "gcc -g -O3 -march=i686"; 550MHz Pentium III
  [2] SystemC 1.2; "gcc -g -O0"; 550MHz Pentium III
  [3] VerilogXL 3.2; "verilog +turbo"; 550MHz Pentium III

This suggests that the ESNUG 373 #2 results were generated using an
unoptimized compilation.

    - Jon Connell
      ARM

         ----    ----    ----    ----    ----    ----   ----

From: Wilson P. Snyder II <wsnyder@world.std.com> 

Hi John,

Here's a back flame for your next issue of ESNUG.

The anonymous author of the "Embarassment" notes that SystemC is 4.5 times
slower than SuperLog.  SystemC is good at writing behavioral code; bad at
writing Verilog code.  If you're switching to SystemC as a replacement for
writing Verilog, you're missing the point, and will be disappointed.

As to the specific example, I made up my own little test bench, since none
was provided.  I suspect the author didn't compile with optimization of the
SystemC library.  Furthermore, bit vectors in SystemC are MUCH slower than
integers.  Either his example should use sc_bit and sc_bv, or bool and int.
Since the author used bool, I think it's fair to use ints also.

With these changes, I improve SystemC performance by 2.6 times.  Thus the
13.3 seconds for SystemC is really more like 5.14 seconds, or about 2.5x
slower than SuperLog.  By comparison, SystemC is 6.6x slower than VCS
(--twostate +rad+2).  Thus VCS is 2.6x faster than SuperLog.  So, if you're
really concerned about performance, use VCS!

The lack of SystemC performance on this example isn't shocking.  The
SystemC pin interconnections are written in a horribly inefficient way, so
that this example which has very little code and a lot of interconnect
magnifies the inefficiencies.  This is also why more instantiations make
it worse.  If you pick a model which does more work "per pin", you will
reach a different conclusion.  We have a little multi-phy router CODED
BEHAVIORALLY we benchmarked, and have measured over 100x improvement of
SystemC over VCS.  Again, I don't think of SystemC as an RTL language.

I am sure over time the SystemC kernel will vastly improve.  They haven't
optimized the pin interconnect.  They aren't optimizing between modules.
They aren't inlining modules.  VCS is doing all of those.

Which has more potential gain in the next year?  When I manually perform
these optimizations, the simulation is 7.7x faster still, thus making it
complete the benchmark in 0.6 sec, faster than VCS and SuperLog.
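
To picture the kind of per-pin overhead at issue, here is a toy model.  It
is not SystemC's kernel and not the hand-optimized code measured above;
ToySignal, run_with_signal() and run_inlined() are hypothetical names made
up for this sketch.  It only shows the bookkeeping (a pending value, a
commit step, change detection) that a signal-style connection adds to each
pin, next to the same counter with the pin flattened into a local variable
that a compiler is free to keep in a register.

```cpp
#include <cstdint>

// Toy two-phase signal: writes are deferred until update() commits them.
template <typename T>
class ToySignal {
public:
    const T& read() const { return current_; }
    void write(const T& v) { next_ = v; pending_ = true; }

    // Commit a pending write; returns true if the visible value changed.
    bool update() {
        if (!pending_) return false;
        pending_ = false;
        if (next_ == current_) return false;
        current_ = next_;
        return true;
    }

private:
    T current_{};
    T next_{};
    bool pending_ = false;
};

// Counter wired through a signal: one write plus one update per cycle.
inline std::uint32_t run_with_signal(std::uint32_t cycles) {
    ToySignal<std::uint32_t> count;
    for (std::uint32_t i = 0; i < cycles; ++i) {
        count.write((count.read() + 3) & 0x1FF);
        count.update();
    }
    return count.read();
}

// Same counter with the "pin" flattened into a local variable.
inline std::uint32_t run_inlined(std::uint32_t cycles) {
    std::uint32_t count = 0;
    for (std::uint32_t i = 0; i < cycles; ++i) count = (count + 3) & 0x1FF;
    return count;
}
```

Both functions compute the same final count; the difference is only in how
much work surrounds each update, which is the part an optimizing simulator
can remove.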

>  Speed - not satisfied here - but it is a main issue for system
>          design for my work.

Then use VCS.


>  Readability - much more verbose than Verilog and separating facets of
>                the design across 2 files doesn't help.

Use my SystemPerl package (http://www.veripool.com), it uses a single file
and provides /*AUTOs*/ like my Verilog-Mode for Emacs.


>  C language issues obscure the system/hardware/RTL design.

It definitely obscures RTL, but again, using SystemC for RTL is crazy.
However, for behavioral code, Verilog is a LOT more obscure than SystemC;
try emulating a "map" in Verilog!
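
As a small illustration of the "map" point, here is a sketch of a sparse
lookup table in C++.  The names RouteTable, learn() and lookup() and the
router-flavored scenario are hypothetical; the only library facility used
is std::map from the C++ standard library.

```cpp
#include <cstdint>
#include <map>

// Hypothetical behavioral model: a sparse address-to-port table.
class RouteTable {
public:
    // Record (or overwrite) the port associated with an address.
    void learn(std::uint64_t address, int port) { table_[address] = port; }

    // Returns the learned port, or -1 if the address is unknown.
    int lookup(std::uint64_t address) const {
        const auto it = table_.find(address);
        return it == table_.end() ? -1 : it->second;
    }

private:
    std::map<std::uint64_t, int> table_;
};
```

The table grows only as entries are learned, with no fixed-size array or
hand-written search logic in the model.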


>  Quality - only C++ compiler syntax traps are available to capture
>            errors in the design semantics.  Like trying to write a

There are a couple of Linting tools that provide these checks.  My
SystemPerl also provides some of the simplest of these checks for free.


>  Software CoDesign - SystemC should have an advantage as it is running
>                      as native C++.  However SuperLog can transparently
>                      invoke C or Verilog or shareable libraries of
>                      compiled C, C++ and other languages as well, without
>                      using a PLI unlike other HDL simulators.

Nice ad for Superlog.  I wonder why the post was anonymous.  :-)

    - Wilson Snyder


  [ Editor's Note: Just as an FYI, Wilson, I also suspected that
    benchmark might have been planted by the SuperLog people, but it
    came from a user's company e-mail account.  If it blatantly came
    from an EDA vendor, or "yahoo" or "hotmail", I wouldn't have
    published it.  (And no, Cliff didn't send it in, either.)  Also,
    I figured since anon gave source code, sharp users like you and
    Jon would quickly point out any obvious holes in it.  - John ]


Copyright 1991-2024 John Cooley.  All Rights Reserved.