( ESNUG 374 Item 3 ) -------------------------------------------- [06/14/01]
Subject: ( ESNUG 373 #2 ) Users Kick Holes In SystemC/SuperLog Benchmark
> Around these design units I prepared a testbench in SuperLog and SystemC,
> for 1 and for 10 instances of the counter, running for 1 million clock
> cycles. What I obtained on a Sun at 750MHz (in cpu sec):
>
>   counter
>   instances:      1x          10x
>   ----------   --------    ---------
>   SystemC      13.3 sec    114.4 sec
>   SuperLog      2.9         12.2
>
> The SystemC results are abysmal. Worse, they scale badly as the number
> of instances linearly increased. ...
>
> I think SystemC is the free bait to hook users into an expensive toolset.
>
> - [ The Emperor Has No Clothes ]
From: Jon Connell <jon.connell@arm.com>
Hi, John,
This benchmark is not a balanced comparison of the two languages, SystemC
and SuperLog. In particular the example includes redundant code and uses
the sc_bv<> type despite clear recommendations in the SystemC documentation
to use sc_uint<> for optimal simulation of fixed-precision data types.
Applying these changes to the original submission provides a fairer
comparison. The counter model and a testbench that instantiates four
instances are included below.
#include "systemc.h"

SC_MODULE(upDown)
{
    sc_in<bool>         clk;
    sc_in<bool>         up;
    sc_in<bool>         down;
    sc_in<sc_uint<9> >  data_in;
    sc_out<bool>        parity_out;
    sc_out<bool>        carry_out;
    sc_out<bool>        borrow_out;
    sc_out<sc_uint<9> > count_out;

    sc_uint<9>  save_count_out;
    sc_uint<10> cnt_up, cnt_dn;
    sc_uint<9>  count_nxt;
    bool        load, tmp;

    SC_CTOR(upDown) {
        SC_METHOD(entry);
        sensitive_pos(clk);
    }

    void entry() {
        cnt_dn = save_count_out - 5;
        cnt_up = save_count_out + 3;
        load = 1;
        int mode = (up << 1) | down;
        switch (mode) {
            case 0:  count_nxt = data_in.read(); break;
            case 1:  count_nxt = cnt_dn; break;
            case 2:  count_nxt = cnt_up; break;
            default: load = 0; break;
        }
        if (load) {
            for (int ii = 0; ii < 10; ii++) tmp ^= count_nxt[ii];
            parity_out.write(tmp);
            carry_out.write(up & cnt_up[9]);
            borrow_out.write(down & cnt_dn[9]);
            save_count_out = count_nxt;
            count_out.write(save_count_out);
        }
    }
};
int sc_main (int argc , char *argv[]) {
    sc_clock clk("clk", 1, 0.5, 0.0);

    sc_signal<sc_uint<9> > data0("data0"), data1("data1"),
                           data2("data2"), data3("data3");
    sc_signal<bool> one("one"), zero("zero"),
                    carry0("carry0"), borrow0("borrow0"),
                    carry1("carry1"), borrow1("borrow1"),
                    carry2("carry2"), borrow2("borrow2"),
                    carry3("carry3"), borrow3("borrow3");
    sc_signal<bool> parity0("parity0"), parity1("parity1"),
                    parity2("parity2"), parity3("parity3");

    one.write(1);
    zero.write(0);

    // Each instance counts up (up = 1, down = 0) and feeds its own
    // count_out back into data_in.
    upDown u0("u0");
    u0.clk(clk);
    u0.up(one);
    u0.down(zero);
    u0.data_in(data0);
    u0.parity_out(parity0);
    u0.carry_out(carry0);
    u0.borrow_out(borrow0);
    u0.count_out(data0);

    upDown u1("u1");
    u1.clk(clk);
    u1.up(one);
    u1.down(zero);
    u1.data_in(data1);
    u1.parity_out(parity1);
    u1.carry_out(carry1);
    u1.borrow_out(borrow1);
    u1.count_out(data1);

    upDown u2("u2");
    u2.clk(clk);
    u2.up(one);
    u2.down(zero);
    u2.data_in(data2);
    u2.parity_out(parity2);
    u2.carry_out(carry2);
    u2.borrow_out(borrow2);
    u2.count_out(data2);

    upDown u3("u3");
    u3.clk(clk);
    u3.up(one);
    u3.down(zero);
    u3.data_in(data3);
    u3.parity_out(parity3);
    u3.carry_out(carry3);
    u3.borrow_out(borrow3);
    u3.count_out(data3);

    sc_start(clk, 5000000);
    return 0;
}
Here are the results of our experiments with this benchmark including a
simulation compiled without compiler optimization:
  ESNUG 373 SystemC code [1] : ##############            142.4 sec
  ESNUG 373 SystemC code [2] : ######################    224.9 sec
  ARM SystemC code       [1] : ###                        36.6 sec
  Verilog equivalent     [3] : ########################  238.0 sec
[1] SystemC 1.2; "gcc -g -O3 -march=i686"; 550MHz Pentium III
[2] SystemC 1.2; "gcc -g -O0"; 550MHz Pentium III
[3] VerilogXL 3.2; "verilog +turbo"; 550MHz Pentium III
This suggests that the ESNUG 373 #2 results were generated using an
unoptimized compilation.
- Jon Connell
ARM
---- ---- ---- ---- ---- ---- ----
From: Wilson P. Snyder II <wsnyder@world.std.com>
Hi John,
Here's a back flame for your next issue of ESNUG.
The anonymous author of the "Embarassment" notes that SystemC is 4.5 times
slower than SuperLog. SystemC is good at writing behavioral code; bad at
writing Verilog code. If you're switching to SystemC as a replacement for
writing Verilog, you're missing the point, and will be disappointed.
As to the specific example, I made up my own little test bench, since none
was provided. I suspect the author didn't compile with optimization of the
SystemC library. Furthermore, bit vectors in SystemC are MUCH slower than
integers. Either his example should use sc_bit and sc_bv, or bool and int.
Since the author used bool, I think it's fair to use ints also.
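
To make the bool-and-int idea concrete, here is a minimal sketch of the
counter's next-state logic on plain integers, with no SystemC types at all.
It is not the testbench used for the numbers in this post. The struct and
member names are hypothetical, the mode encoding is taken from the SystemC
listing earlier in this item, and one behavior is an assumption: parity is
recomputed from scratch on every clock rather than accumulated.

```cpp
#include <cassert>
#include <cstdint>

// Plain-integer model of the 9-bit up/down counter (hypothetical names).
struct UpDownInt {
    std::uint32_t count = 0;  // only the low 9 bits are used
    bool parity = false, carry = false, borrow = false;

    // XOR of the low 9 bits of v.
    static bool parity9(std::uint32_t v) {
        bool p = false;
        for (int i = 0; i < 9; ++i) p ^= ((v >> i) & 1u) != 0;
        return p;
    }

    // One rising clock edge. Neither up nor down loads data_in, down alone
    // subtracts 5, up alone adds 3, and both together hold the state.
    void clock(bool up, bool down, std::uint32_t data_in) {
        const std::uint32_t cnt_up = (count + 3u) & 0x3FFu;  // 10-bit result
        const std::uint32_t cnt_dn = (count - 5u) & 0x3FFu;  // 10-bit result
        std::uint32_t next = 0;
        switch ((up ? 2 : 0) | (down ? 1 : 0)) {
            case 0:  next = data_in & 0x1FFu; break;
            case 1:  next = cnt_dn & 0x1FFu;  break;
            case 2:  next = cnt_up & 0x1FFu;  break;
            default: return;  // hold: outputs and state unchanged
        }
        parity = parity9(next);
        carry  = up && ((cnt_up >> 9) & 1u) != 0;    // bit 9 = overflow
        borrow = down && ((cnt_dn >> 9) & 1u) != 0;  // bit 9 = underflow
        count  = next;
    }
};
```

Starting from zero, one call to clock(true, false, 0) leaves count at 3, and
a following clock(false, true, 0) wraps around to 510 with borrow set.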
With these changes, I improve SystemC performance by 2.6 times. Thus the
13.3 seconds for SystemC is really more like 5.14 seconds, or about 2.5x
slower than SuperLog. By comparison, SystemC is 6.6x slower than VCS
(--twostate +rad+2). Thus VCS is 2.6x faster than SuperLog. So, if you're
really concerned about performance, use VCS!
The lack of SystemC performance on this example isn't shocking. The
SystemC pin interconnections are written in a horribly inefficient way, so
that this example which has very little code and a lot of interconnect
magnifies the inefficiencies. This is also why more instantiations make
it worse. If you pick a model which does more work "per pin", you will
reach a different conclusion. We have a little multi-phy router CODED
BEHAVIORALLY we benchmarked, and have measured over 100x improvement of
SystemC over VCS. Again, I don't think of SystemC as a RTL language.
I am sure over time the SystemC kernel will vastly improve. They haven't
optimized the pin interconnect. They aren't optimizing between modules.
They aren't inlining modules. VCS is doing all of those.
Which has more potential gain in the next year? When I manually perform
these optimizations, the simulation is 7.7x faster still, thus making it
complete the benchmark in 0.6 sec, faster than VCS and SuperLog.
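
The kind of manual optimization described above can be illustrated with a
toy model. This is a sketch under loud assumptions: ToySignal is a
hypothetical stand-in for a signal with evaluate/update semantics, not the
SystemC kernel, and neither function reproduces the benchmark or its timings.
The point is only that the signal version does extra bookkeeping per pin on
every clock, while the inlined version updates plain integers in one loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy signal: writes land in `next` and become visible only after update().
// Hypothetical; it only mimics the evaluate/update idea.
struct ToySignal {
    std::uint32_t cur = 0, next = 0;
    std::uint32_t read() const { return cur; }
    void write(std::uint32_t v) { next = v; }
    bool update() {  // returns true when the visible value changed
        const bool changed = (cur != next);
        cur = next;
        return changed;
    }
};

// Version 1: every counter output is a signal, so each clock needs an
// evaluate pass over the modules and an update pass over the signals.
std::uint32_t run_with_signals(std::size_t instances, int cycles) {
    assert(instances > 0);
    std::vector<ToySignal> count(instances);
    for (int c = 0; c < cycles; ++c) {
        for (auto& s : count) s.write((s.read() + 3u) & 0x1FFu);  // evaluate
        for (auto& s : count) s.update();                         // update
    }
    return count[0].read();
}

// Version 2: the module body is inlined into the loop and the state lives
// in plain integers, so there is no per-pin bookkeeping.
std::uint32_t run_inlined(std::size_t instances, int cycles) {
    assert(instances > 0);
    std::vector<std::uint32_t> count(instances, 0u);
    for (int c = 0; c < cycles; ++c)
        for (auto& v : count) v = (v + 3u) & 0x1FFu;
    return count[0];
}
```

Both versions compute the same 9-bit count; they differ only in how much
work is done per clock to get there.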
> Speed - not satisfied here - but it is a main issue for system
> design for my work.
Then use VCS.
> Readability - much more verbose than Verilog and separating facets of
> the design across 2 files doesn't help.
Use my SystemPerl package (http://www.veripool.com), it uses a single file
and provides /*AUTOs*/ like my Verilog-Mode for Emacs.
> C language issues obscure the system/hardware/RTL design.
It definitely obscures RTL, but again, using SystemC for RTL is crazy.
However for behavioral code, Verilog is a LOT more obscure than SystemC;
try emulating a "map" in Verilog!
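
For comparison, here is what a "map" costs in C++: a minimal sketch of a
behavioral lookup table built on std::map. The RouteTable class, its method
names, and the address-to-port use case are hypothetical and chosen only for
illustration; they are not taken from any model mentioned in this post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical behavioral routing table: destination address -> output port.
class RouteTable {
public:
    // Record (or overwrite) the port for an address.
    void learn(std::uint32_t addr, int port) { table_[addr] = port; }

    // Returns the port for addr, or -1 when the address is unknown.
    int lookup(std::uint32_t addr) const {
        const auto it = table_.find(addr);
        return it == table_.end() ? -1 : it->second;
    }

    std::size_t size() const { return table_.size(); }

private:
    std::map<std::uint32_t, int> table_;  // sparse: only learned addresses
};
```

The table is sparse and grows on demand, so a 32-bit address space costs
nothing until entries are learned.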
> Quality - only C++ compiler syntax traps are available to capture
> errors in the design semantics. Like trying to write a
There are a couple of Linting tools that provide these checks. My
SystemPerl also provides some of the simplest of these checks for free.
> Software CoDesign - SystemC should have an advantage as it is running
> as native C++. However SuperLog can transparently
> invoke C or Verilog or shareable libraries of
> compiled C, C++ and other languages as well, without
> using a PLI unlike other HDL simulators.
Nice ad for Superlog. I wonder why the post was anonymous. :-)
- Wilson Snyder
[ Editor's Note: Just as an FYI, Wilson, I also suspected that
benchmark might have been planted by the SuperLog people, but it
came from a user's company e-mail account. If it blatantly came
from an EDA vendor, or "yahoo" or "hotmail", I wouldn't have
published it. (And no, Cliff didn't send it in, either.) Also,
I figured since anon gave source code, sharp users like you and
Jon would quickly point out any obvious holes in it. - John ]