( ESNUG 385 Item 9 ) --------------------------------------------- [12/19/01]
Subject: ( ESNUG 383 #16 ) ModelSim 5.5c Didn't Scale With Linux Mhz Either
> Has anyone seen that the ratio of performance improvement is not that much
> on NC-Verilog on Linux vs. Solaris? (i.e. faster Mhz didn't speed up
> NC-Verilog much.) I would like to see if others had similar experiences.
>
> - [ To Infinity And Beyond ]
From: Rainer Mueller <rainer@oasis.com>
Hi, John,
I'm wondering if anybody has used ModelSim on both the Solaris/Sparc and
Linux/x86 platforms, and can comment on the performance on each platform?
We use ModelSim version SE VHDL 5.5c.
We've been using Sun Ultra 5 workstations with ModelSim for several months
and recently began evaluating ModelSim on a 2 GHz Pentium 4. We expected
to see at least 3X performance improvement, but we're not seeing anything
close to that. Our Mentor apps engineer has been unwilling or unable to
provide us with benchmarks for their tool on various platforms, so I'm
hoping someone else has tried this and can share their experience.
Here are the systems we've tried:
A) Sun Ultra 5 workstation, with one Ultrasparc IIi CPU at 333 MHz with
2M cache, 512MB RAM. Solaris 8, ModelSim SE 5.5c.
B) Same as A, but with a 400 MHz CPU.
C) Dell Optiplex GX240, with one Pentium 4 CPU at 2 GHz with 1/4M cache,
512MB RAM (PC133). Red Hat 7.1, ModelSim SE 5.5e.
The files used for these tests are mounted via NFS from a common fileserver.
Once the simulation loads, network traffic is almost nil. On each platform
the simulation takes less than 200MB of RAM so there's no swapping. The
system load is 1.00 - no other processes apart from the usual OS overhead.
Running a 200 us simulation of an approx. 50K gate design in non-interactive
batch mode gives the following times:
A) 1098 sec
B) 1012 sec
C) 638 sec
From http://www.spec.org, here are benchmarks for some SPARC and x86 systems:
                                            SPECint_base2000  SPECfp_base2000
Dell Precision Workstation 340                     648              715
  (2.0 GHz P4, 1/4M cache, PC800 RDRAM)
Sun Blade 1000 Model 1900                          438              427
  (900 MHz Ultrasparc III, 8M cache)
Asus A7V                                           438              348
  (1.3 GHz Athlon, 1/4M cache, PC133 SDRAM)
Intel D815EEA2                                     421              258
  (1.1 GHz P3, 1/4M cache, PC133 SDRAM)
Sun Ultra 10                                       133              126
  (333 MHz Ultrasparc IIi, 2M cache)
Looking at the benchmarks for the Ultra 10 and the Pentium III and 4, we
were hoping for about 3-4X performance improvement in simulation time.
Instead we're seeing about 1.6-1.7X.
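As a quick check of the arithmetic, the ratios can be worked out from the
numbers quoted above. This is only a sketch: it treats the Ultra 10 SPEC
entry as a stand-in for the Ultra 5 (both are listed with a 333 MHz
Ultrasparc IIi and 2M cache), and the Dell SPEC system uses PC800 RDRAM
while test machine C uses PC133, so the SPEC ratio is probably optimistic.
The variable names are illustrative only.

```python
# Wall-clock times (seconds) reported for the 200 us batch simulation.
sim_seconds = {"A": 1098, "B": 1012, "C": 638}

# SPEC base scores quoted in this post: (SPECint_base2000, SPECfp_base2000).
# Assumption: the Ultra 10 entry stands in for the Ultra 5 under test.
spec_p4 = (648, 715)       # Dell Precision Workstation 340, 2.0 GHz P4
spec_ultra = (133, 126)    # Sun Ultra 10, 333 MHz Ultrasparc IIi

# Measured speedup of the P4 box (C) over each Sun box.
measured = {name: sim_seconds[name] / sim_seconds["C"] for name in ("A", "B")}

# Speedup suggested by the published SPEC scores.
spec_int_ratio = spec_p4[0] / spec_ultra[0]
spec_fp_ratio = spec_p4[1] / spec_ultra[1]

for name, ratio in measured.items():
    print(f"C vs {name}: {ratio:.2f}X measured")
print(f"SPECint ratio: {spec_int_ratio:.2f}X, SPECfp ratio: {spec_fp_ratio:.2f}X")
```

On these figures the measured speedup is about 1.6X to 1.7X, against a SPEC
ratio of roughly 4.9X (int) and 5.7X (fp).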
If anybody here has used ModelSim on both x86 and sparc architectures and
could comment on the performance they observed, that'd be greatly
appreciated. Likewise if you've tried ModelSim on an Ultrasparc III, I'd
be interested to know how it compares to an older Sparc. We're a little
disappointed in the performance under Linux/x86 and wondering if we should
be evaluating some high end Sun products to get the performance we expected.
- Rainer Mueller
Oasis SiliconSystems AG Germany
---- ---- ---- ---- ---- ---- ----
> I've also noticed that VCS gives a speedup which scales somewhat with the
> MHz of the machine.
From: Tom Loftus <tloftus@intrinsix.com>
Hi, John,
Since I am in the middle of a puzzling sim performance problem, this topic
caught my interest so I thought I would throw in my two cents.
My experience is that synthesis jobs track CPU performance more closely than
digital simulation jobs. My theory is that digital simulations do fewer
calculations and more random data accesses. This causes poor cache
performance and tends to saturate the CPU-to-memory datapath before the
CPU can be fully utilized.
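One hypothetical way to put a number on this theory is an Amdahl-style
split: assume some fraction of the run time is bound by memory and does not
speed up with the CPU, while the rest scales with a CPU speedup factor. The
model and the function names below are assumptions for illustration, not
something measured in this thread; the example inputs are the simulation
times and SPECint scores quoted earlier in this item.

```python
def non_scaling_fraction(observed_speedup, cpu_speedup):
    """Fraction of the original run time that did not speed up at all.

    Assumed model (Amdahl-style):
        1 / observed = f + (1 - f) / cpu_speedup
    solved for f.
    """
    inv_obs = 1.0 / observed_speedup
    inv_cpu = 1.0 / cpu_speedup
    return (inv_obs - inv_cpu) / (1.0 - inv_cpu)


def predicted_speedup(fraction, cpu_speedup):
    """Overall speedup when `fraction` of the run time does not scale."""
    return 1.0 / (fraction + (1.0 - fraction) / cpu_speedup)


# Example inputs taken from the earlier post in this item:
observed = 1098 / 638    # Ultra 5 (333 MHz) time / P4 time
cpu_ratio = 648 / 133    # SPECint_base2000: P4 system / Ultra 10

f = non_scaling_fraction(observed, cpu_ratio)
print(f"non-scaling fraction under this model: {f:.2f}")
print(f"speedup if the CPU were infinitely fast: {1.0 / f:.2f}X")
```

With those inputs the model attributes a bit under half of the Ultra 5 run
time to work that did not speed up on the P4. That is consistent with the
memory-bottleneck theory, but it is not proof of it.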
I first ran into this several years ago when a former employer made the
mistake of buying Ultra-5 series workstations because they were cheaper
than the Ultra-30's. The CPU specs were comparable, but the memory
bandwidth was lower and the performance was poor. We returned them and
bought more Ultra-30's.
Fast, efficient compiled simulators like MTI and NC-Verilog, when run on
workstations (or PCs) which use cheaper, low-bandwidth memory subsystems,
will not perform very well, no matter what speed CPU you put in. Obviously,
swapping should be avoided in all cases as the performance is terrible.
To answer the specific question raised, my experience has been that VCS is
slower than NC-Verilog and therefore would scale with CPU MHz until it also
reached the memory bottleneck, whereas NC-Verilog is already there. It
wasn't clear from the post whether the VCS and NC-Verilog jobs were the same.
- Tom Loftus
Intrinsix Corp. Rockville, MD