( DAC'16 Item 3 ) ----------------------------------------------- [12/16/16]
Subject: IC Manage PeerCache and CDNS Rocketick get #3 as Best of 2016
ALL THINGS PARALLEL: If you had to name the one trend which characterized
the past 12 months in EDA and chip design, I'd say 2016 was "The Year That
True Parallelization Became Real".
---- ---- ---- ---- ---- ---- ----
For general EDA tool use, true parallelization was invented (or should I say
re-invented) this year when the ever busy R&D guys at IC Manage came up
with PeerCache, a tool that does peer-to-peer data transfers/updates in
fully parallel workflows of both source and generated project data.
It's as if BitTorrent or Napster came to your chip design (or verification)
workspace. You have dozens of engineers working on the same project at
the same time -- tweaking this and modifying that -- and this can easily
involve say 2 terabytes of data if you include both your design's source
files and its related generated files (timing reports, waveforms, etc.).
In the olde days, populating your 1 terabyte of design into your workspace
easily took 2-3 hours. With PeerCache, 1 TB takes less than 60 seconds!
And for capacity it benchmarked populating 10 TB in 10 minutes.
To work this magic, PeerCache snarfs all the right bits and fragments of
your project from your fellow engineers' workspaces. Shiv Sikand worked on
this in detail at IC Manage and announced it at DAC'16. (See ESNUG 561 #2)
WAIT!, IT DOES MORE!: In addition, PeerCache also 4X to 20X accelerates
your EDA tool's monster big data "reads" and "writes" with 2000 MB/sec
transfers. Plus, through clever data redundancy reduction (and by only
storing "deltas") it can take 47 TB of design & its related generated data,
and squeeze that down to 200 GB on your hard drive. (Again ESNUG 561 #2)
Get that? 20X faster EDA tool reads/writes, 150X faster workspace loading,
and using 1/200th the hard drive storage -- all because of parallelization.
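The headline ratios can be sanity-checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it takes the 2-3 hours, 60 seconds, 47 TB and 200 GB figures quoted above at face value and assumes 1 TB = 1000 GB. The helper name `ratio` is made up for illustration.

```python
# Back-of-the-envelope check of the headline ratios quoted above.
SECONDS_PER_HOUR = 3600

def ratio(before, after):
    """Return how many times smaller `after` is than `before`."""
    return before / after

# Workspace populate: 2-3 hours before vs. 60 seconds with PeerCache.
populate_low = ratio(2 * SECONDS_PER_HOUR, 60)
populate_high = ratio(3 * SECONDS_PER_HOUR, 60)

# Storage: 47 TB squeezed down to 200 GB (assuming 1 TB = 1000 GB).
storage = ratio(47 * 1000, 200)

print(f"populate speed-up: {populate_low:.0f}X to {populate_high:.0f}X")
print(f"storage reduction: {storage:.0f}X")
```

That works out to 120X-180X on populate time (so "150X" is the midpoint) and about 235X on storage (so "1/200th" is a round-down).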
---- ---- ---- ---- ---- ---- ----
For Verilog simulation, true parallelization first appeared on the engineering
public's radar screen when Rocketick -- a project that Uri Tal had worked on
in Israel for 4 years -- had its first (tiny) booth at DAC'2011.
And even by 2013, RocketSim was just an Nvidia-GPU-only gate-level Verilog
simulator -- but it still benchmarked 23X faster against VCS. (ESNUG 523 #4)
               Captures   time (hours)   Time/capture   Speed-up
               --------   ------------   ------------   --------
   VCS             7          8.2          1.17 hrs        1X
   RocketSim       7          0.64         0.09 hrs       13X
   RocketSim     102          5.31         0.05 hrs       23X
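The Time/capture and Speed-up columns follow directly from the first two columns. A minimal sketch of that arithmetic, assuming the speed-up is measured as VCS hours-per-capture divided by each run's hours-per-capture (the variable names are illustrative only):

```python
# (tool, captures, total hours) taken from the benchmark table above.
runs = [
    ("VCS", 7, 8.2),
    ("RocketSim", 7, 0.64),
    ("RocketSim", 102, 5.31),
]

def hours_per_capture(captures, hours):
    """Average wall-clock hours spent per capture."""
    return hours / captures

# The first row (VCS) is the 1X baseline.
baseline = hours_per_capture(runs[0][1], runs[0][2])

speedups = []
for tool, captures, hours in runs:
    per_capture = hours_per_capture(captures, hours)
    speedup = baseline / per_capture
    speedups.append(round(speedup))
    print(f"{tool:10s} {captures:4d} captures  "
          f"{per_capture:.2f} hrs/capture  {speedup:.1f}X")
```

Rounded to whole numbers, the computed speed-ups match the 1X / 13X / 23X in the table.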
Jump forward to 2016; Cadence acquires a Rocketick that now does both gate-
and RTL-level simulations -- plus it runs on Intel XEON CPU server cores.
When CDNS R&D natively compiled the RocketSim source C together with their
Incisive source C into one GNU C++ object called "Xcelium", they saw:
   design        type     size         # of CPU    speed vs. Incisive
   -----------   -----    ----------   ---------   ------------------
   Little Boy    RTL       50M gates    8 cores       4X speed-up
   Fat Man       RTL      400M gates    6 cores     9.3X speed-up
   Fat Man       gates    400M gates    6 cores      30X speed-up
SNPS ON THE DEFENSIVE: Keep in mind that all of this is actually a tech war
between Aart and Anirudh over the next big boosts in Verilog simulation;
and this Synopsys Cheetah VCS is Aart playing catch-up with RocketSim. Even
according to the Synopsys press release, and in Aart's own SNUG'16 keynote,
SNPS Cheetah is still 2 years out (hints of ICC2's lateness?) -- and it's
still only using Nvidia GPUs instead of the Intel x86 CPUs.
"I couldn't find anyone at the Synopsys booth to discuss their
Cheetah VCS equivalent, but that didn't surprise me because
it's still 2 years out."
- Cliff Cummings, father of SystemVerilog (ESNUG 561 #7)
"We plan to roll out Cheetah technology over the next two years
as part of VCS."
- Manoj Gandhi, Synopsys EVP/GM (press release 03/24/2016)
QUESTION ASKED:
Q: "What were the 3 or 4 most INTERESTING specific EDA tools
you've seen this year? WHY did they interest you?"
---- ---- ---- ---- ---- ---- ----
CADENCE ROCKETICK ROCKETSIM
We did an evaluation of (Cadence) RocketSim, a multicore CPU-based
simulation accelerator for RTL simulation.
We evaluated it on a 40M gate sub-system with Cadence NC-Sim and
SystemVerilog (we have a mix of Verilog and SystemVerilog RTL).
What we found.
1. RTL Speed up. This is the relative speed up we saw:
- For 4 CPUs, our speed up was ~ 3X faster
- For 16 CPUs, our speed up was 10X faster
RocketSim seems to peak at about 10x speed up; that's the nature
of programming.
2. Compilation time is about the same.
3. The debug was very similar to using NC-Sim; we used it with
Cadence's SHM waveforms.
4. RocketSim supports four-state 1/0/X/Z logic.
5. Like NC-Sim, RocketSim communicates interactively with the
testbench. This is better than a lot of older generation
simulation accelerators that you had to use in batch mode.
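One way to read the "peaks at about 10x" remark in item 1 is through Amdahl's law, S = 1 / ((1 - p) + p / n), where p is the fraction of the run that parallelizes and n is the CPU count. The sketch below is only an illustration under that assumption; neither the reviewer nor Cadence says RocketSim follows this model, and the function names are made up here.

```python
def parallel_fraction(speedup, cpus):
    """Solve Amdahl's law S = 1 / ((1 - p) + p / n) for p."""
    return (1 - 1 / speedup) / (1 - 1 / cpus)

def ceiling(p):
    """Best possible speed-up with unlimited CPUs under Amdahl's law."""
    return 1 / (1 - p)

# (CPUs, observed speed-up) as reported in item 1 of the evaluation above.
observed = [(4, 3.0), (16, 10.0)]

for cpus, speedup in observed:
    p = parallel_fraction(speedup, cpus)
    print(f"{cpus:2d} CPUs, {speedup:.0f}X -> "
          f"p = {p:.2f}, ceiling = {ceiling(p):.1f}X")
```

The two data points imply different parallel fractions (roughly 0.89 and 0.96), so a single Amdahl fit is a rough guide at best. The 4-CPU point alone would put the ceiling near 9X, which is in the same ballpark as the ~10X peak reported.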
A negative is that RocketSim is priced more than directly proportional to
its speed-up. For example, it costs 6X the price for a 3X speed-up, so
financially we could just get 3 more NC-Sim licenses to run on our
servers instead.
If RocketSim were priced proportionally, we could use it for all of our
regressions, rather than as a point tool for debugging. And it cannot
save a snapshot, which matters a lot for debugging; i.e., you can't run
RocketSim for an hour, save it and then run it many times, as you can
with NC-Sim alone. Cadence says the snapshot feature is in their
pipeline and they are working on it, but it is not yet available.
Technology-wise, overall RocketSim is a great tool.
---- ---- ---- ---- ---- ---- ----
Cadence RocketSim
More speed without losing the convenience and capability of a
simulator. To quote the Cadence marketing: "debuggability,
seamless testbench integration, fast turn-around, and
availability".
Emulators/accelerators are all very well but there are never
enough seats and it's always extra effort.
---- ---- ---- ---- ---- ---- ----
Cadence RocketSim enables true parallel processing to greatly speed
up logic simulation.
We have been asking Cadence to implement true parallel processing
for many years.
They finally decided to obtain it thru acquisition.
---- ---- ---- ---- ---- ---- ----
RocketSim
Cadence discussed 10X Verilog simulation performance, a reduced
memory footprint, and full debug visibility.
Given that all of the support is now in place, RocketSim's high capacity
might be most significant.
My confidence in this success starts with Anirudh Devgan. Anirudh
made significant contributions at IBM in transistor level analysis,
which are still in use at IBM 10+ years after he left. At Magma,
Anirudh founded a small team with a SPICE engine which he enhanced to
become the industry's best SPICE tool, FineSim (acquired by Synopsys).
So with Anirudh involved I predict great success for RocketSim. Before
buying it, he would have confirmed the features and user experience,
as well as the inherent capabilities of the development team.
It looks much easier than starting with a small team of SPICE developers
10 years ago.
---- ---- ---- ---- ---- ---- ----
I had heard about RocketSim before DAC. We wanted a simulation
accelerator, but at the time we looked it didn't support VHDL, so
we couldn't use it.
---- ---- ---- ---- ---- ---- ----
I liked Cadence RocketSim. I am lazy. I cut & paste for my report.
To cut & paste Cadence marketing:
- RocketSim solves the simulator's bottleneck challenge by
offloading most time-consuming calculations to an ultra-fast
multithreaded engine. Unlike hardware based accelerators,
RocketSim works from within the familiar simulator environment
and runs alongside the existing test bench, eliminating ramp-up
time while providing 4-state bit-precise results.
To cut & paste Cooley:
- Splits Verilog simulation into multi-threads on 100's of regular
multicore Intel x86 XEON servers. What they got benchmarked 23X
faster vs. Incisive. Does gate and RTL sims. Compiles 1 billion
gates in 2 hours. 4-state-logic for X. Full System Verilog and
accelerates SVAs
- Xcelium on an 8 core Linux box ran 4X faster than Incisive on a
single core Linux machine. For a 400 million gate design
(Fat Man), Xcelium on 6 cores ran 9.3X faster. That is, the
larger the design and the more activity from the testbench stimulus,
the better speed-up Xcelium got! When 400 M gate Fat Man was
doing high activity DFT gate-level simulation it was 30X faster.
This 4X-9.3X-30X boost revitalizes the RTL SW market (or at
least Incisive's share of it.)
To cut & paste Cliff Cummings:
- Limitations? No SDF backannotated timing yet (working on it).
RocketSim runs RTL simulations with non-accelerated UVM in
parallel.
Currently, RocketSim does not accelerate testbench primitives.
---- ---- ---- ---- ---- ---- ----
RocketSim not supporting SDF is a deal breaker for us.
---- ---- ---- ---- ---- ---- ----
Cadence RocketSim
We were already evaluating Rocketick for potential time savings
on our gate-level netlist verification.
We had been waiting for it to have SDF annotation support, which
Cadence announced at DAC, following the acquisition.
---- ---- ---- ---- ---- ---- ----
I had a very positive impression of Rocketick.
Am looking forward to the improvements in the future.
---- ---- ---- ---- ---- ---- ----
The CDNS-Rocketick acquisition looked very interesting indeed. I was
aware of the Rocketick technology since I have friends connected to
the company in Israel. I had been tracking it for some time. So when
I heard of the acquisition I was most interested. I hear that CDNS is
planning to integrate Rocketick with Incisive further. I heard from a
source that they are calling it "Project Xcelium". (* - strange name
but if it produces what is possible then who cares!!)
- Xcelium is supposed to deliver 10X speed-up for RTL sims and
be massively parallel.
- For gates, I believe they said it would be up to 30X speed-up.
- Being massively parallel is something CDNS has done before with
other technologies like STA and P&R.
- They've figured out how to apply the same architecture to RTL
simulation.
One thing that did surprise me is that CDNS acquired this technology
instead of developing something in-house, like they've done with
their other digital P&R products. So despite the excitement of the
potential, I wonder if CDNS will be successful integrating these
technologies or will it follow the failed path that previous acquisitions
have taken. Historically CDNS has struggled with integrating acquired
companies and their technology. It has instead replaced their old
R&D team with the new R&D team.
---- ---- ---- ---- ---- ---- ----
I've known the Rocketick guys for a while. It was a smart move for
Cadence to acquire them.
If Cadence fully integrates RocketSim to accelerate their functional
Verilog simulation by 10X, that will be great. Also the fact that
RocketSim supports Verilog, VHDL, System Verilog, OVM, VMM, and UVM
can be a market killer against Aart's VCS and Wally's Questa tools.
---- ---- ---- ---- ---- ---- ----
RocketSim
I saw Cadence RocketSim at DAC. My impression is that it's fast and
for Incisive users it's easy to use.
Our front end group did an evaluation and had a positive opinion of
it. (I haven't personally used it.)
---- ---- ---- ---- ---- ---- ----
My Synopsys VCS account manager frowned at me at DAC. He saw me
talking to Uri in the Cadence RocketSim demo booth.
---- ---- ---- ---- ---- ---- ----
Cadence Rocketick -
I don't think it'll replace our Palladium sessions, but it could
help with our early functional RTL development.
---- ---- ---- ---- ---- ---- ----
We want to get some Rocketick licenses in our tool mix next year.
---- ---- ---- ---- ---- ---- ----
Cadence RocketSim will grow with the money & resources that Lip-Bu
can throw against it.
---- ---- ---- ---- ---- ---- ----
RocketSim. Faster is always gooder.
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
SYNOPSYS CHEETAH VCS
Cheetah VCS will be interesting once it comes out.
---- ---- ---- ---- ---- ---- ----
Synopsys Cheetah VCS
Saw it at SNUG San Jose. Looks like early Rocketick.
---- ---- ---- ---- ---- ---- ----
We want to beta Cheetah if we can.
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
---- ---- ---- ---- ---- ---- ----
IC MANAGE PEERCACHE
IC Manage PeerCache
My team supports about 400 mask and schematic designers in our company.
These designers vnc into servers to do their work. The data that they
access is located on NFS filers that are shared by up to 80 other users
in the company.
When rogue users get carried away and run a very large number of
simulations and regressions they can kill the performance of these
NFS filers.
- It can take hours for our IT department to track down the
offending users.
- The poor filer performance results in poor end-user experience.
- At those times our users notice long delays to bring up their
layouts or run their verification and simulation jobs.
IC Manage PeerCache is a software solution that may be able to help us
deal with this problem.
- PeerCache will cache frequently accessed data from the
NFS filers on an SSD drive that's local to a machine.
- PeerCache works on both managed and unmanaged data.
We think this software could reduce the latency of getting data to and
from disk and could help the productivity of our mask and schematic
designers.
---- ---- ---- ---- ---- ---- ----
The most interesting and useful to me is IC Manage PeerCache's
super-fast data copying system.
For most designers runtime is a huge deal and the speedups that
could be achieved using IC Manage seemed significant.
---- ---- ---- ---- ---- ---- ----
IC Manage PeerCache
PeerCache's peer-to-peer networking tool was interesting and could
give us a significant speedup.
The tool has a lot of potential, especially for the design verification
space when running a lot of simulations on a larger server farm.
- When doing design verification, we do lots of simulations on one
design, and may run 1000's of SystemVerilog simulations.
- Since they use the same design database, all our servers are
pulling the same file sets from the same db.
PeerCache would let us share files faster -- we could get the data
quicker from a peer-to-peer network, versus everything hitting our
NFS filer at once.
---- ---- ---- ---- ---- ---- ----
The IC Manage PeerCache "peer-to-peer" tool was interesting.
- We like the idea that PeerCache can speed up our systems and
reduce the load on the filer servers.
- It keeps engineers from waiting for copies and for other
engineers to be done.
- Plus we get the speed up without having to upgrade our servers.
PeerCache accelerates NetApp, Isilon, VMware -- we primarily use
NetApp, so that's a benefit for us.
The fact that it is all software is also a plus as we expect this will
reduce our costs.
---- ---- ---- ---- ---- ---- ----
IC Manage's Global Design Platform & PeerCache are impressive.
PeerCache has peer-to-peer networking and virtual workspaces for
parallel workflows with local caching and low storage.
---- ---- ---- ---- ---- ---- ----
IC Manage PeerCache
IC Manage announced PeerCache at DAC for parallel workflows. IC Manage
uses a P2P network to make it more efficient. This is a big benefit for
us, as we have multiple sites.
- The speed improvement of being able to populate databases or
files faster would be huge for us.
- We integrate files and libraries from project to project, for
new projects, and for revisions.
PeerCache should let us populate workspaces in perhaps only minutes
compared with hours for large databases.
PeerCache also massively reduces local storage needs. A company with
100's of users would benefit tremendously from that cost savings on
storage; however, it is less needed for us, given we have fewer
engineers.
Instead, we really need speed for our remote center usage.
PeerCache also offers good data security, because much of the data is
virtual vs. local.
---- ---- ---- ---- ---- ---- ----
IC Manage PeerCache peer-to-peer workflow acceleration is very
interesting to us as we look to both accelerate key workflows while also
offloading expensive shared storage resources that are in high demand.
I have questions about the loading it might place on the compute nodes
that are already used for simulation and verification via LSF jobs that
now will also become storage/IO peers for the rest of the compute farm.
If the load does impact the simulations and verifications a bit, the
overall gain from PeerCache could still make sense if, in aggregate, the
flows complete in a shorter amount of time while also offloading IO from
the shared filers.
To deploy IC Manage PeerCache, we would potentially have to change the
local storage we currently have on compute nodes (smaller spinning disk
and lower end RAID cards) and move to significant local SSD/flash
capacities to host the Virtual Workspaces and participate in the
peering.
This of course comes at increased costs, which we need to compare with
the potential benefits of reduced verification and simulation times
coupled with reduced load on the shared filers.
---- ---- ---- ---- ---- ---- ----
IC Manage PeerCache - peer-to-peer parallel workflows.
I've been a customer of IC Manage for many years.
Their new PeerCache seemed useful for the offices with multiple sites,
but we are a single office, with everyone on site.
I want to investigate to see how data is checked into the filer and
how it's validated inside the cache.
---- ---- ---- ---- ---- ---- ----
IC Manage PeerCache. It would be helpful for
- Large projects facing storage and compute limitations
during design
- Some areas of verification
Our own projects are currently too small to justify using it.
---- ---- ---- ---- ---- ---- ----
IC Manage PeerCache. PeerCache reduces the load on the file server
through a P2P network.
It's interesting and may have possible applications, but would
need a lot of experimentation.
---- ---- ---- ---- ---- ---- ----
IC Manage PeerCache: software solution to caching work areas.
1. Uses the disk on the client machine as cache.
2. Only loads metadata.
- As users read data, it downloads to the client, then all clients
on that host have instant cache access to that data.
- The cache then maintains the differences between the data.
So it only takes the populate time hit once, and only for
the files needed, and then only local disk access delay.
3. It is a "bring-your-own-hardware" software solution that accelerates
NetApp.
4. Speed Improvement. It speeds up all your project data - both managed
and generated files - and speeds up all DM systems.
5. Gives you filer storage savings
6. Allows parallel workflows.
- Engineers no longer have to wait for someone else to finish
with their physical copy.
- Any user can now clone any authorized workspace at any moment
in time -- even a terabyte-sized full chip workspace. The
clones occur in near-zero time, and include both managed
and generated data.
- No additional storage is consumed until changes are made.
We use Subversion, and IC Manage decided not to have PeerCache support
Subversion until 2017 so it's no longer a solution for us.
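The "near-zero time" clones and "no additional storage until changes are made" points in item 6 read like copy-on-write behavior. Here is a toy sketch of that idea, purely as an illustration: the `Workspace` class and its methods are hypothetical and are not IC Manage's API or implementation.

```python
class Workspace:
    """Toy copy-on-write workspace (hypothetical model, not PeerCache's API)."""

    def __init__(self, base=None):
        self.base = base   # parent workspace whose files are shared, not copied
        self.local = {}    # only the files written in this workspace

    def read(self, path):
        # Look in this workspace first, then walk up the parent chain.
        ws = self
        while ws is not None:
            if path in ws.local:
                return ws.local[path]
            ws = ws.base
        raise FileNotFoundError(path)

    def write(self, path, data):
        # A write only ever touches this workspace's own delta.
        self.local[path] = data

    def clone(self):
        # Cloning copies nothing; it just points at the parent.
        return Workspace(base=self)

    def local_bytes(self):
        # Storage consumed by this workspace's own changes.
        return sum(len(data) for data in self.local.values())


chip = Workspace()
chip.write("top.v", b"module top; endmodule\n")
chip.write("timing.rpt", b"slack: 0.12ns\n")

mine = chip.clone()
clone_cost = mine.local_bytes()          # nothing stored yet
mine.write("top.v", b"module top; wire a; endmodule\n")

print(clone_cost, mine.local_bytes())
print(chip.read("top.v") == b"module top; endmodule\n")
```

In this toy, later writes to the parent would show through to the clone; a real system would presumably freeze the parent's state at clone time, which is left out here to keep the sketch short.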
---- ---- ---- ---- ---- ---- ----
IC Manage PeerCache P2P workflow is very interesting.
It addresses a need we are currently facing, and that is IO bottlenecks
on large overloaded NetApp filers. We are seeing slower design data
access times and building workspaces can be time consuming.
It's interesting to me how widespread this problem is.
There is a need for meta-data to be maintained and accessible away from
the filer itself so that finds, snapshots, disk usage checks, time
stamp checks, etc. can all be done without filer access.
IC Manage's PeerCache solution is a clever way to speed up data
access when you have multiple people on a project. Their solution does
this using software.
IC Manage helps users who build workspaces. (e.g. DDM needs)
EMC pushed their hardware that is more scalable than NetApp to avoid
the IO bottlenecks. Ellexus pushed their software for identifying IO
bottlenecks and providing IO throttling and load balancing to help
ease the problem. Methodics provides a hardware solution. IBM spoke
about their new object storage solutions.
---- ---- ---- ---- ---- ---- ----
IC Manage had a great car at their booth.
---- ---- ---- ---- ---- ---- ----