( ESNUG 441 Item 8 ) -------------------------------------------- [03/09/05]
Subject: ( ESNUG 431 #10 ) Two Users Using Massive Palladium Installations
> Overall, our experience with Palladium has been very good, although not
> perfect -- I would say 9 out of 10. I would recommend it and it is
> definitely worth the dollars, as NRE's and re-spins are getting very very
> expensive for ASICs.
>
> - Tom Paulson
> QLogic Corporation Eden Prairie, MN
From: Narendra Konda <nkonda=user domain=nvidia spot calm>
Hi John,
About 3 1/2 years ago, when our designs at nVidia hit 5+ million gates,
we started hitting a number of problems with the FPGA-based emulators we
were using at that time.  The problems included capacity, compile tools,
debug tools, etc.  It was at this time that we started to search for a better
emulator. Then last year, after an intense eval, we decided to use
Palladium. Here's why:
  - Traditionally, the size of our GPUs increases by at least 50% over the
    previous generation.  At times we had to resort to hooking up
    multiple emulators together to get the capacity we wanted.  We decided
    to use Palladium since it could handle our largest design and also
    meet the requirements of our future GPUs.
  - The compile time it takes to port RTL/gates into FPGA-based emulators
    grew from a few hours to overnight.
    This prompted us to look at processor-based emulators, since compile
    times in that technology do not grow steeply with design size.  A
    design that used to take a few hours to compile on FPGA-based emulators
    now compiles in a few minutes on Palladium.
  - In FPGA-based emulators one has to guide the compiler in identifying the
    clock tree by declaring different types of clock constraints.  This
    requires browsing the design.  Isn't it fun to browse a 20+ million
    gate design?  NOT.
    In Palladium it is enough to declare the clock sources and the compiler
    takes care of the rest.
  - We found that waveform capture is by far faster, and the number of
    data samples that can be captured much larger, on Palladium compared
    to other emulation tools.
- A unique feature of Palladium is that multiple users can use the same
system concurrently.
Now on to the negatives:
  - Palladium is huge compared to other emulators.  It needs a lot of
    real estate, an enormous amount of power, and a lot of air conditioning.
    It is a high-maintenance system.  Our facilities team is already
    complaining that I am draining too much power, too much A/C, too much
    of everything.  Quickturn/Cadence needs to solve these issues.
  - The second negative is the cost.  This is not particular to
    Palladium; emulators in general are expensive and cost multiple
    millions of dollars.  We can only afford to buy 1 or 2 emulators to
    debug our designs.
In general, without emulation, there is no way we can bring out a GPU.
However, I believe FPGA-based architectures have run out of steam to handle
multi-million gate designs effectively. Palladium's differentiating
element is that it is a processor-based technology.
All in all, weighing both positives and negatives, we are very happy with
Palladium and I would recommend it to other users.
- Narendra Konda
nVidia San Jose, CA
---- ---- ---- ---- ---- ---- ----
From: Joerg Kayser <jkayser=user domain=de.ibm spot calm>
Hi, John,
We started with CoBalt and moved over the years to Palladium Grande.  In the
1997-98 timeframe, we were seeing simulation acceleration speeds of 20-40K
cycles/second, but now we are getting 100-500K cycles/sec depending on the
model. In-Circuit Emulation was not our most urgent need back then so we
did not evaluate that speed.
Evaluating capacity can be confusing because of the different ways gates
are defined by vendors and users. Palladium can evaluate gates with 4
inputs and 1 output, and we use these 4-way gates to calculate the model
size. Many vendors show numbers based on 2-way NAND equivalents, where the
numbers are about 2-3x higher. Within the last 5 years, our models grew
from 10 M to about 40 M 4-way gates (~100 M 2-way ASIC gates).
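The gate-count conversion described above can be sketched as a few lines of
Python.  This is only an illustration of the arithmetic; the 2.5x factor is
an assumed midpoint of the 2-3x range quoted in the text, not a vendor figure.

```python
def four_way_to_two_way(four_way_gates, factor=2.5):
    """Convert a 4-way (4-input/1-output) gate count to an approximate
    2-way NAND-equivalent count.

    The default factor of 2.5 is an assumption: the text only says the
    2-way numbers run about 2-3x higher than the 4-way numbers.
    """
    return four_way_gates * factor

# A 40 M 4-way gate model is roughly 100 M 2-way ASIC gates:
print(four_way_to_two_way(40e6))  # 100000000.0
```

When comparing vendor capacity claims, the main point is to normalize to one
gate definition before comparing, since a 2-3x discrepancy can come purely
from counting convention.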
We use Palladium mostly for acceleration of hardware and firmware co-sim
for our very big system models (plus we're doing logic analysis, debugging
and some regression testing with it.) Using VHDL, we load our mainframe
hardware model on Palladium and run software against it using
transaction processing.  Our main focus is on raising the code quality to
the point where it will run correctly on the hardware, as soon as the
hardware arrives.  However, with Palladium, we have also discovered a number
of problems in our hardware designs. We need to better understand how we
can apply Palladium to other tasks than hardware and software
co-verification. We are looking at in-circuit emulation as an area to
expand our use of Palladium.
Our Palladium system consists of 16 boards with 266,000 processors, 64 GB
of distributed memory, 740 high speed cables for board interconnects, low
latency DAS connections to the workstation and other hardware to connect
the model to the outside world. The workstation is a RS/6000 with 64 GB
main memory and does the model compiles as well as runtime applications.
We use the Palladium from 4 different sites around the world and benefit
from the different time zones. The machine can be used around the clock,
24 by 7, and is often used more than 500 hours per month. If a model is
small and does not need all HW resources, it can be compiled on a subset
of the boards, and several models can run independently of each other and in
parallel. We can queue up the simulation jobs and run them unattended.
The Palladium compiler provided reliable results building models of 5 M to
100 M 2-way equivalent gates.  An FPGA-based accelerator might be limited to
about 50% utilization of its available capacity, whereas Palladium, which is
processor-based and has enough connectivity, is stable at more than 80%.
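The difference those utilization figures make can be shown with a quick
sketch.  The percentages are the ones quoted above; the 100 M-gate raw
capacity used here is a made-up example value, not a spec from either vendor.

```python
def usable_gates(raw_capacity_gates, utilization):
    """Usable model size, given a raw gate capacity and the fraction of
    that capacity the compiler can actually fill."""
    return raw_capacity_gates * utilization

# Hypothetical 100 M-gate raw capacity at the utilizations quoted above:
fpga_usable = usable_gates(100e6, 0.50)       # FPGA-based: ~50 M gates
palladium_usable = usable_gates(100e6, 0.80)  # Palladium: ~80 M gates
print(fpga_usable, palladium_usable)
```

In other words, at the same nominal capacity the processor-based machine fits
a model roughly 60% larger before it runs out of room.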
We also liked Palladium's debugging, especially its tracing capability. In
many other environments with tracing turned on, simulations will slow down
in proportion to the number of probe points because of the way their tracing
buffers or memory are implemented. Fortunately, this is not the case with
Palladium. Probe points can be added dynamically for debugging with no loss
of speed.
We have worked closely with Cadence to expand our version of Palladium to
run our large models, and a lot of effort has been spent by our two
companies to meet IBM's needs, especially in the area of capacity. Other
companies that don't have mainframe-sized designs will have lower demands
for capacity and therefore may just need a smaller system. This is
possible, as the granularity allows any number of boards or even parts of a
board.  Palladium is a large investment, and depending on what you want
to do with it and what your goals are, it may or may not be worth the money.
Companies must evaluate this decision based on their own designs and the
weaknesses or demands of those designs that they need to address.
For us, the needs were speed and capacity, so it was appropriate that we
spend the time and money to test our very large models this way.
It is hard to give a quantitative answer for our Palladium installation, or
to guess the millions of dollars saved through the problems that we avoided
and the damage that did not occur.  We feel that we've saved ourselves a lot
of trouble, that the expense has been worth it, and that the improved time
to market has justified the cost.
- Joerg Kayser
IBM Deutschland Entwicklung GmbH Boeblingen, Germany