( ESNUG 430 Item 7 ) --------------------------------------------- [06/16/04]
From: Jeff Winston <jwinston=user domain=engim got calm>
Subject: Our DC, Magma, Hercules, Star-RC, PrimeTime, Artisan, IBM Tape-out
Hi, John,
We recently taped out a 2 M gate (1.2 M instance), 0.18 um 180 MHz design,
using a COT flow, Artisan Libraries, IBM foundry, and Design Compiler, Magma,
Hercules & PrimeTime. The back-end flow was a unique experience (aren't
they all?), and it produced some lessons for us which might interest the
ESNUG community:
1) Putting a COT backend flow in place is a lot of work. The toughest
decision was choosing between an ASIC model, hiring out the back-end
turnkey to a services-only vendor, or being the "general contractor"
and hiring out the back-end piecemeal. The latter can be the most
challenging as you have to assemble the libraries, IP, tools and people
a la carte. Still, this was the route we took. On one hand, I'm not
sure we saved much money compared to the turnkey-services-only model.
On the other hand, our choice gave us a lot more visibility and control
of the process since much of the work happened on our site. One caveat:
If you choose a back-end turnkey vendor, make sure they will sell you
only services without requiring you to buy chips from them. We received
some "COT" quotes that were really an ASIC-like model. We were also
surprised to find some fabs required an ASIC-like model using specified
design houses.
Choosing our model, and then choosing the fab, library, tools, and
contractors took about 3 months. Make no assumptions about what library
elements are available for each process. The offerings vary by vendor,
and by process and fab within a vendor's offerings. If they offer a fast
memory in one library, don't assume it's fast in another. Be especially
wary of any IP for which you are the first user, and make sure you have
access to all the information you need for proper I/O selection and
power estimation. (For example, using the information supplied by our
foundry vendor, it was surprisingly difficult to determine the correct
number and location of core supply pads.)
2) Linux rules. Using multi-threaded 32-bit Linux boxes, we were able to
synthesize overnight, place and route in a couple of days, LVS/DRC
overnight, and time the chip in about 2 hours. Awesome! Unfortunately,
we did this project at the cusp between 32 & 64 bit Linux, and had no
access to 64-bit Linux HW or SW. As a result, the 3+ GB physical memory
limit on the Linux boxes forced us to leave a few big-memory flow
steps on our slow Sparcs. For our next project, we're hoping to use
Opterons to move the remaining steps (GDS export, gate-level simulation
with waves generation, certain DRC steps, etc.) onto 8 GB 64b Linux
boxes. They're cheap, they're fast, they work.
3) Magma worked well for us. For this project, we took a design that
easily made timing in 0.13 um and ported it to a 0.18 um, 33% slower
process. The tool required us to do a lot of up-front work getting our
technology file and timing constraints very correct and complete, but we
did not have to work very hard on synthesis, and there was virtually no
iterating in either the synthesis/placement loop, or the PrimeTime/ECO
loop. We essentially just gave the design to the placement tool and let
it do all the iterating (e.g., optimizing, buffering, cloning,
re-synthesizing, hold-fixing, etc) for us, and timing closure was not a
project bottleneck. Magma even did hold-fixing on the scan chains
automatically. Magma as a company still has some growing pains, and the
tool lacks some of the polish and robustness of its competition, but
where it mattered, the tool did the job. A few highlights:
- We chose to thoroughly re-work the somewhat basic process-specific
technology file supplied by Magma. Though it took a week, the final
Magma results correlated favorably to Star-RC (Magma was slightly more
pessimistic), and were only a tiny bit more optimistic than PrimeTime.
We avoided the need for power-analysis tools by using 0.18 um and
over-designing our power-grid.
- Before giving the design to Magma, we synthesized in DC to about 15%
overspeed without wireloads (a rough sketch of this setup appears
after this list), and modified RTL as needed to remove all the
obvious bottlenecks. This helped a lot, and the only "surprises"
in our first place and route were caused by long paths to and from
some RAMs. After a round or two of floorplanning and a few RTL
tweaks, our flow was able to close timing by itself without undue
incident. (We actually did this a few times, but only because of
design changes.)
- Magma does its own fast (cell-to-cell) DRC checking. This allowed us
to quickly iterate on diode fixes and such without ever having to
power up Hercules.
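To give a flavor of the up-front DC run, here is a minimal dc_shell-t
sketch of the "overspeed, no wireloads" setup. The module, port, and
file names are placeholders for illustration, and you should check the
wire-load variable against your DC version; this is not our actual
script.
    # Synthesize ~15% faster than the real 180 MHz target, with no
    # wire load models, so leftover long paths are pure logic depth.
    set target_period_ns 5.56
    set overspeed_period [expr {$target_period_ns / 1.15}]
    analyze -format verilog [glob ./rtl/*.v]
    elaborate my_top                        ;# placeholder top-level name
    link
    create_clock -name core_clk -period $overspeed_period [get_ports clk]
    set auto_wire_load_selection false      ;# no wireloads pre-layout
    compile
    # Paths that still miss timing here are the "obvious bottlenecks"
    # worth fixing in RTL before handing the netlist to Magma.
    report_timing -max_paths 20 > overspeed_paths.rpt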
4) One very powerful thing about Magma is that you can do a lot of behind-
the-scenes tweaking via its .TCL interface. On our project, we used
Magma TCL for a number of backstage tasks:
a) Generating the pad ring. To parallelize our efforts, we created the
pad ring outside of Magma: we wrote a Perl script that generated
.TCL scripts, then just ran the .TCL script in Magma and a complete
pad ring appeared on the screen. This also made subsequent pad-ring
tweaking much easier, and made it trivial to generate the bonding
diagram information, landing pad maps, etc.
b) Modifying the layout: For example, at one point we realized we needed
to make a hierarchical block smaller (due to design changes). We
estimated 3 days to redo the block's power grid manually. However,
using some clever Perl, we were able to generate a .TCL script to
make most of the changes, saving us about 2 days. We often wrote
similar scripts to do small repetitive changes painlessly.
c) Generating ECOs: We had to do some small ECOs to close hold timing
against PrimeTime, and some larger ones to fix some hold problems
created by an error in the setup of the scan-insertion tool. Unlike
Apollo's ability to slurp in an ECO'ed netlist, Magma requires ECO
scripts with attach/detach/replace specifications (similar to IBM's
B-scripts). We wrote a Perl script to generate ECOs from either
PrimeTime reports (for hold fixes) or from a simple command language
(for everything else, including diode insertion). This made the ECO
insertion process much easier; a rough sketch of the idea follows this
list, and our script for this is available at our
http://www.kwcpa.com/tools link.
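To show the idea behind the ECO generator (the real script, in Perl, is
at the link above), here is a rough Tcl-only sketch of the hold-fix
path. The parsing assumes the usual "Endpoint:" and "slack (VIOLATED)"
lines from a PrimeTime min-delay report, and the emitted eco_detach /
eco_attach commands and the BUFX2 cell are placeholders, not real Magma
ECO syntax:
    # Input: PrimeTime min-delay report (report_timing -delay_type min).
    set rpt [open "pt_hold_violations.rpt" r]
    set eco [open "hold_fix_eco.tcl" w]
    set endpoint ""
    set n 0
    while {[gets $rpt line] >= 0} {
        # Remember the most recent endpoint register seen in the report.
        if {[regexp {^\s*Endpoint:\s+(\S+)} $line -> ep]} {
            set endpoint $ep
        }
        # When that path's slack line reads VIOLATED, emit a delay-buffer
        # insertion at the endpoint's data pin (D assumed here).
        if {[string match "*slack (VIOLATED)*" $line] && $endpoint ne ""} {
            set buf "hold_buf_[incr n]"
            puts $eco "eco_detach -pin $endpoint/D"
            puts $eco "eco_attach -pin $endpoint/D -through $buf -cell BUFX2"
            set endpoint ""
        }
    }
    close $rpt
    close $eco
The diode-insertion and netlist-change ECOs followed the same pattern,
just driven from our simple command language instead of a PrimeTime
report.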
Magma's AEs are quite knowledgeable and helpful, but we did find a few
real bugs. Fortunately we worked around most of them, as it could take
a while to get a fix. Finally, if you go with Magma, make sure you get all
the types of licenses you need, and be sure to get at least a few base
licenses so you can run multiple jobs (for exploration) and interactive
sessions.
5) Don't underestimate the work required to get your timing constraints
right. We needed to thoroughly and correctly specify timing constraints
for three different tools: Design Compiler, Magma, and PrimeTime. Our
final .SDCs were very big files, full of cross-clock-domain signals.
The first challenge was identifying multi-cycle paths (MCPs), especially
cross-clock-domain paths that had to be marked as multi-cycle. We
eventually wrote some special DC and Perl scripts to identify all our
cross-domain paths, and as a result we found a few we didn't know about.
Setting the timing constraints for the cross-domain MCPs is a tricky
process, so make sure you read the manuals and SolvNet. Finally, it
took some extra thought to get the clock specifications right for each
of our different modes of operation (between functional and multiple
test modes there were 6 different ways to clock & operate the device).
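As an illustration of the constraint styles involved, here is a tiny
.SDC sketch with made-up clock names, periods, and port names (our real
files were far larger, and you should verify the multicycle recipe for
your own crossings):
    create_clock -name core_clk -period 5.56  [get_ports core_clk]  ;# 180 MHz
    create_clock -name slow_clk -period 11.11 [get_ports slow_clk]
    create_clock -name bus_clk  -period 15.0  [get_ports bus_clk]
    # Truly asynchronous domains whose crossings go through synchronizers:
    # cut the setup/hold analysis between them in both directions.
    set_false_path -from [get_clocks core_clk] -to [get_clocks bus_clk]
    set_false_path -from [get_clocks bus_clk]  -to [get_clocks core_clk]
    # A related-clock crossing that is allowed two cycles: the -hold 1
    # line keeps the hold check at the original edge, per the standard
    # SDC multicycle recipe.
    set_multicycle_path 2 -setup -from [get_clocks core_clk] \
                                 -to   [get_clocks slow_clk]
    set_multicycle_path 1 -hold  -from [get_clocks core_clk] \
                                 -to   [get_clocks slow_clk]
    # Each operating mode got its own constraint set; mode pins were
    # tied off with set_case_analysis (port name is a placeholder).
    set_case_analysis 0 [get_ports test_mode]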
6) Don't run Star-RC as an afterthought. For cost reasons we limited our
use of Star-RC to late in the project, but there were a few tweaks we
should have made to the front-end of the flow to get Star-RC to work
properly. If we had been using Star-RC all along we would have made
these changes at each step so that Star-RC would have run properly on
our final design. We ended up using Star-RC only at the block level as
a correlation check. (Fortunately, the correlation was quite good.)
7) Expect new releases of your libraries, DRC decks, and such, especially
late in the process. It's easy to say "We're not going to upgrade the
tools until after the project". But when your library or COT vendor
issues a "mandatory update", there's nothing you can do but suck it up.
You would think that the collateral for a 0.18 um process would be
stable by now, but as chips get into production, vendors find new
problems. By the way, use the same DRC tool and deck as your foundry
vendor, and their checking may go more smoothly. This is why we chose
Hercules, which took some effort to set up but performed quite well
(and ran overnight on a Linux box!).
Other random thoughts: For simulation we used Cadence NC-Verilog and had no
problems, except that its tool for merging coverage results was rather
primitive, so we wrote our own (again, see the http://www.kwcpa.com/tools
link), though I believe Cadence is replacing its coverage tool in new
releases.
Also, because Magma makes aggressive use of negative hold, we had trouble
functionally simulating our scan chains until we realized Artisan had also
supplied a separate "negative-hold-time" simulation library.
We originally assumed we would buy a formal tool. We surveyed the market
last summer and had selected Mentor's FormalPro. Though it was neither
the most robust nor the most polished tool of the ones we saw, we felt
that, at the time, it had the best solver, and that was what counted
most to us. However, after we started using Magma we realized that we would
not be doing much iteration in the ECO/PrimeTime loop to close timing,
and that we would have little wall time between the last placement and
tapeout, so the number of final functionality-change ECOs should be
small (we ended up with just one). We also decided that we could keep
Magma, MentorDFT, and DC "honest" by extensive gate-level simulation of
the post-route netlist (again, thank heaven for cheap 3 GHz Linux boxes),
so we never actually found a compelling reason to purchase a formal tool
before tapeout. On the other hand, we have not ruled out acquiring one
if we ever need to do metal ECOs.
We used a good linter (HDLLint), and it helped us find some issues that
might have otherwise only shown up via simulation. We farmed out DFT
(MBIST, JTAG, Scan) to a local Mentor-based test house (Plexus Design
Solutions), and this engagement went well. Although it took a little
time to connect with the right people at Artisan, their AEs are quite
knowledgeable and using their libraries was a positive experience. We
also received excellent support from Synopsys on Hercules and Star-RC, and
from IBM on foundry issues. Actually, without the great support we
received from all our vendors, I think the process could have been much
more painful.
Delays due to design changes notwithstanding, it took about 5 months to
go from the first back-end steps through to tapeout, using one in-house,
very senior physical designer from Edgerate Consulting.
Though chip design is alive and well in big corporations, I was happy to
find that it is still doable in a small-company, limited-budget
environment. Our parts came back in April, and appear to work quite
well. Look for our awesome product rollout this summer!
- Jeff Winston
Engim, Inc. Acton, MA