( ESNUG 473 Item 1 ) -------------------------------------------- [05/29/08]
Subject: ( ESNUG 470 #1 ) We taped out 3 chips, all blocks done in Atoptech
> At 65/45 nm, what's the most important in routing are the DFM/SI issues
> plus the real life pain spent fixing on-chip variation. This is where
> Atoptech falls apart. They don't even have tape-outs yet. Quick P&R of
> fast blocks that you can't cheaply manufacture, you can't test, and which
> consume a lot of power is a complete waste of everyone's engineering time
> and dollars. In a point tool flow Atoptech might find a niche, but for
> small COT houses with limited tool budget, Atoptech and Mentor/Sierra
> won't be major players until they can bring a real, full production
> proven flow to the party.
>
> - Jonathan Bahl
> COT Consulting, Inc. Toronto, Canada
From: [ Iron Man ]
Hi John,
I must be anon on this.
Recently my group migrated our P&R flow to Atoptech. To give you a little
background, our team has been working in 65 nm for about 2 years. We have
a fairly wide range of chips and IP, ranging from high speed (500+ MHz)
semi-custom DSPs and CPUs to SOCs whose IP spans small, high speed
blocks (100k inst, 800+ MHz) to larger, slower (700k instance,
200+ MHz) cores. Our P&R flow is netlist->GDS, and our focus was in this
area rather than a full suite from RTL to final verification. Our goal in
working with Atoptech was to find a tool which could give us correct
results on a standard flow on this wide range of IP, with an emphasis on
turn time, timing/SI closure and routing closure. Additionally, the tool
needs to let the user tweak the standard flow for those designs which
need much higher effort; however, such tweaking should be reserved for
non-standard blocks.
Atop set-up:
From a users standpoint, adoption was pretty straightforward. We had our
initial Atop flow up and running within a month. Within 2 months we had a
flow written, used by 30+ engineers at 6 sites and were pushing all of our
IP blocks through. Within 4 months from starting, we had taped out 3 large
SOCs with all blocks being done in Atop (the first just came back and is
working in the lab).
From a setup standpoint, Atop reads in a proprietary techfile for routing
and extraction, which their AEs helped us prepare by converting existing
data. From a design data standpoint, it reads LEF and LIB format for the
P&R views of the macros and standard cells, so there was no data prep
required there. (Our current .lib files are NLDM based but we are working
with Atoptech to bring up CCS both for noise and delay to help in
correlation to signoff.) Our designs were brought in via Verilog gate-
level netlists output from Design Compiler. Floorplan information (PG,
pins, block size, etc) can be entered in a native TCL format or exported
from a 3rd party tool via DEF.
Atop is TCL based, as we'd expect. Initially we were a little frightened by
what looked like a short list of meta commands (place_opt, cts_opt,
route_opt, etc.) to run the flow. In the past, we've had issues with
having nowhere to go once high-level commands run out of steam. However, in
Atop these commands do seem to be able to cope with most of our current
designs and there is a decent enough set of lower-level commands (so you can
manually place cells, target limit fixing only, rip-up-reroute specific
wires, etc.) that we've used successfully to tackle some of our trickier
corner cases. The Atop database still needs some work as not all parameters
are queryable/controllable through the TCL interface but this is rapidly
improving through user feedback. This sort of control is essential to us as
we know no tool is going to do a 100% job so we know we are going to need
complete access to the db to program workarounds rather than wait for the
vendor to get us a new binary.
Timing Correlation:
Atop is targeted at correlating with PT-SI or PT-CeltIC flows; the user
tells Atop which one with a switch. My data is for PT-CeltIC, as that is
what we use, but other groups internally have been equally successful using
PT-SI. We expect Atop SI correlation to improve once we switch to CCS in
the next few months here.
We wanted to make sure Atop doesn't underfix (and thus require ECOs) or
overfix (and thus burn excess power/area). As a result, we focused on
endpoint slack for the path (versus stage delay). Additionally, we only
looked at critical and near critical paths, where these correlation errors
would result in either underfixing or overfixing.
Correlation to PrimeTime:
To test correlation to PrimeTime, we took a design and timed it in both
Atop and PrimeTime, with parasitics backannotated from Star-RCXT and with
SI turned off, in 3 different corners. The results were as follows:
                   Corner A         Corner B         Corner C
  Max Neg Error    -2.4% / -77ps    -1.6% /  53ps    -6.2% / -46ps
  Max Pos Error     1.7% /  80ps     1.3% /  61ps     6.8% /  55ps
  Mean Error        0.22% / 13ps     0.12% /  7ps     0.37% /  4ps
  Median Error      0.10% /  6ps     0.10% /  5ps     0.00% /  0ps
  Standard Dev      0.35% / 19ps     0.26% / 13ps     1.39% / 12ps
Overall correlation of the Atop timer to PrimeTime was very good for an
implementation mode tool.
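To make the comparison concrete, here is a small Python sketch (our own illustration, not any vendor flow) of how endpoint-slack correlation numbers like those above can be computed: filter to critical/near-critical endpoints, then summarize the implementation-vs-signoff error. The near-critical threshold, clock period, and slack values below are made up.

```python
# Sketch: summarizing endpoint-slack correlation between an implementation
# timer and a signoff timer. Thresholds and data are illustrative.
from statistics import mean, median, pstdev

def correlation_stats(pairs, clock_ps, near_critical_ps=200):
    """pairs: (signoff_slack_ps, impl_slack_ps) per endpoint. Only
    critical/near-critical endpoints are compared, since errors on
    high-slack paths cause neither underfixing nor overfixing."""
    errs_pct = [100.0 * (impl - ref) / clock_ps
                for ref, impl in pairs if ref < near_critical_ps]
    return {
        "max_neg_pct": min(errs_pct),
        "max_pos_pct": max(errs_pct),
        "mean_pct": mean(errs_pct),
        "median_pct": median(errs_pct),
        "stdev_pct": pstdev(errs_pct),
    }

# Made-up endpoint slacks (ps) at a 2000 ps clock period:
pairs = [(-120, -100), (50, 45), (150, 165), (900, 400), (10, 2)]
stats = correlation_stats(pairs, clock_ps=2000)
```

Note the 900 ps slack endpoint is excluded: a large error there would change nothing about what gets fixed.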
Correlation to Star-RCXT (lumped cap):
To test correlation to Star-RCXT, we looked at caps on all nets in both Atop
and Star-RCXT. We bin the caps so we can view systematic errors in
short versus long nets.
                    All     CapT     CapT     CapT     CapT     CapT     CapT
                   CapT   >200fF   >100fF    >50fF    >20fF    >10fF     >5fF
  Net count      140354     1647     6988    11104    27308    32860    60447
  Sum error (%)    1.52   -23.97     6.21     6.00     5.41     5.36     5.48
  Average (%)      5.54     4.60     6.21     5.96     5.40     5.36     5.58
  Std dev          4.16    14.39     2.57     2.79     3.13     3.57     4.57
  Max Pos (%)     44.33    17.21    37.40    20.49    44.33    37.16    28.76
  Max Neg (%)       N/A      N/A    -4.67    -3.53   -13.71   -11.40   -11.12
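The cap binning above is straightforward to script; this Python fragment is an illustration of the idea (the thresholds match the table, but the net caps are made up) that bins nets by their signoff cap and reports error stats per bin.

```python
# Sketch: binning per-net total cap to expose systematic extraction
# error by net size. The net data is made up.
from statistics import mean, pstdev

def bin_cap_errors(nets, thresholds_ff=(200, 100, 50, 20, 10, 5)):
    """nets: list of (signoff_cap_ff, impl_cap_ff). Returns one row of
    stats per bin; each bin holds nets whose signoff cap exceeds its
    threshold but not the next-larger one."""
    rows = {}
    bounds = (float("inf"),) + thresholds_ff
    for hi, lo in zip(bounds, thresholds_ff):
        errs = [100.0 * (i - s) / s for s, i in nets if lo < s <= hi]
        if errs:  # skip empty bins
            rows[f">{lo}fF"] = {
                "count": len(errs),
                "avg_pct": mean(errs),
                "stdev_pct": pstdev(errs),
                "max_pos_pct": max(errs),
                "max_neg_pct": min(errs),
            }
    return rows

# Made-up (signoff_cap, impl_cap) pairs in fF:
nets = [(250.0, 240.0), (120.0, 128.0), (60.0, 63.0), (8.0, 8.8)]
rows = bin_cap_errors(nets)
```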
For an implementation tool, a better perspective is probably to view the
errors in the delay domain rather than the cap domain shown above. So below
we show the error on our paths, varying only which extractor (Atop/STAR) we
use. The number on the left is the path delay error using Atop's timer with
STAR's parasitics; the number on the right is with Atop's extractor.
                   Corner A          Corner B          Corner C
  Max Neg Error    -2.4% / -5.5%     -1.6% / -5.3%     -6.2% / -6.2%
  Max Pos Error     1.7% /  0.3%      1.3% /  0.0%      6.8% / -8.3%
  Mean Error        0.22% / -1.49%    0.12% / -1.71%    0.37% /  0.55%
  Median Error      0.10% / -1.40%    0.10% / -1.70%    0.00% /  0.15%
  Standard Dev      0.35% /  0.79%    0.26% /  0.84%    1.39% /  1.75%
As you can see, Atop's extractor does make the error worse (as expected) but
still keeps things fairly tight.
Correlation to Star-RCXT (coupling cap):
For coupling caps, we do the same experiment of timing the design in Atop
using its caps and with SPEF backannotated from Star-RCXT, except this time
we turn SI on in order to capture the effects of the coupling caps in the
delay realm.
                   Corner A          Corner B          Corner C
  Max Neg Error    -4.4% / -6.9%     -5.0% / -7.5%    -13.9% / -12.9%
  Max Pos Error     7.9% /  6.8%      7.8% /  6.3%      9.9% / -10.4%
  Mean Error        0.30% / -0.78%    0.19% / -1.13%   -0.21% /  0.13%
  Median Error     -0.20% / -0.90%   -0.10% / -1.10%   -0.30% /  0.00%
  Standard Dev      1.63% /  1.54%    1.45% /  1.59%    2.04% /  2.12%
And again, in the delay realm the extractor doesn't significantly increase
the total error bound of the delay versus using our signoff extractor. This
is something we are working with them on improving over time.
Correlation to CeltIC (delay):
To test correlation to CeltIC, we took the same design and timed it with and
without noise impact, to make sure our errors weren't impacted by adding SI.
Below, the percentages represent the error without SI and with SI.
                   Corner A          Corner B          Corner C
  Max Neg Error    -2.4% / -4.4%     -1.6% / -5.0%     -6.2% / -13.9%
  Max Pos Error     1.7% /  7.9%      1.3% /  7.8%      6.8% /  -9.9%
  Mean Error        0.22% /  0.30%    0.12% /  0.19%    0.37% / -0.21%
  Median Error      0.10% / -0.20%    0.10% / -0.10%    0.00% / -0.30%
  Standard Dev      0.35% /  1.63%    0.26% /  1.45%    1.39% /  2.04%
As you can see, while the results are good for an implementation tool
using NLDMs, the SI calculation had an impact on our overall delay
accuracy. We believe a large portion of this error is due to Atop using
NLDMs for its noise models, whereas CeltIC uses a transistor-level model.
We are actively working with Atoptech to bring up CCS noise modeling to
tighten this correlation on noise delay, and expect to roll that out
within the next few months.
Overall delay accuracy to signoff:
This is probably the most important number for a designer. To measure this
we time a design inside Atop using its extractor, timer, and SI engine and
compare the delays to PT/Star/CeltIC.
                   Corner A          Corner B           Corner C
  Max Neg Error    -6.9% (-500ps)    -7.5% (-603ps)    -12.9% (-87ps)
  Max Pos Error     6.8% ( 382ps)     6.3% ( 370ps)     10.4% ( 76ps)
  Mean Error       -0.78% (-66ps)    -1.13% ( -91ps)     0.13% (  2ps)
  Median Error     -0.90% (-58ps)    -1.10% (-104ps)     0.00% (  0ps)
  Standard Dev      1.54% (112ps)     1.59% ( 132ps)     2.12% ( 19ps)
Probably the best way to sum up those results is what it means to our design
engineers in terms of ECO reductions. What we've found with Atop is that we
expect our designs to complete block level timing with only a single minor
ECO (on the order of 10-20 gates for a 500k block) and converge in a single
pass. We are continuing to work with them on getting that number down
further, as our belief is that number should be zero.
On the overfixing side, initial results were good but there were too many
optimistic outliers. They have successfully tightened that up now, but
there is still some work to do: they still have problems correlating
large-cap/small-driver paths particularly well, especially once crosstalk
is brought into the equation (a crosstalk engine is a default requirement
of any P&R tool for us), but it's not significantly worse than other tools
we've seen. We're still working with them on this and expect it to
continue to improve, especially as we transition to CCS.
Another thing they need to work on is leakage optimization. We use a
3-threshold (low, medium and high) cell library and still find ourselves
able to do some hand- or perl-based threshold swapping of the Atop result
to reduce low-threshold usage. Again, this has improved since we started,
and now that their timing correlation is improving we expect to be able to
focus on this more in the future. Atoptech has been aggressively working
with other groups in our company on leakage optimization, and we've been
told by our engineers that their leakage optimization in the latest release
is competitive.
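As a rough illustration of the kind of threshold swapping we do by hand or in perl, here is a Python sketch of a greedy Vt-promotion pass. The Vt classes, delay penalties, and slack model are entirely made up, and a real flow would re-time against the signoff timer after swapping rather than trust a per-cell penalty.

```python
# Sketch of post-route threshold (Vt) swapping for leakage recovery:
# move cells with spare slack from low-Vt to higher-Vt variants.
# All numbers are illustrative, not library data.

# Vt classes ordered low -> high; higher Vt leaks less but is slower.
VT_ORDER = ["lvt", "svt", "hvt"]
DELAY_PENALTY_PS = {("lvt", "svt"): 15, ("svt", "hvt"): 25}

def swap_vt(cells, margin_ps=50):
    """cells: dicts with 'name', 'vt', 'slack_ps' (worst slack through
    the cell). Greedily promotes Vt while slack stays above margin_ps.
    Returns the number of swaps performed."""
    swaps = 0
    for cell in sorted(cells, key=lambda c: -c["slack_ps"]):
        while True:
            idx = VT_ORDER.index(cell["vt"])
            if idx + 1 >= len(VT_ORDER):
                break  # already at highest Vt
            penalty = DELAY_PENALTY_PS[(cell["vt"], VT_ORDER[idx + 1])]
            if cell["slack_ps"] - penalty < margin_ps:
                break  # swap would eat into the slack margin
            cell["slack_ps"] -= penalty
            cell["vt"] = VT_ORDER[idx + 1]
            swaps += 1
    return swaps

cells = [
    {"name": "u1", "vt": "lvt", "slack_ps": 300},
    {"name": "u2", "vt": "lvt", "slack_ps": 60},
    {"name": "u3", "vt": "svt", "slack_ps": 90},
]
n = swap_vt(cells)
```

The margin parameter is what keeps this from trading timing QOR for leakage, which is the same tradeoff the tool's own optimizer has to make.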
Runtime:
Their multi-threaded timer engine is fairly fast on our 65 nm blocks. From a
timing standpoint, a timing report of a routed, 3 scenario, 350,000 instance
design takes about 12 mins when running in 4 thread mode (this time includes
extraction, SI delay calc, and timing analysis).
From an overall runtime, the DRC checker and some of the optimization
algorithms (CTS, placement, routing) are also multi-threaded, improving our
overall turn times. This gain is compounded by the fact that timing QOR is
very good -- so time to initial result is good, and time to clean results is
very good due to the fewer ECOs.
Atop has done a good job of meeting our primary goal of reducing the time it
takes to make 65 nm layouts. One ~350k instance design takes about 48 hours,
from netlist->GDSII, running in 4-thread mode on a 2.8 GHz Xeon, optimizing
for setup and hold across three scenarios (see MCMM section below for
description of "scenario"). This includes power insertion, mfill, yield
optimization, xtalk avoidance, DRC closure and limit fixing in addition to
basic timing optimization.
This is 48 hours in our flow, which we have configured to do a significant
amount of report generation -- so true optimization runtime is less than
this. A rough rule of thumb is a 100k instance design should take about
16 hours, a 200k instance design about 24 hours, and a 500k design about
36 hours. Obviously those numbers vary based on the difficulty
(routing/timing) of the block and the number of path exceptions the timer
encounters.
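For planning purposes, that rule of thumb can be turned into a simple estimator. The anchor points below come from the numbers above; linear interpolation between them is our own assumption, and as noted, block difficulty can swamp it.

```python
# Sketch: rule-of-thumb netlist->GDSII turn-time estimate by instance
# count. Anchors are from the text; interpolation is an assumption.
ANCHORS = [(100_000, 16.0), (200_000, 24.0), (500_000, 36.0)]

def est_runtime_hours(instances):
    """Piecewise-linear interpolation over the anchor points,
    clamped at both ends."""
    pts = sorted(ANCHORS)
    if instances <= pts[0][0]:
        return pts[0][1]
    if instances >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= instances <= x1:
            return y0 + (y1 - y0) * (instances - x0) / (x1 - x0)
```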
MCMM:
Atop's MCMM approach is based around N scenarios. A scenario is a
combination of a timing constraints file (a mode and a corner). It
creates N copies of the design, optimizing them and bringing the
results back together to ensure alignment at discrete points throughout
each optimization step.
It was easy to configure this approach - you just create a variable listing
the names of all your scenarios, then for each one have a TCL control file
that tells it what SDC (Atop reads SDC v1.4) to use, what OCV settings,
whether to do min/max timing or both, etc.
So for N scenarios you just need to provide N TCL files. These are easily
auto generated as their structure is pretty simple.
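Because the per-scenario control files are so regular, generating them is trivial to script. Here is a Python sketch of the idea; the setting names (scn_sdc, scn_ocv, scn_check) are placeholders of ours, not actual Atop variables.

```python
# Sketch: auto-generating one TCL control file per MCMM scenario.
# Scenario names, file names, and TCL variable names are illustrative.
SCENARIOS = {
    "func_ss": {"sdc": "func.sdc", "ocv": "ocv_ss.tcl", "check": "setup"},
    "func_ff": {"sdc": "func.sdc", "ocv": "ocv_ff.tcl", "check": "hold"},
    "test_ss": {"sdc": "test.sdc", "ocv": "ocv_ss.tcl", "check": "setup"},
}

def gen_scenario_tcl(scenarios):
    """Return {filename: tcl_text}, one control file per scenario."""
    files = {}
    for name, cfg in scenarios.items():
        body = "\n".join([
            f"# scenario {name} (auto-generated)",
            f"set scn_sdc {cfg['sdc']}",
            f"set scn_ocv {cfg['ocv']}",
            f"set scn_check {cfg['check']}",
        ])
        files[f"{name}.tcl"] = body + "\n"
    return files

files = gen_scenario_tcl(SCENARIOS)
```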
Because each scenario can be treated as a separate optimization job, Atop
can spawn each one off as a distributed job through LSF. So if you're
running on a 4-CPU machine and have 12 scenarios, it will run 4 of them
on the master machine as 4 separate threads and then spawn the other 8 off
as LSF jobs.
While this makes it a simple thing to set up and control, it does appear
to suffer somewhat from heavy CPU usage when we've analyzed the compute
consumption -- so we think there's still some work to do on creating an
MCMM flow that maximizes compute efficiency.
They do have a mode to do concurrent MCMM vs. distributed, but that requires
a tradeoff on runtime.
ECOs:
One thing Atop is good at is ECOs. We've had it doing manual and netlist
comparison ECOs for the last 6 months and have found it does a great job at
implementing them sensibly. That is, it's only changing what needs to be
changed so the non-ECOed portions of the design maintain their DRC and
timing QOR.
Their ECO performance means we're looking at an MCMM approach partially
based around using an ECO to apply the other N scenarios while doing the
main body of the P&R with a smaller set.
We've also successfully used this ECO approach to improve on the problem of
inefficient multi-threshold cell selection without a hefty runtime penalty.
Their LSF distribution needs some work but this is a common problem we have
with a lot of multi-threaded, multi-compute tools. They don't have the
necessary hooks in them to play nice with the way our LSF queue is configured
(e.g. you tell Atop on startup how many CPUs are allocated to it, but there
is no easy way to ask LSF to give you an N CPU machine when you have more
than one LSF slot on a multi-CPU machine).
Routing:
Atop reads standard LEF format for the P&R views of the macros and standard
cells. One drawback of their P&R is the lack of fine control over how to
treat a macro blockage (i.e. obstructions in the macro LEFs). In Atop you
can define the blockage as requiring fat spacing or min spacing, but a macro
cannot have a mixture of fat and min spacing blockage, only one or the
other, due to limitations in the LEF format. So to get acceptable routing
performance (because treating all blockage as requiring fat spacing would
be too pessimistic and blow congestion) we set it to treat all blockage as
min-spacing.
This required us to do some LEF modification or P&R workarounds (e.g. adding
routing blockage in the layout) to avoid seeing fat-wire fails in Calibre
that Atop was not seeing. This wasn't overly taxing and we expect this to
be a temporary workaround as Atop is working on their own abstraction tool
which will allow a richer description than you get from just reading LEF.
Routing results correlated very well to Calibre. In general, if our designs
are clean in Atop, we see that they will be LVS clean in Calibre (assuming
good LEF views).
From a DRC standpoint, many blocks come through completely clean in Calibre
as well. The rest are mostly DRC clean (for a 500k instance design, under
10 violations in Calibre); the DRC issues are usually minor ones requiring
a little bit of manual cleanup (e.g. spacing fails), and this is something
we are actively working with Atoptech to address, as we consider any DRC/LVS
failure caused by the router to be unacceptable.
Going Forward:
In the past few months, we've successfully transitioned over to Atop as the
primary P&R tool in our group. We've taped out 3 chips where all blocks
were done in Atop for netlist->GDS, and are targeting all chips (roughly
10-15 more tapeouts) going forward this year to use Atop exclusively for
block level physical design.
On the block level side, another area for them to fix is their macro placer.
We are still placing macros by hand, using design knowledge and incremental
placement in the GUI to guide us. The Atop GUI is OK for this but we are
still some way from it having an automatic macro placer.
Our current focus is to integrate Atop into our top-level assembly and
floorplanning flows. (Atop was developed as a block level netlist->GDSII
tool.) We are working with them on the extra features (pin placement,
automatic partitioning, padring routing, etc.) to make a tool suitable
for top-level.
A lot of these features are in beta already and we're pretty far down the
road of using Atoptech's tool suite for top-level as well, and hope to have
them transitioned in the second half of the year.
- [ Iron Man ]