( ESNUG 352 Item 5 ) --------------------------------------------- [5/17/00]
Subject: Muddling From A 3-Pass To A 2-Pass Hierarchical Layout Methodology
> I'm hoping to see a more substantive discussion of hierarchical layout
> (beyond the hierarchical vs. flat discussion). The hierarchical vs. flat
> discussion is really only interesting until you hit about 2M-3M gates.
> Then flat just isn't an option, and many of the hierarchical
> methodologies are quite immature.
>
> Are you seeing much in the way of hierarchical layout methodologies?
>
> - Nancy Nettleton
> Sun Microsystems Silicon Valley, CA
From: Tom Ayers
To: Nancy Nettleton
Hi, Nancy,
I've done quite a bit of hierarchical layout, but the tool support for this
technique is spotty. Typically the process involves the following:
1) Project lead determines hierarchical breakdown of chip into
"subsystems" which are place and route blocks. Hierarchical
equivalence is maintained between RTL and layout. Subsystems have
additional requirements from modules, usually WRT reset, interrupt,
etc., resynchronization for synchronous distribution across the chip
and clock tree components if clock tree synthesis is not being used.
2) At RTL code complete, first pass floorplan is done. Blocks are 90%
accurate for cell area and margin is added for late maturing blocks.
Manual global routing (Synopsys has new tool optimized for this, but
I don't know how good it is) and megacell memory placement is done and
timing data extracted for subsystems (PrimeTime or Ambit would be best
for this since they have time budgeting). Some global buses may be
SPICEd.
3) Representative subsystem is selected for full trial P&R closure to
iron out tool flow issues.
4) Final RTL synthesis is done after validation complete and subsystem
areas compared to allow floorplan to be "jiggled" to correct sizes.
5) P&R closure for each subsystem with IPO runs.
6) Full chip parasitic extraction and static timing.
7) LVS/DRC/tapeout
This pass I expect to integrate Chip Architect to allow both forward
annotation of basic floorplan from design team to backend and to help
timing closure with synthesis to placed gates.
I would be interested to hear what you are doing over there at Sun.
- Tom Ayers
Believe.com
---- ---- ---- ---- ---- ---- ----
From: Nancy Nettleton
To: Tom Ayers
Hey Tom,
What we're doing is very, very similar. I'll explain the overall
differences and then annotate your mail with specific differences.
One question: are you in an ASIC flow, a COT flow, or are you in a kind of
a revised ASIC flow where you have an ASIC vendor doing P&R but you're
doing up-front floorplanning and cell placement to close timing?
Here are some of the differences between what you describe and what I've put
into place for our ASICs.
1. I've been emphasizing stages of the project in addition to specific
tool capabilities (since I've seen a lot of projects that went
to layout either too early or too late to close). I work really
hard up-front to define what design milestones we need to meet
for given levels of layout.
2. For our last four 1M+ gate ASICs, we've had our ASIC suppliers do
all floorplanning, placement, and timing closure. I check out
the suppliers, set up the projects, and keep track of them as
they go through the pipe, but the suppliers do all the heavy
lifting in physical design.
3. We've got hierarchical flows running on both Avant! and Cadence, both
with extensive homebrew from the ASIC suppliers. We use DC for
synthesis and PrimeTime for pre- and post-layout STA, but most of our
time budgeting has been pretty manual. That's not going to work for
3M+ gate ASICs.
4. I've been working with the teams to get 3 full chip layouts done,
but I've been wanting to try to move to a 2-layout model if I can get
the suppliers flows stabilized. I'd be interested to compare notes
to see how your 2-layout model works.
Our 3-layout model works something like this:
Layout 1: 80-90% gates synthesized
All memories, hard macs, clocks, & DFT implemented
Pre-layout timing and verification level irrelevant
Layout Goals:
. Confirm die size
. Structure clocks, top-level busses, and critical
I/O paths
. Identify any architecturally or floorplan
limited critical paths.
. Work the kinks out of the hand-off model (esp.
for timing constraints)
. Work the kinks out of the methodology
. Set timing budgets
Layout 2: All gates synthesized
Sufficient verification to assure that top-level
will not change in final layout.
Timing met in synthesis with parasitics and budgets
from layout 1.
Layout Goals:
. Layout and freeze top-level routing.
. Close timing to synthesis timing.
Layout 3: This is basically the layout that goes to mask.
Logic is fully verified.
Because the top-level is frozen (small # ECOs allowed),
this layout is just a block-level re-spin,
making it much faster than layout 2.
What I've been telling my projects is that we could cut out the first layout
if we could take another 2-4 weeks on the 2nd and 3rd layout OR if they
could leave more area/timing margin to take care of last minute problems.
So far I haven't had a project want to take the additional time or margin.
My feel is that management is willing to pay more up-front NRE for an add'l
layout in order to minimize the time it takes to get from fully verified
RTL to mask.
I also suspect that for the last 4 ASICs, we've been hitting some pretty
raw design methodologies. For all 4 ASICs, we were our suppliers' first
hierarchical design project, the first project on which they had fully
owned layout timing closure, and the first project on which they had to
close cross-capacitance. It was a lot of methodology to check out in just
2 layouts. I don't think I could've done it.
> 1) Project lead determines hierarchical breakdown of chip into
> "subsystems" which are place and route blocks. Hierarchical
> equivalence is maintained between RTL and layout. Subsystems have
> additional requirements from modules, usually WRT reset, interrupt,
> etc., resynchronization for synchronous distribution across the chip
> and clock tree components if clock tree synthesis is not being used.
We break out the hierarchy with our suppliers (because they look at the
physical side while Sun ASIC managers look at the logical). If we were
in a COT environment, or our design managers had more experience with
hierarchical layout, I would do what you describe above.
> 2) At RTL code complete, first pass floorplan is done. Blocks are 90%
> accurate for cell area and margin is added for late maturing blocks.
> Manual global routing (Synopsys has new tool optimized for this, but
> I don't know how good it is) and megacell memory placement is done
> and timing data extracted for subsystems (PrimeTime or Ambit would
> be best for this since they have time budgeting). Some global buses
> may be SPICEd.
This sounds like my Layout 1 (90% complete, margin added for late maturing
blocks...) We do the top-level route automatically in both our Cadence and
Avant! flows. We do full block-level floorplanning, cell placement, and
routing. I haven't tried PrimeTime for budgeting yet, but I'm going to try
on my next project. We don't spice global busses, but we do spice clocks
and some sensitive I/O paths (choice of what gets spiced is pretty design-
specific).
> 3) Representative subsystem is selected for full trial P&R closure to
> iron out tool flow issues.
Yep. Because it is so hard to identify all architecturally limited paths
in synthesis, I've been doing all blocks. It also helps us characterize
insertion delay on un-PLL'ed clocks.
> 4) Final RTL synthesis is done after validation complete and subsystem
> areas compared to allow floorplan to be "jiggled" to correct sizes.
One big problem we've been encountering in the multi-million gate ASICs is
that the typical 20-50% error in estimating gate count has just killed us.
We've had netlists grow by hundreds of thousands (and once even a million)
gates from release to release. For this reason, we've started tracking
gate counts in between layout netlist releases, so that if something really
starts to grow, we can respond with an early floorplanning netlist before
the real netlist release.
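The between-release gate-count tracking described above could be automated along these lines. This is a minimal Python sketch, not the flow Sun used: the `BlockSnapshot` record, the block names, and the 10% growth and 90% fill thresholds are all hypothetical, and the gate counts are assumed to come from whatever area report each synthesis run already produces.

```python
from dataclasses import dataclass


@dataclass
class BlockSnapshot:
    """Hypothetical per-block record taken from one synthesis snapshot."""
    name: str
    gates: int            # synthesized gate count in this snapshot
    floorplan_gates: int  # gate capacity the current floorplan was sized for


def growth_alerts(prev, curr, growth_limit=0.10, fill_limit=0.90):
    """Flag blocks that grew faster than growth_limit since the previous
    snapshot, or that now use more than fill_limit of their floorplan area.
    Both thresholds are illustrative assumptions, not recommended values."""
    prev_by_name = {b.name: b for b in prev}
    alerts = []
    for block in curr:
        old = prev_by_name.get(block.name)
        if old is not None and old.gates > 0:
            growth = (block.gates - old.gates) / old.gates
            if growth > growth_limit:
                alerts.append((block.name, f"grew {growth:.0%} since last snapshot"))
        fill = block.gates / block.floorplan_gates
        if fill > fill_limit:
            alerts.append((block.name, f"at {fill:.0%} of floorplan allocation"))
    return alerts


if __name__ == "__main__":
    prev = [BlockSnapshot("dma", 200_000, 300_000), BlockSnapshot("pci", 150_000, 200_000)]
    curr = [BlockSnapshot("dma", 260_000, 300_000), BlockSnapshot("pci", 155_000, 200_000)]
    for name, reason in growth_alerts(prev, curr):
        print(f"{name}: {reason}")
```

A block that trips either check is a candidate for an early floorplanning netlist ahead of the real release.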
It probably starts sounding a bit maniacal, but once that netlist is handed
off to layout, our management starts the clock ticking like it's a death
march. Anything we can do to pull time out of that netlist-to-closure
time, we really need to do it. We really emphasize early identification
of issues for that reason.
> 5) P&R closure for each subsystem with IPO runs.
> 6) Full chip parasitic extraction and static timing.
> 7) LVS/DRC/tapeout
That's my Layout 3.
> This pass I expect to integrate Chip Architect to allow both forward
> annotation of basic floorplan from design team to backend and to help
> timing closure with synthesis to placed gates.
This makes me think you're in a hybrid ASIC team where you're doing some
floorplanning and placement to help close your synthesis-to-layout timing.
I've wanted to explore this methodology, but most of the Sun ASIC teams
have been very resistant to building any sort of physical infrastructure/
knowledge at all. They pay for it in NRE, but they seem to prefer that
to coughing up for the tools/engineers it would take. I think we can only
get away with it because we're such a big chunk of the ASIC suppliers' pie.
I'd be surprised if we can get away with it forever.
What kind of hand-off model environment are you in?
- Nancy Nettleton
Sun Microsystems Silicon Valley, CA
---- ---- ---- ---- ---- ---- ----
From: Tom Ayers
To: Nancy Nettleton
> One question: are you in an ASIC flow, a COT flow, or are you in a kind
> of a revised ASIC flow where you have an ASIC vendor doing P&R but you're
> doing up-front floorplanning and cell placement to close timing?
Hi, Nancy,
For ASIC processes, it was up-front floorplanning using either CMDE (LSI) or
Compass to do soft floorplanning and megacell placement, with the ASIC
provider doing power strapping, clock tree, column cutting, and P&R. For
COT processes, our design group would draw out the floorplan as a starting
point for the backend group and review the final megacell placement and
actual floorplan.
> 1. I've been emphasizing stages of the project in addition to specific
> tool capabilities (since I've seen alot of projects that went
> to layout either too early or too late to close). I work really
> hard up-front to define what design milestones we need to meet
> for given levels of layout.
Yes, I usually use a "layout checklist" which needs to be filled out before
you are ready. If you can fill it out, then you are probably ready (all
clock, reset, cell type restrictions, layout instructions, checks on
synthesis and static timing results, etc).
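The "fill it out before you go" rule can be made mechanical so that an unanswered item never reads as done. This Python sketch is illustrative only: the checklist strings are hypothetical paraphrases of the categories named above (clocks, reset, cell type restrictions, layout instructions, synthesis and static timing checks), and a real project checklist would be far longer.

```python
# Hypothetical checklist items, paraphrased from the categories named in the
# text; a real project checklist would be much longer and design-specific.
LAYOUT_CHECKLIST = (
    "clock structure specified",
    "reset distribution specified",
    "cell type restrictions listed",
    "layout instructions written",
    "synthesis results checked",
    "static timing results checked",
)


def ready_for_layout(answers):
    """Return (ready, missing). An item that is absent from answers is
    treated as not done, so an incomplete checklist never reads as ready."""
    missing = [item for item in LAYOUT_CHECKLIST if not answers.get(item, False)]
    return (not missing, missing)


if __name__ == "__main__":
    answers = {item: True for item in LAYOUT_CHECKLIST}
    answers["static timing results checked"] = False
    ready, missing = ready_for_layout(answers)
    print("ready" if ready else "not ready, missing: " + ", ".join(missing))
```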
> 2. For our last four 1M+ gate ASICs, we've had our ASIC suppliers do
> all floorplanning, placement, and timing closure. I check out
> the suppliers, set up the projects, and keep track of them as
> they go through the pipe, but the suppliers do all the heavy
> lifting in physical design.
I like keeping basic floorplanning in house and detailed (LVS/DRC clean)
floorplanning at the backend, or at least floorplan specification. There
is too much that can be done to impair timing closure without designer
guidance. That is why I find Chip Architect promising.
> 3. We've got hierarchical flows running on both Avant! and Cadence, both
> with extensive homebrew from the ASIC suppliers. We use DC for
> synthesis and PrimeTime for pre- and post-layout STA, but most of our
> time budgeting has been pretty manual. That's not going to work for
> 3M+ gate ASICs.
In the past I have mostly used DC static timing, as it is usually fast
enough and mode dependencies were not needed. I will use PrimeTime this
time, mostly for its time budgeting, which I agree needs to be less manual
(on previous projects, budgets were extracted from DC static timing plus
Perl scripts).
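One simple way to derive per-block budgets from static timing estimates is to split each cross-block path's clock period among its segments in proportion to their estimated delays, then keep the tightest budget seen at each block boundary. This is a hedged Python sketch of proportional budgeting, not a reconstruction of the Perl scripts mentioned above or of PrimeTime's budgeting; segment names such as `blockA:out` are hypothetical, and the delay estimates are assumed to come from pre-layout STA.

```python
def proportional_budgets(segments, period_ns, reserve_ns=0.0):
    """Split (period - reserve) across the segments of one cross-block path
    in proportion to each segment's estimated delay.

    segments: list of (segment_name, estimated_delay_ns) along the path.
    reserve_ns: optional amount held back, e.g. for clock skew and setup.
    """
    usable = period_ns - reserve_ns
    total = sum(delay for _, delay in segments)
    if total <= 0:
        raise ValueError("estimated segment delays must sum to a positive value")
    return {name: delay * usable / total for name, delay in segments}


def tightest_budgets(paths, period_ns, reserve_ns=0.0):
    """A segment (e.g. one block output port) can lie on many paths; keep the
    smallest budget any path assigns to it."""
    budgets = {}
    for segments in paths:
        for name, budget in proportional_budgets(segments, period_ns, reserve_ns).items():
            budgets[name] = min(budget, budgets.get(name, float("inf")))
    return budgets


if __name__ == "__main__":
    paths = [
        [("blockA:out", 1.2), ("top:route", 0.8), ("blockB:in", 2.0)],
        [("blockA:out", 2.0), ("blockC:in", 2.0)],
    ]
    for name, budget in sorted(tightest_budgets(paths, period_ns=5.0).items()):
        print(f"{name:12s} {budget:.2f} ns")
```

Proportional splitting is only one policy; others, such as the zero-slack algorithm, distribute the leftover slack differently, and a real flow would also respect I/O delays already fixed at the chip boundary.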
> 4. I've been working with the teams to get 3 full chip layouts done,
> but I've been wanting to try to move to a 2-layout model if I can get
> the suppliers flows stabilized. I'd be interested to compare notes
> to see how your 2-layout model works.
I have had pretty good luck with the two-layout model. The important point is
in not kidding yourself about when you are ready for the first layout,
otherwise you are sure to do three. Checklists can help here. Try the
following:
Layout 1: All gates synthesized.
All memories, hard macs, clocks, & DFT implemented
Sufficient verification to assure that top-level
will not change significantly in final layout.
Timing goals not necessarily closed, but not way off.
Reasonable compile strategy (-map_effort med)
minimum.
All false, multicycle, etc paths worked out.
No timing arcs.
DFT not debugged
Layout Goals:
. Confirm die size
. Structure clocks, top-level busses, and critical
I/O paths
. Identify any architecturally or floorplan
limited critical paths.
. Work the kinks out of the hand-off model (esp.
for timing constraints)
. Work the kinks out of the methodology
. Set timing budgets
. Layout and freeze top-level routing.
. Close timing on one P&R block to iron out
tool flow.
Layout 2: Top level netlist released 1 week in advance of
validation closure.
Subsystems released as validation complete
and signoff checklists completed.
This is basically the layout that goes to mask.
Logic is fully verified.
> What I've been telling my projects is that we could cut out the first
> layout if we could take another 2-4 weeks on the 2nd and 3rd layout OR
> if they could leave more area/timing margin to take care of last minute
> problems. So far I haven't had a project want to take the additional
> time or margin. My feel is that management is willing to pay more
> up-front NRE for an add'l layout in order to minimize the time it takes
> to get from fully verified RTL to mask.
This depends on your market, really. For performance-oriented designs such
as microprocessors and 3D graphics parts, speed is everything. For designs
where the clock rate is not the key driving issue, doing an extra layout
actually slows the team down, and you are better off adding some additional
margin to your design (on the order of 5-10% of the clock period makes a
big difference).
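To put the 5-10% figure in concrete terms: at 200 MHz the period is 5 ns, so 10% is 0.5 ns. One way to hold such margin during synthesis is to keep the real clock period and add setup uncertainty. The Python helper below just generates the two standard SDC commands for that; the clock and port names are hypothetical, and whether to reserve margin through uncertainty or a tighter period is a project choice.

```python
def margin_constraints(clock_name, port_name, freq_mhz, margin_fraction):
    """Emit SDC text that defines the clock at its real period and reserves
    margin_fraction of that period as setup uncertainty during synthesis."""
    if not 0.0 <= margin_fraction < 1.0:
        raise ValueError("margin_fraction must be in [0, 1)")
    period_ns = 1000.0 / freq_mhz
    margin_ns = period_ns * margin_fraction
    return [
        f"create_clock -name {clock_name} -period {period_ns:.3f} [get_ports {port_name}]",
        f"set_clock_uncertainty -setup {margin_ns:.3f} [get_clocks {clock_name}]",
    ]


if __name__ == "__main__":
    # 200 MHz clock with 10% margin: 5.000 ns period, 0.500 ns held back.
    print("\n".join(margin_constraints("core_clk", "clk_in", 200, 0.10)))
```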
It also pays to have good predictive wireload models.
>> 1) Project lead determines hierarchical breakdown of chip into
>> "subsystems" which are place and route blocks. Hierarchical
>> equivalence is maintained between RTL and layout. Subsystems have
>> additional requirements from modules, usually WRT reset, interrupt,
>> etc., resynchronization for synchronous distribution across the chip
>> and clock tree components if clock tree synthesis is not being used.
>
> We break out the hierarchy with our suppliers (because they look at the
> physical side while Sun ASIC managers look at the logical). If we were
> in a COT environment, or our design managers had more experience with
> hierarchical layout, I would do what you describe above.
We tend to make this easy to do by wrapping modules in "subsystem wrappers".
These wrappers also include some of the things I've listed above, as well as
bus interfaces for internal high-speed chip buses. We went as far as
standardizing the backside interface to the bus interface so that all
modules would have consistent host interfaces that were not tied to the
current internal bus implementation.
>> 2) At RTL code complete, first pass floorplan is done. Blocks are 90%
>> accurate for cell area and margin is added for late maturing blocks.
>> Manual global routing (Synopsys has new tool optimized for this, but
>> I don't know how good it is) and megacell memory placement is done
>> and timing data extracted for subsystems (PrimeTime or Ambit would
>> be best for this since they have time budgeting). Some global busses
>> may be SPICEd.
>
> This sounds like my Layout 1 (90% complete, margin added for late maturing
> blocks...) We do the top-level route automatically in both our Cadence
> and Avant! flows. We do full block-level floorplanning, cell placement,
> and routing. I haven't tried PrimeTime for budgeting yet, but I'm going
> to try on my next project. We don't spice global busses, but we do spice
> clocks and some sensitive I/O paths (choice of what gets spiced is pretty
> design-specific).
The difference is primarily in the level of completeness. Checklists ensure a
higher degree of accuracy in area implementation and most of the remaining
area issues are non-floorplan critical. Some of our internal buses have
been cross-chip high speed internal processor (SOC) buses with long wire
lengths and many clients. We were not comfortable with just A2STAR or
something similar.
>> 3) Representative subsystem is selected for full trial P&R closure to
>> iron out tool flow issues.
>
> Yep. Because it is so hard to identify all architecturally limited paths
> in synthesis, I've been doing all blocks. It also helps us characterize
> insertion delay on un-PLL'ed clocks.
Maybe your wireload models are not so good? I have not really had much
difficulty identifying limited paths in synthesis.
>> 4) Final RTL synthesis is done after validation complete and subsystem
>> areas compared to allow floorplan to be "jiggled" to correct sizes.
>
> One big problem we've been encountering in the multi-million gate ASICs is
> that the typical 20-50% error in estimating gate count has just killed us.
> We've had netlists grow by hundreds of thousands (and once even a million)
> gates from release to release. For this reason, we've started tracking
> gate counts in between layout netlist releases, so that if something
> really starts to grow, we can respond with an early floorplanning netlist
> before the real netlist release.
>
> It probably starts sounding a bit maniacal, but once that netlist is
> handed off to layout, our management starts the clock ticking like it's a
> death march. Anything we can do to speed up that netlist-to-closure
> time, we really need to do it. We really emphasize early identification
> of issues for that reason.
Sure. The problem above arises because you are doing netlist #1 too early.
It is mostly a wasted effort at that point. We use spreadsheets that list
modules down the left side and module specific data along the top such as
current area, method of determining area, DFT % coverage, number of flops
per domain, and milestones with expected completion dates.
As the project matures, information fills in left to right giving a quick
glance at project progress as well as a wealth of information about the
project data.
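A small script can give the same left-to-right "fill-in" view of the module tracking spreadsheet. This Python sketch is only an illustration: the column keys abbreviate the columns listed above (current area, area method, DFT coverage, flops per domain, milestone dates), and the module names and values are hypothetical.

```python
# Column keys abbreviate the spreadsheet columns described in the text.
COLUMNS = ("area", "area_src", "dft_cov", "flops", "dates")


def progress(rows):
    """rows maps module -> {column: value or None}; return the fraction of
    columns filled in for each module."""
    return {
        module: sum(1 for col in COLUMNS if data.get(col) is not None) / len(COLUMNS)
        for module, data in rows.items()
    }


def render(rows):
    """Text table with one row per module: 'X' for filled cells, '.' for empty."""
    lines = ["module".ljust(10) + " ".join(col.ljust(8) for col in COLUMNS)]
    for module, data in rows.items():
        cells = ("X" if data.get(col) is not None else "." for col in COLUMNS)
        lines.append(module.ljust(10) + " ".join(cell.ljust(8) for cell in cells))
    return "\n".join(lines)


if __name__ == "__main__":
    rows = {
        "dma": {"area": 1.8, "area_src": "synth", "dft_cov": 0.97, "flops": None, "dates": None},
        "pci": {"area": 2.4, "area_src": "estimate"},
    }
    print(render(rows))
    print(progress(rows))
```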
>> This pass I expect to integrate Chip Architect to allow both forward
>> annotation of basic floorplan from design team to backend and to help
>> timing closure with synthesis to placed gates.
>
> This makes me think you're in a hybrid ASIC team where you're doing some
> floorplanning and placement to help close your synthesis-to-layout timing.
> I've wanted to explore this methodology, but most of the Sun ASIC teams
> have been very resistant to building any sort of physical infrastructure/
> knowledge at all. They pay for it in NRE, but they seem to prefer that
> to coughing up for the tools/engineers it would take. I think we can only
> get away with it because we're such a big chunk of the ASIC suppliers' pie.
> I'd be surprised if we can get away with it forever.
I have yet to meet a project manager who did not want to draw up the basic
floorplan. I do not mean jiggle every memory into the exact location,
but draw the rough boundaries and cell placements. This would be forward
annotated to the backend team, who would make the real floorplan. For
instance, your Layout #1 could be done completely in Chip Architect, and
done quickly, to get subsystem timing budgets. Little time would be wasted
doing a detailed layout for something that is not yet ready for it. Some
full ASIC houses such as LSI, IBM, VLSI, etc. provide the tools and expect
you to do this step anyway.
The real advantage that I hope to get, however, is to follow a synthesis to
placed gates methodology. 99% of timing closure problems are due to gate
placement, not routing. Chip Architect allows running synthesis to
placement and iterating on it in one tool, and then forward annotating the
LEF/DEF files to the final router. This should be a legal placement at this
point, which means timing closure should be quick.
- Tom Ayers
Believe.com
---- ---- ---- ---- ---- ---- ----
From: Nancy Nettleton
To: Tom Ayers
> Yes, I usually use a "layout checklist" which needs to be filled out
> before you are ready. If you can fill it out, then you are probably ready
> (all clock, reset, cell type restrictions, layout instructions, checks on
> synthesis and static timing results, etc).
Hi, Tom,
Most of our designers want to go to layout before they could complete
such a checklist. One of my biggest pushes in the past year has been
to get the projects embedding checklist items in their project plans so
we aren't scrambling at the last minute trying to put in place something
we need but neglected.
> I like keeping basic floorplanning in house and detailed (LVS/DRC clean)
> floorplanning at the backend, or at least floorplan specification. There
> is too much that can be done to impair timing closure without designer
> guidance. That is why I find Chip Architect promising.
We've been providing designer guidance to floorplanning via an extensive
design review process as well as face-to-face, at-the-workstation sessions
to tune it.
I've been finding that even with a lot of designer interaction, there are
data flow and timing details we miss until we get timing constraints,
particularly wrt DFT. Too many whacked-out half-cycle paths and things
that are just hard to identify and handle without detailed specs.
I'm getting ready to try a hybrid kind of hierarchical flow that may give
me the best of both worlds (famous last words :-)
We used to do this back on UltraSPARC II, but until we had good hierarchy,
there was no sense trying it on ASICs.
We're going to do one netlist release to stabilize the top-level. The
block-level netlists will need to have all memories instantiated with
most of their gates and good estimates of how many gates are still
coming. We'll stabilize the top-level and block-level floorplans, then
set the block-level layout flows up so that they can be executed as a
back-end to the synthesis flow so that my logic designers can check
physical timing for a synthesis run (instead of just checking synthesis
timing with WLMs). The trick will be stabilizing the floorplans enough
that the blocks can spin inside that floorplan for a while. I expect
we'll want to spin the block layout down through placement and rough
layout timing closure, but not through routing. This way, designers
can see their physical timing without going through all the work of
releasing a netlist to physical design.
The only things I haven't worked out yet are how often do we need to
integrate the whole chip to make sure it'll work (guessing twice), who
writes/maintains the rough PD flows (us or our supplier), and does the
rough PD flow produce just timing or something useful to layout as well.
I don't have a chip to try this on yet. It could be a few months before
I can really work the kinks out of this.
> This depends on your market, really. For performance-oriented designs
> such as microprocessors and 3D graphics parts, speed is everything. For
> designs where the clock rate is not the key driving issue, doing an extra
> layout actually slows the team down, and you are better off adding some
> additional margin to your design (on the order of 5-10% of the clock
> period makes a big difference).
>
> It also pays to have good predictive wireload models.
In a lot of ways I agree with you, but when I think back over the last few
projects I worked on, we could not have done it without that first layout.
Of the last 5 ASICs I've worked on, 2 had the wrong physical design strategy
going in. This was due to a lot of things, like immature architectures and
inexperience with the process technology. If we hadn't done 1 trial
layout to find out we had the wrong strategy, we would've spent all our time
in the final trial layout focusing on methodology and not on the design.
On the last 3 ASICs, we did not have the wrong physical strategy going in.
I could possibly have lived without 2 trial layouts, but it would've taken
longer because the physical team had never owned layout timing closure and
never done hierarchy.
It's possible Sun needs 3 layout spins because our business people insist
on contracting high-performance, high-complexity silicon with suppliers
who really don't do high performance, high complexity. I hadn't thought
about that much.
> - What is your cross-capacitance issue?
I worked with our microprocessor design group to review their 0.18 um
simulation results, and it was a big wake-up call. When we started
simulating our ASIC 0.18 um technologies/libraries, we found that with
as little as 1 mm of adjacent routing, we could see >50% increase in
delay. We also found that with about 1 mm of adjacent routing, we could
get noise glitches greater than 0.8 V. So we started our 0.18 um suppliers
working on xcap noise/delay check & repair. What an odyssey...
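The sensitivity to adjacent run length can be seen with a textbook first-order coupling model. The coupling capacitance Cc to a neighbor is effectively multiplied by a Miller factor between 0 (neighbor switching the same way) and 2 (neighbor switching the opposite way), and a quiet, undriven victim sees a charge-sharing glitch of Vdd * Cc / (Cc + Cg). The Python sketch below is that simplified model only, not the check-and-repair flows described here. It assumes delay scales linearly with switched capacitance and ignores driver strength and wire resistance; the per-millimetre capacitance values in the example are placeholders, not library data.

```python
def net_caps(c_load_pf, cg_per_mm, total_mm, cc_per_mm, adjacent_mm):
    """Return (ground_cap, coupling_cap) in pF for a victim net whose
    adjacent_mm of parallel run lies next to a single aggressor."""
    c_ground = c_load_pf + cg_per_mm * total_mm
    c_couple = cc_per_mm * adjacent_mm
    return c_ground, c_couple


def effective_cap(c_ground, c_couple, miller):
    """miller: 0 = aggressor switches same direction, 1 = aggressor quiet,
    2 = aggressor switches opposite direction."""
    return c_ground + miller * c_couple


def worst_case_delay_ratio(c_ground, c_couple):
    """Worst-case (opposite switching) delay relative to a quiet neighbor,
    assuming delay is proportional to switched capacitance."""
    return effective_cap(c_ground, c_couple, 2) / effective_cap(c_ground, c_couple, 1)


def charge_sharing_glitch(vdd, c_ground, c_couple):
    """Peak glitch on a floating victim when the aggressor swings full rail.
    A real victim driver pulls the node back, so this is an upper bound."""
    return vdd * c_couple / (c_couple + c_ground)


if __name__ == "__main__":
    # All capacitance values are hypothetical placeholders, not library data.
    for adjacent in (0.25, 0.5, 1.0, 2.0):
        cg, cc = net_caps(0.02, 0.10, 2.0, 0.08, adjacent)
        print(f"{adjacent:4.2f} mm adjacent: delay x{worst_case_delay_ratio(cg, cc):.2f}, "
              f"glitch {charge_sharing_glitch(1.8, cg, cc):.2f} V")
```

Because only the coupled portion of the net grows with adjacent run length, both the delay ratio and the glitch rise as the parallel run gets longer, which is the trend described in the text.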
We've got flows in place now that are finding and fixing coupling
topologies that would've caused us to violate set-up or clock in an
incorrect logic value. All of our xcap checking is static, so it all
assumes worst-case phase relationships.
The only silicon data I have on xcap is test chip data. I haven't had
a chip fail on xcap yet. I've heard of a couple in 0.25 um, but we
skipped 0.25 um for the most part.
> Maybe your wireload models are not so good? I have not really had much
> difficulty identifying limited paths in synthesis.
Our largest hierarchical ASIC was partitioned into 19 sub-modules, so it's
way over-partitioned. Consequently, there are many long, skinny blocks for
which WLMs just don't work very well.
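A quick geometric argument shows why long, skinny blocks defeat wire load models, which are commonly selected by block area. For two pins placed uniformly at random in a W x H block, the expected Manhattan distance between them is (W + H)/3. At fixed area A and aspect ratio r = W/H, that is (sqrt(A*r) + sqrt(A/r))/3, which grows as the block gets skinnier even though the area, and hence an area-selected WLM, stays the same. The Python sketch below evaluates this idealized two-pin estimate; real nets are neither two-pin nor uniformly placed, so treat it as a trend, not a wireload model.

```python
import math


def expected_two_pin_length(area, aspect):
    """Expected Manhattan length between two uniformly random points in a
    block of the given area and aspect ratio (width / height)."""
    width = math.sqrt(area * aspect)
    height = math.sqrt(area / aspect)
    return (width + height) / 3.0


def skinny_penalty(aspect):
    """Expected net length relative to a square block of the same area."""
    return expected_two_pin_length(1.0, aspect) / expected_two_pin_length(1.0, 1.0)


if __name__ == "__main__":
    for aspect in (1, 2, 4, 9, 16):
        print(f"aspect {aspect:>2}:1 -> {skinny_penalty(aspect):.2f}x the square-block length")
```

Under this idealization a 4:1 block already has 25% longer average nets than a square block of the same area.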
I also think we had a lot of young engineers really hoping against hope that
marginal paths would make it. It was a real struggle to institute a logic
level budget of 30 levels of logic for a 200 MHz design. The custom wireload
models helped, but it was still a struggle getting some of the paths down.
I think it was exacerbated by late changes to the architecture, but I really
don't see that end of things.
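For scale, a 200 MHz clock has a 5 ns period, so a 30-level budget works out to about 167 ps per level if the whole period went to logic; flop overhead and clock skew reduce that further. Checking a block against such a budget amounts to finding the longest path, in gates, through the combinational logic between registers. The Python sketch below does that with a topological sort over a hypothetical gate-level fanout graph; in practice the same number would come from the synthesis or STA tool's own path reports.

```python
from collections import deque


def logic_depth(fanout):
    """Longest path, counted in gates, through a combinational DAG.

    fanout maps each gate to the gates it drives. Gates with no fanin gate
    (driven directly by registers or inputs) are level 1.
    """
    gates = set(fanout)
    for driven in fanout.values():
        gates.update(driven)
    indegree = {g: 0 for g in gates}
    for driven in fanout.values():
        for g in driven:
            indegree[g] += 1
    level = {g: 1 for g in gates}
    ready = deque(g for g in gates if indegree[g] == 0)
    visited = 0
    while ready:  # Kahn's topological sort, relaxing levels as we go
        g = ready.popleft()
        visited += 1
        for d in fanout.get(g, ()):
            level[d] = max(level[d], level[g] + 1)
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if visited != len(gates):
        raise ValueError("combinational loop: graph is not a DAG")
    return max(level.values(), default=0)


def per_level_budget_ps(freq_mhz, max_levels):
    """Average delay per logic level if the whole period went to logic."""
    return 1e6 / freq_mhz / max_levels


if __name__ == "__main__":
    fanout = {"g1": ["g2", "g3"], "g2": ["g4"], "g3": ["g4"], "g4": []}
    print(logic_depth(fanout), round(per_level_budget_ps(200, 30), 1))
```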
> We use spreadsheets that list modules down the left side and module
> specific data along the top such as current area, method of determining
> area, DFT % coverage, number of flops per domain, and milestones with
> expected completion dates. As the project matures, information fills in
> left to right giving a quick glance at project progress as well as a
> wealth of information about the project data.
I agree that the spreadsheet data is absolutely necessary, but I'm still
not convinced we could live without netlist #1 on all designs. Without a
stable methodology, the risk is just too great that we wouldn't be able
to freeze the top-level before final layout. I can't tell you how many
hours I've spent in meetings with managers trying to scrub 3 days out of
the final layout schedule.
> I have yet to meet a project manager who did not want to draw up the
> basic floorplan. I do not mean jiggle every memory into the exact
> location, but draw the rough boundaries and cell placements. This would
> be forward annotated to the backend team, who would make the real
> floorplan. For instance, your Layout #1 could be done completely in Chip
> Architect, and done quickly, to get subsystem timing budgets. Little time
> would be wasted doing a detailed layout for something that is not yet
> ready for it. Some full ASIC houses such as LSI, IBM, VLSI, etc. provide
> the tools and expect you to do this step anyway.
I, also, have not met a design manager who didn't want to draw the basic
boundaries and module placements.
Don't know about Chip Architect. Last time I looked at it, it couldn't
structure real clocks, which is a must-have out of the first integration
for us.
Agreed on ASIC house expectations. Like I said, I think Sun often gets
a different ride on the same merry-go-round.
> The real advantage that I hope to get, however, is to follow a synthesis
> to placed gates methodology. 99% of timing closure problems are due to
> gate placement, not routing.
Couldn't agree more. That's what I'm hoping to get out of my hybrid flow
I outlined above.
The only gotcha here is if cross-capacitance increases dramatically in
0.15um technologies. I really hope the guys working on xcap avoidance
are coming up with something good, because the processor data I'm
looking at is not very encouraging. As their clock rates approached
500MHz, their cross-cap shot up because of increased slew rate. They
were finding 10-100x more xcap violations than I'm getting for the same
metallization scheme. That kind of increase in xcap violations could
mean we'll have to go to routing to know our timing. Man that would
suck. I've really got my fingers crossed for xcap avoidance.
> Chip Architect allows running synthesis to placement and iterating on it
> in one tool, and then forward annotating the LEF/DEF files to the final
> router. This should be a legal placement at this point, which means
> timing closure should be quick.
That might be really slick. I'll be interested to see how it works out.
- Nancy Nettleton
Sun Microsystems Silicon Valley, CA