( ESNUG 483 Item 9 ) -------------------------------------------- [11/19/09]
From: Shrikrishna Mehetre <shrikrishna.mehetre=user domain=open-silicon.com>
Subject: We saved 12 to 14 man-weeks using Magma Hydra for floorplanning
Hi, John,
I know you like customer reviews so I thought I'd share my experience using
the Magma Hydra floorplanner. We've been using it since Sept 2008, and
selected it primarily based on:
1. Hydra's Repeated Block capability - it can handle multiple sets of
repeated blocks throughout the floorplanning flow.
2. Hydra's timing budgeting accuracy - it's based on physical information
rather than rules.
3. Hydra's available choices of abstraction models - black box, glass
box, or white box - over the different stages of the flow to get
quick top-level timing closure on our design. (Our benchmark details
are below.)
4. Magma's multi-corner, multi-mode (MCMM) support.
We used Hydra 1.0.85 on a 28 M gate Chartered 90 nm mixed-signal flip chip
design. It has 13 sub-chips spread over 84 soft macro instances, 7 M
placeable instances, 5 DDR3-800 MHz interfaces, 3 PLLs, 9 functional
clock domains (plus hundreds of generated clocks), and 14 functional modes.
The ASIC runs at a maximum core frequency of 300 MHz at the chip level. At
the system level, multiple instances of the chip communicate with other
devices in different modes in a parallel and pipelined fashion for
throughput in the Gbps range. We used Talus Vortex 1.0.85 for P&R.
Our chip had 5 unique blocks repeated 76 times. This high number of
repeated blocks, plus their critical logic, had a major impact on our
design budgeting and physical hierarchy decisions. The repeated blocks were
present along all 4 sides of the chip, with orientations flipped both
horizontally and vertically. The pads were included inside some of the
sub-chips, so the block placement and pin assignments were very complex
and needed manual intervention.
Our die was severely pad-limited. We chose a dual pad ring to meet our
targeted die size. The die targeted a periphery-I/O flip chip package.
This made our power plan challenging -- due to package restrictions, we had
to follow two different power and ground mesh distribution patterns, one in
the core and one in the periphery.
Hydra's Timing Budgeting:
Blackboxes are a lightweight representation of a block, containing boundary
timing constraints and physical boundary info. Since they don't actually
contain any functional information, they require very little memory, which
allowed us to create timing budgets for our 28 M gate chip on a 32 GB
machine. Glassboxes, on the other hand, are timing and physical
abstractions of a block, containing enough logical and physical information
to allow for top-level timing analysis. Compared to a Blackbox, a Glassbox
contains more information about the actual implementation and is therefore
more accurate, but it requires more memory. With Glassboxes, timing
budgeting on our 28 M gate design needed a 64 GB machine.
Below are some stats on our design comparing QoR and memory requirements
using Blackbox and Glassbox during intermediate stages of the design cycle.
Top-Level Timing Analysis (negative slack in nsec)
                      BlackBox                     GlassBox
                 WNS      TNS    FEP          WNS      TNS    FEP
   Fix cell    -0.880    -6287   6606       -1.184    -8754   8647
   Fix clock   -1.286    -6027   4620       -1.423    -8843   7643
   Fix wire    -1.450    -7844   5227       -1.976   -10632   8631
   (FEP = failing end points)
Runtime and Memory Utilization
                      BlackBox                     GlassBox
               Runtime (hrs)   Memory       Runtime (hrs)   Memory
                wall     CPU    (GB)         wall     CPU    (GB)
   Fix cell    10.68   17.68   25.84        20.70   28.89   39.16
   Fix clock    4.29    5.98   25.84         8.74   10.83   42.63
   Fix wire    18.56   34.36   30.62        35.78   58.60   54.69
We did our timing budgeting with ideal-mode clocks since our clock tree
had not yet been defined. We did pin optimization in multiple iterations,
starting with flyline-based pin assignment to the blocks, followed by
global-routing-based incremental pin assignment for feedthrough pins as
required. The pin assignment was able to handle repeated blocks, providing
a unified pin set for the master and its clones.
When we changed our block constraints, Hydra let us re-budget (pull up
constraints) to change top-level constraints on the fly. We set repeated
block constraints for the master and cloned blocks to get unified
constraints for each unique block.
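To make the budgets concrete, here is a rough sketch of what a block-level
budget boils down to in SDC terms -- the port names, cell names, and delay
values below are illustrative, not actual Hydra output:

    # Illustrative block-level budget -- hypothetical names and values,
    # not actual Hydra output.
    create_clock -name core_clk -period 3.333 [get_ports clk]  ;# 300 MHz, ideal
    set_input_delay  -clock core_clk 1.2 [get_ports din*]      ;# upstream share
    set_output_delay -clock core_clk 0.9 [get_ports dout*]     ;# downstream share
    set_driving_cell -lib_cell BUFX4 [get_ports din*]          ;# assumed driver
    set_load 0.05 [get_ports dout*]                            ;# assumed load (pF)

Each master/clone pair gets the same set, which is what the unified
repeated-block constraints above amount to.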
We executed a global routing pass with buffer planning for high-fanout
nets and congestion-free top-level global routing to optimize our delay
budgets for the blocks.
We had several crossbar switches in our design which were located centrally
and had to talk to almost every other block -- a topology that greatly
constrained our top-level budget. We used Hydra to identify problems such
as paths that needed architectural changes like cloning or pipelining to
meet timing. We exported the budgeted blocks in View/Volcano format for
implementation in Vortex.
Multi-Corner, Multi-Mode support (MCMM):
Our design had 14 functional modes with overlapping clocks and I/O delays.
Plus there were test modes: scan_shift, scancap_atspeed, scancap_stuckat,
bist, and boundary scan. We merged the functional modes into a superset
func_dom mode. With the merged mode and the other 5 test modes, we had 6
total modes that needed to be enabled during design implementation.
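In constraint terms, the merge amounts to defining all the functional
clocks together and taking the worst-case I/O delays across the overlapping
modes. A minimal sketch, with hypothetical clock names and values (not our
actual constraints):

    # Sketch of the merged func_dom superset mode -- hypothetical values
    create_clock -name core_clk -period 3.333 [get_ports core_clk_in]  ;# 300 MHz core
    create_clock -name ddr_clk  -period 2.5   [get_ports ddr_clk_in]   ;# 400 MHz DDR3-800
    # ...remaining clocks from all 14 functional modes...
    set_clock_groups -asynchronous -group {core_clk} -group {ddr_clk}
    # Worst-case (max over the merged modes) I/O delays:
    set_input_delay  -clock core_clk -max 1.8 [get_ports ctrl_in*]
    set_output_delay -clock core_clk -max 1.5 [get_ports ctrl_out*]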
Hydra's Gotchas:
- We would like to see better reporting metrics out of budgeting.
When Hydra generates block-level budgets, you just get numbers in
an SDC file; it doesn't tell you whether the constraints are met
or not. You're forced to use Talus Vortex to get that analysis.
- Hydra likes to write out too many constraints; they're really
hard to follow. For example, modeling OCV is a mess; it requires
a large number of SDC constraints to model correctly (see the
sketch after this list).
- Hydra's MMMC budgeting was still in the works when we did this
design, so we had to use the workaround flow below. We need this
for our next ASICs, and Magma claims it will be available in
their Q2 Hydra release.
We developed an MM budgeting workaround - it was a mixed methodology
which combined Magma's MMMC fix-budget constraint generation with
their budgeting flow. We ran Hydra's fix-budget flow for mutually
exclusive modes like bist and scan, whereas for correlated modes
like func1 and func2, we only ran low-effort fix-budget steps to
push down and create constraints. We created 4 different sets of
block constraints and combined them into one constraint file for
Magma's MMMC block implement step. We then pulled up these
constraints to close top-level timing, and used them for both our
BlackBox and GlassBox runs.
- When we switched between the abstractions (from GlassBox to BlackBox
and vice versa) during budgeting and implementation, some of the
timing constraints and scan chain information across the hierarchies
got lost. For timing constraints, we had to source a separate set of
constraints when switching abstractions. To retain scan information,
we had to re-trace the scan chains every time we switched
abstractions.
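To give a feel for the OCV issue mentioned above, here is the kind of
constraint set involved, written PrimeTime/SDC-style with made-up values.
It has to be repeated per clock, per mode, and per corner, which is where
the constraint count explodes:

    # Illustrative OCV modeling -- values are made up
    set_timing_derate -early 0.95    ;# speed up early (launch/clock) paths
    set_timing_derate -late  1.05    ;# slow down late (data) paths
    set_clock_uncertainty -setup 0.15 [get_clocks core_clk]
    set_clock_uncertainty -hold  0.05 [get_clocks core_clk]
    # ...repeated for every clock in every mode, plus derates for
    # every corner...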
Hydra's Upside:
Hydra automated a lot of functionality we used to do mostly manually, such
as logical partitioning, budgeting and pin assignment, channel-based
shaping, early congestion analysis, and retention of useful floorplan
changes across iterations.
- During partitioning and shaping, Hydra grouped the logical hierarchy
into appropriately sized physical partitions, taking into account the
impact on the physical flow (e.g. the boundary pins and the
connectivity between the logical modules), then used that information
to determine the shapes and positions of the physical partitions.
Setting the shapes and locations of the physical partitions is
normally a very time-consuming manual task which could take around
10-12 man-weeks for our 7 million instance design. In contrast, each
run with Hydra's auto partitioner and shaper took about 7 hours. As a
result, we were able to produce our final floorplan in about 6
man-weeks of effort.
- Repeated block pin assignment: We had 76 repeated blocks, and doing pin
assignments manually could have taken us 4 man-weeks. Repeated block
pin assignment is more difficult because the blocks sit on different
parts of the chip, but must each have the same pin assignments. Magma's
repeated block pin assignment capability optimized for reduced
congestion, so it took us only 2 weeks with Hydra, with little manual
intervention.
- Hydra's congestion analysis gives multiple options for resizing our
channels to manage area and congestion trade-offs.
- Top-down clock tree planning lets us determine clock constraints at
each of the block boundaries, including on-chip-variation constraints
(a sketch of such boundary constraints follows this list).
- Feedthrough creation lets us determine which feedthroughs go through
which blocks, and it creates the pins and the feedthrough connections
on the overlay/logical wrapper through the block.
- With scan-chain optimization we can avoid criss-cross pattern routing.
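As an example of those block-boundary clock constraints, the budget each
block sees from top-down clock planning looks roughly like the following --
the names and numbers here are hypothetical:

    # Hypothetical block-boundary clock budget from clock planning
    create_clock -name core_clk -period 3.333 [get_ports blk_clk]
    set_clock_latency 0.6 [get_clocks core_clk]              ;# insertion-delay target
    set_clock_uncertainty -setup 0.20 [get_clocks core_clk]  ;# skew + OCV margin
    set_clock_transition 0.15 [get_clocks core_clk]          ;# slew at the boundary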
Because Hydra automated the tasks that we normally would have done by hand,
I estimate that we were able to save approximately 12 to 14 man-weeks of
effort on our overall design closure; not bad for a 28 M gate mixed-signal
flip chip design with 84 blocks.
I would like to thank Raju Rakha of Open-Silicon for this design execution
and his help in gathering the critical data used in this article.
- Shrikrishna Nana Mehetre
Open-Silicon Research Bangalore, India