( ESNUG 354 Item 10 ) --------------------------------------------- [6/1/00]
Subject: ( ESNUG 353 #4 ) DC Synthesizing To Fewer, Bigger, Richer Cells?
> I was wondering how I could use that 23 percent white space for a few
> bigger, richer cells (e.g. AND-OR-INV) and reduce the wiring overhead. My
> idea is while the cell area increases, the routing overhead decreases,
> and the overall design area should shrink (wishful thinking?)... We tried
> editing the .lib file to increase the wire area in wireload model. The
> idea being that each wire now should have an "area penalty" associated.
>
> wire_load ("my_wire_load") {
>     resistance  : 0.000083 ;
>     capacitance : 0.000116858 ;
>     area        : was: 1.319252  is: 5.319252 ;  <== but didn't help a lot
>                                                      (0.3% less nets)
>     slope       : 101.342461 ;
>     fanout_length (1, 149.84) ;
>     ... more entries here ...
> }
>
> To me it seems that this number is primarily used for the report_area
> output but for nothing else. Correct?
>
> We were also thinking about making nets artificially longer, so that each
> wire has a timing-penalty associated (I have no synth results yet) but I'm
> somehow not convinced that that's a good strategy. Any suggestions?
>
> - Christian Bohm
> Analog Devices B.V. Somewhere, Europe
From: [ A European Synopsys FAE ]
Hi John,
Yes, the wire area attribute is useful to some extent in reducing net length
and using more complex gates, yet...
DC has always had a limitation when it comes to using complex gates. Outside
of DesignWare and sequential cells we hardly ever map to cells with multiple
outputs, and the number of used inputs has a maximum of (have to check) 4.
Using DC Ultra however, you can use more complex cells, and wider fanin
cells, but be aware of the fact that occasionally wide fanin cells might
lead to congestion problems.
Concerning DC Ultra, I believe we have an upgrade plan from DC Expert
available.
- [ A European Synopsys FAE ]
---- ---- ---- ---- ---- ---- ----
From: Stephen McInerney
Hi John,
Has Christian tried the prefer_cell or set_prefer directive? He could also
forbid the smaller cells (at appropriate points in synthesis.) I used this
many years back with reasonable results to guide DC.
Another hack would be to set the areas of complex cells in the library to
lower their synthesis cost.
There are many manipulations that you can do to the wireload (setting an
over-optimistic wireload, customizing it, etc.), but they tend to cause huge
violations when you check with the target wireload.
One final question: is he correctly using hierarchy in the design? Has he
tried different synthesis scripts and constraints for each block (separating
FSMs and control logic from regular stuff)? Are his test insertion, I/O
delays, or clock tree inserting spurious cells?
- Stephen McInerney
---- ---- ---- ---- ---- ---- ----
From: Kevin Grotjohn
Hi John,
I figure I should contribute results of my R&D on this topic. Hopefully it
is free of spelling/grammar errors -- any mistakes are the fault of cold
medicine and bad typing!
The WLM area factor does work if you set it properly, but this is not easy
to do because it is a unitless factor that depends on your ASIC vendor
library area units and your EDA vendor WLM length units, and your ASIC and
EDA vendors need to be in sync! I assume you do not have that problem. ;-)
The WLM area factor should be calculated such that dc_shell
cell_area/total_area is a good prediction of layout utilization. Thus if it
is a highly utilized design then it is OK to use smaller/faster cells, while
for low utilization the larger/slower cells should be tried.
dc_shell multiplies the WLM area factor by the total fanout_length for the
design under that WLM, and fanout_length usually increases monotonically
with fanout. Thus the higher the total fanout in your design, the higher the
wire_area of your design. Since wire_area is included as part of the area
cost function (assuming you did set_max_area 0), DC will tend to map to
cells that eliminate nets, thus reducing total fanout.
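
As a rough illustration of the wire_area arithmetic just described, here is a minimal Python sketch. It assumes a WLM whose fanout_length table is contiguous from fanout 1 and is extrapolated linearly with the slope value past the last entry. Only the fanout-1 length, the slope, and the original area factor come from the WLM quoted at the top of this thread; the net fanouts and the function names are made up for the example.

```python
# Sketch: how a wire load model turns net fanout into estimated wire area.
# Assumption: lengths past the last fanout_length entry are extrapolated
# linearly using the WLM slope.

def wire_length(fanout, table, slope):
    """Return the estimated length for one net of the given fanout."""
    if fanout in table:
        return table[fanout]
    last = max(table)
    return table[last] + slope * (fanout - last)

def total_wire_area(fanouts, table, slope, area_factor):
    """Area factor times the summed estimated length of all nets."""
    return area_factor * sum(wire_length(f, table, slope) for f in fanouts)

table = {1: 149.84}        # fanout_length (1, 149.84) from the posted WLM
slope = 101.342461         # slope from the posted WLM
area_factor = 1.319252     # original "area" value from the posted WLM

before = [1, 1, 2, 3]      # hypothetical net fanouts with simple gates
after = [1, 2, 3]          # same logic after one net is absorbed by a complex cell

print(total_wire_area(before, table, slope, area_factor))
print(total_wire_area(after, table, slope, area_factor))
```

Removing a net lowers the summed length, so the wire_area term in the area cost drops; how much that matters to the mapper depends on how large the area factor is relative to the cell areas.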
Even though these complex cells eliminate wires in layout, they are
usually slower, so your timing constraints may interfere with this
optimization - in which case you can ignore this letter...
First find out how Synopsys FloorPlan Manager calculates it. This is
something many design/floorplanners miscalculate, and like the original
poster they have not seen the benefit.
dc_shell> man create_wire_model
Net area coefficients are calculated whenever the
-total_area option is used. The cell area of the top
design or cluster specified is subtracted from the
total area, area, to determine the total net area
(wire_area). Then wire_area value is used to calculate
the area coefficient (wire_area / total_wire_length).
If the -hierarchy option has been specified, the area
coefficient calculated for the top design or cluster is
used for all wire load models of subdesigns or
clusters.
Here is a breakdown of the formula for LSI.
Cell area is calculated in cellunits for LSI libraries. This simply
measures the number of grids the cell occupies in the placement row
direction. If it is a double cell that takes two rows then count it twice.
Total area must then also be calculated in cellunits. Assuming that
placement rows have no spacing between them, this is simply the number of
placement rows times the width in grids of the cell placement area.
Even if the design's placement rows have added route spacing, no spacing is
assumed because area utilization rather than row utilization needs to be
calculated. If you have a floorplan with memories/cores be sure to subtract
out those areas using an equivalent measure.
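
The cellunit bookkeeping above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions (no spacing between rows, blocked regions already converted to cellunits); the function names and the numbers are hypothetical.

```python
# Sketch: cell area and total placement area, both in cellunits.

def cell_area_cellunits(cells):
    """cells: list of (width_in_grids, rows_occupied) per instance.
    A double-height cell is counted once per row it occupies."""
    return sum(width * rows for width, rows in cells)

def total_area_cellunits(num_rows, grids_per_row, blocked_cellunits=0):
    """Rows times grids per row, minus memory/core areas in the same units."""
    return num_rows * grids_per_row - blocked_cellunits

# Hypothetical floorplan: 100 rows of 2000 grids, with a memory
# occupying the equivalent of 20000 cellunits.
print(total_area_cellunits(100, 2000, blocked_cellunits=20000))

# Hypothetical instances: a 3-grid single-row cell and a 4-grid double-row cell.
print(cell_area_cellunits([(3, 1), (4, 2)]))
```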
"total_wire_length" must be in the same units that the wire load model uses.
I like LSI WLMs to use Kgrids (1000 cellunits), but getting access to
total_wire_length and its units depends on your tool suite. If you are
stuck, then sum the loads in the back-annotated set_load file and divide by
the WLM capacitance factor to estimate the total_wire_length.
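
A minimal Python sketch of that fallback estimate follows. It assumes each back-annotation line has the form "set_load <value> <net>" with the capacitance as the first numeric token, and that the loads are wire capacitance only; real back-annotation files vary by tool, so treat the parsing and the function name as hypothetical.

```python
# Sketch: estimate total wire length from back-annotated set_load values.
# Assumption: the capacitance is the first numeric token after "set_load".

def estimate_total_wire_length(set_load_lines, cap_per_unit_length):
    """Sum the set_load capacitances and divide by the WLM capacitance factor."""
    total_cap = 0.0
    for line in set_load_lines:
        tokens = line.split()
        if len(tokens) < 2 or tokens[0] != "set_load":
            continue  # skip comments and unrelated commands
        for tok in tokens[1:]:
            try:
                total_cap += float(tok)
                break
            except ValueError:
                continue  # not a number (for example an option), keep looking
    return total_cap / cap_per_unit_length

# Hypothetical back-annotation lines and capacitance factor.
lines = ["set_load 0.5 n1", "set_load 1.5 n2", "# comment line"]
print(estimate_total_wire_length(lines, 0.5))
```

The result is in whatever length unit the WLM capacitance factor is expressed per, which is the unit the area factor formula needs.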
So for LSI the formula is

               (total_area_in_cellunits - cell_area_in_cellunits)
    WLM area = --------------------------------------------------
                    (total_wire_length_in_cellunits / 1000)
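
Plugging numbers into the LSI formula, as a minimal Python sketch. The function names and all input values are hypothetical; the second function simply checks that the resulting factor reproduces the layout utilization, which is the stated goal of the calculation.

```python
# Sketch: WLM area factor for a library whose WLM lengths are in Kgrids.

def wlm_area_factor(total_area, cell_area, total_wire_length):
    """All three inputs in cellunits; wire length is converted to Kgrids."""
    wire_area = total_area - cell_area
    length_kgrids = total_wire_length / 1000.0
    return wire_area / length_kgrids

def predicted_utilization(cell_area, area_factor, total_wire_length):
    """cell_area / total_area, with wire area taken as factor times Kgrids."""
    wire_area = area_factor * (total_wire_length / 1000.0)
    return cell_area / (cell_area + wire_area)

# Hypothetical layout: 180000 cellunits available, 120000 used by cells,
# 4,000,000 cellunits of routed wire.
factor = wlm_area_factor(180000, 120000, 4000000)
print(factor)
print(predicted_utilization(120000, factor, 4000000))
```

With these made-up inputs the predicted utilization comes back as 120000/180000, which is the consistency check to look for when deciding whether the factor is set right.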
How can you tell it worked?
The painful way is to do an iterative squeeze on place and route until route
failure, for netlists synthesized with different WLM area factors. This is
especially painful if you are not sure you have the WLM area factor set
right. Realize also that this is a recursive problem, because the layout
used to calculate WLM area factor will depend on the initial WLM area
factor.
Another way is to score the library/design for those cells that eliminate
nets to gauge the potential for layout improvement. It is really not
necessary to score MUX, flop, or arithmetic cells because those are not
synthesized with random logic algorithms in dc_shell and there is not much
chance to minimize this cost. Hopefully your ASIC vendor has included
schematics of cell structure in the databook so you can calculate this cost.
Here's how total nets is influenced:
a) number_of_inputs: If high input cells do not exist, then more low
input cells would get used, which would increase total nets
b) logical_depth: If complex "and/or" cells are used - then this will
eliminate chains of "and/or" logic in the netlist, which minimizes
total nets.
       Cell prefix          logical_depth
       A        O           1
       AO       OA          2
       AOA      OAO         3
       AOAO     OAOA        4
       AOAOA    OAOAO       5
       AOAOAO   OAOAOA      6
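
The prefix table can be turned into a small scoring helper. The Python sketch below assumes cell names begin with an alternating A/O prefix as in the table and stops counting at the first character that breaks the alternation; the function name and the sample cell names are hypothetical, and a real library's naming scheme may need a different rule.

```python
# Sketch: derive logical_depth from the alternating A/O cell-name prefix.

def logical_depth(cell_name):
    """Count the leading characters that alternate between 'A' and 'O'."""
    depth = 0
    prev = None
    for ch in cell_name.upper():
        if ch not in "AO" or ch == prev:
            break
        depth += 1
        prev = ch
    return depth

for name in ["AND2", "AO21", "OAO211", "AOAOAO"]:
    print(name, logical_depth(name))
```

Names that do not start with A or O (ND2, NR2) score 0 under this rule, since the table does not cover them.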
c) number_of_inversions - All ASIC libraries are composed of negative
logic primitives, so count the number of input/output inverters in the
cell. If instead dc_shell used inverters in the netlist, then total
nets would be increased.
       Cell        number_of_inversions
       ND2         0      basic NAND gate
       AND2        1      ND2 w/ N1 on output
       ND2AN       1      ND2 w/ N1 on input A
       AND2AN      2      ND2 w/ N1 on output & input A
       AND2ANBN    3      ND2 w/ N1 on output & input A, B
       NR2         0      basic NOR gate
Note that even though the AND2ANBN performs the same function as NR2, it
counts more as a net minimizer. If the NR2 is poor for drive staging,
dc_shell will use NAND gates and inverters if the AND2ANBN is not available
to get better drive staging.
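
The inversion count can also be derived from the cell name when the library follows the naming pattern in the table. The Python sketch below assumes names of the form base, input count, then optional "<pin>N" suffixes for inverted inputs; treating OR as NR plus an output inverter is an assumption that mirrors the AND2 row rather than something the table states, and the function name is hypothetical.

```python
import re

# Sketch: count built-in inverters from names like ND2, AND2AN, AND2ANBN.
# Assumption: AND/OR are the negative-logic primitive plus an output inverter,
# and each "<pin>N" suffix marks one inverted input.
_NAME = re.compile(r"^(AND|OR|ND|NR)\d+((?:[A-Z]N)*)$")

def number_of_inversions(cell_name):
    match = _NAME.match(cell_name.upper())
    if match is None:
        raise ValueError("unrecognized cell name: " + cell_name)
    base, inverted_inputs = match.groups()
    output_inverters = 1 if base in ("AND", "OR") else 0
    return output_inverters + len(inverted_inputs) // 2

for name in ["ND2", "AND2", "ND2AN", "AND2AN", "AND2ANBN", "NR2"]:
    print(name, number_of_inversions(name))
```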
To evaluate the mapping of the design, use a multidimensional histogram to
see the WLM area shift from cells with a lower score to cells with a higher
score, which results in a reduced net count. You really cannot simply sum the
score for a design because the tradeoff is between more cells with a lower
score, and fewer cells with a higher score. The result can also be seen
as a lower net/instance count, but that does not make for pretty charts for
clueless managers. Make sure you do not report cell area, as it may
actually go up with complex cells, which is OK if you have better layout
utilization.
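
One way to build such a histogram is sketched below in Python. For simplicity it counts instances per score rather than area per score, and the score tuples (number_of_inputs, logical_depth, number_of_inversions), the cell list, and both netlists are hypothetical.

```python
from collections import Counter

# Sketch: compare two mappings by binning instances on their score tuple.

def score_histogram(instances, scores):
    """Count instances per (inputs, depth, inversions) score."""
    hist = Counter()
    for cell in instances:
        hist[scores[cell]] += 1
    return hist

def histogram_shift(before, after):
    """Per-bin change in instance count going from one mapping to the other."""
    keys = sorted(set(before) | set(after))
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}

# Hypothetical scores for three cells.
scores = {"N1": (1, 0, 0), "ND2": (2, 1, 0), "AND2AN": (2, 1, 2)}

simple = ["ND2", "N1", "N1"]   # NAND plus two separate inverters
complex_ = ["AND2AN"]          # same function in one cell

shift = histogram_shift(score_histogram(simple, scores),
                        score_histogram(complex_, scores))
for score, delta in shift.items():
    print(score, delta)
```

Keeping the bins separate shows the move from many low-score cells to few high-score cells, which a single summed score would hide.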
So there you go John, this method is a lot easier than tweaking the WLM area
with layout squeeze iterations. Tweak the WLM area so it predicts the
layout utilization, and measure the reduced fanout (net) cost of the complex
cells. However what you really need is a physical synthesis tool that does
not use WLM area for prediction but optimizes local congestion vs. timing
vs. area when mapping cells - but that is another topic.
- Kevin Grotjohn
LSI Logic Corp. Pleasanton, CA