Editor's Note: Remember that Vito-Goes-To-Jail photo drama? Well, it
appears that our friends at Synopsys thought they'd get in on the fun.
Click on http://www.DeepChip.com and go to the part marked "Vito's
Revenge" to see what I'm talking about...
- John Cooley
the ESNUG guy
( ESNUG 338 Subjects ) ------------------------------------------- [12/3/99]
Item 1: A Customer Tries The New Chip Architect Tool On 3 LSI Logic ASICs
Item 2: Stupid Gotcha Found While Installing Simucad's Ver 99.1 Silos III
Item 3: ( ESNUG 335 #9 ) The Verilog++ Discussion & Co-design's Superlog
Item 4: ( ESNUG 337 #1 ) ICCAD'99 Gurus Say "Flat P&R Is Where It's At!"
Item 5: ( ESNUG 337 #1 ) Customer Q&A On The 1999 Dataquest ASIC Survey
Item 6: ( ESNUG 335 #2 ) 3-D Load-Dependent Lookup Tables In DC/PT 99.10
Item 7: ( ESNUG 337 #10 ) Yes!, I Found A SpeedChart Y2K FSPD Daemon!
Item 8: ( ESNUG 337 #3 ) X's, Optimism, Pessimism, Resets, & Verilog Case
Item 9: ( ESNUG 335 #9 ) OK, C/C++ HW Design Won't Work; Why Not SLDL?
The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com
( ESNUG 338 Item 1 ) --------------------------------------------- [12/3/99]
From: Jon Stahl <jstahl@avici.com>
Subject: A Customer Tries The New Chip Architect Tool On 3 LSI Logic ASICs
Hi John,
I was very interested to read the FlexRoute and, lately, the PhysOpt reviews. A
few months ago, I and others at my company didn't really have a lot of
interest in physical design tools from Synopsys. However after a decision
to move to third party layout tools, we decided to explore all the options
out there -- not just the standard Cadence and Avanti offerings. And after
an eval of Chip Architect, we have been pleasantly surprised and have become
early adopters. I thought there might be interest in hearing a user review.
All of our designs use LSI Logic as the foundry.
We decided to look at Chip Architect because of certain key features. Like
a lot of other folks these days, we believe that hierarchical layout is
the way to go. Chip Architect promised a natural way to do this, with the
ability to do planning not only at the gate level, but at the black box
and RTL level, too. The tight integration between placement and timing that
Chip Architect promised really got our attention.
The Chip Architect Design Flow
------------------------------
Before I critique Chip Architect itself, I felt it would be best to give a
thumbnail sketch of how its flow works. It's a challenging task to come up
with recipe-book style steps for the Chip Architect flow, but here is my
attempt at a general flow. I think this is OK as a first-level guideline.
1. Create Quick Timing Models (QTMs) for any soft black box blocks
(for which RTL is not available), if any, in PrimeTime. We designed
in Verilog and used VCS for simulation. We used an old copy of
VeriLint and VERA to verify the Verilog RTL.
2. Read hierarchical structural netlists into Chip Architect. These could
be a combination of black box blocks, hard-macros, RTL blocks, and gate
level blocks. For memories, we used LSImemory to generate synlibs. We
read the synlibs into DC, did an update_lib, and wrote out mem.db's
for each mem macro we used. Imported them into Chip Architect.
3. Read top-level constraints (say top_const.tcl), timing models of
IP/hard-macros, and QTMs in Chip Architect.
4. Floorplan Hierarchically in Chip Architect:
a) Size black boxes. Here is a TCL script to size a black box,
using approximate gate count:
# Size a selected black box, based on gate count,
# assuming 30 sq. microns of area per gate.
proc Size_BB {size} {
  set side [expr sqrt($size*30)]
  reshape_object -def "0 0 $side $side" [get_sel]
}
b) Manipulate physical hierarchy (flatten, merge, etc.). The hierarchy
browser window in Chip Architect is a great tool for this.
c) Perform Automatic block placement in Chip Architect
d) Do power bus planning, create blockages, pin assignment. We did
all of our power design in LSI's layout editor because it was
a 4-metal-layer design. Usually you only worry about power
conflicts if your power layer is on the same layer as your cell
interconnect layer. I kept our power stuff in metal 3 & metal 4.
e) Coarse Routing in Chip Architect
f) Perform Std cell placements within blocks, and coarse route
5. Analyze for Timing and Congestion in Chip Architect. Use Chip
Architect's built-in PrimeTime engine for Timing Analysis. Use
congestion map utility within Chip Architect for congestion.
6. Tweak as necessary. Some alternatives (depending on the violations)
are:
a) Refine floorplan in Chip Architect (resize, move blocks, add
blockages and so on)
b) Re-run pin assignment, placement, coarse routing with higher effort
options.
c) Perform top-level route in FlexRoute.
d) Perform In-Place Optimization (gate sizing, buffer insertion) in
Chip Architect. (I couldn't get this to work!)
7. Output custom wire load models (say Chip.cwl), loads (say
Chip_setload.tcl and Chip_setresistance.tcl), and interconnect SDF
(say Chip.sdf) for use in generating accurate synthesis budgets.
8. Perform Budgeting in PrimeTime -- Create synthesis constraints. Here
is a sample script to do budgeting in PrimeTime:
# Read netlist
read_verilog Chip_est.v
current_design Chip
# Read SDF
read_sdf ../parasitics/Chip.sdf
# Apply constraints
source Chip_setload.tcl
source Chip_setresistance.tcl
source top_const.tcl
# Allocate budget for each of the top level blocks.
allocate_budgets -level 0 -write_context -no_boundary -format dcsh
9. Run Design Compiler on the soft blocks using constraints generated from
budgeting (step 8) and the custom wire load model generated from Chip
Architect, i.e. Chip.cwl (step 7). (My understanding is that for
better results, you could use PhysOpt at this stage in place of DC.)
10. Read the DC generated netlist back into Chip Architect.
11. Perform final placement on all blocks (high effort), refine Floorplan,
do top level route. Run clock tree synthesis (currently under dev in
Chip Architect). We're looking at integrating Ultima's ClockWise tool
here because they have a useful skew solution. The Chip Architect
people are doing a zero skew tool; ClockWise can skew your design to
get additional set-up time.
12. Final analysis and optimization (similar to step 5). The only gotcha
here is that I had to write my own TCL repeater insertion tool because
I couldn't get Chip Architect's IPO features to work.
13. Perform final In-Place Optimization (IPO) & other fixes for violations
(like step 6). (Uh... This was the official way it was supposed to
work. It didn't. I'm just including this step to be complete.)
14. Output final floorplan, final cell placement, and final netlist to a
std cell router -- in our case, LSI's FlexStream Global, Detail, and
Cleanup tools (version 1.0). (This isn't new LSI software, it's just
been renamed to FlexStream. Before this, it was LSI PD; before that
it was CMDE. This is tried & true LSI software.)
15. Bring back the routed blocks into Chip Architect, make sure overall
chip timing is okay. We never did this step, because we used LSI's
delay predictor, "LSIdelay", to generate SDF's and then we used
PrimeTime to verify the final timing. Actually, we used Frequency
Tech's Columbus to extract the parasitics and fed that into LSIdelay
to make the SDF's.
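As a sanity check on step 4a's sizing rule above: the Tcl proc just turns
a gate count into the side of a square outline. A quick Python sketch of
the same arithmetic (the 30 sq.-micron-per-gate figure comes from the
example above and is an assumption for that library, not a general
constant):

```python
import math

def bb_side(gate_count, area_per_gate=30.0):
    """Return the side length (in microns) of a square black box
    sized to hold gate_count gates at area_per_gate sq. microns each."""
    return math.sqrt(gate_count * area_per_gate)

# A 100K-gate block at 30 sq. microns/gate needs a ~1733u x 1733u outline.
side = bb_side(100_000)
```

So for the 100K-gate "Larry" design placed flat, the same formula gives a
rough first-cut die-core estimate before any floorplanning.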
Our major concerns with a new tool of this complexity, and from a vendor who
until recently didn't play in the P&R field, were typical:
- Was the code stable?
- Would it have the capacity to handle million-gate-plus designs?
- Would the placement quality of results measure up?
- Would the runtime measure up (multi-threaded?)?
Plus the additional concern of whether the timing Chip Architect predicted
would match up, within reasonable amount, with our vendor's (LSI Logic)
sign-off delay calculator.
Our Experience With 3 LSI Designs
---------------------------------
We did various amounts of testing/actual work w/ the tool on three designs,
"Larry", "Moe", and "Curly":
1.) "Larry" - 100K gates, 3 SRAMs, 100MHz (used to test our proposed flow)
Since Chip Architect does not perform detail routing, but stops at
placement and global routing, we had a problem. To use it on our
current 0.25um and larger geometry designs we would have to interface
to the LSI proprietary tools for clock insertion, routing, etc. (LSI
now uses Avanti for the 0.18um and below geometries). Chip Architect
was designed to interface to Cadence/Avanti, but had no hooks for LSI.
Using Chip Architect's TCL API interface, we were able to accomplish
the two-way handoff without much difficulty. We made a Perl script to
map LSI's pad placement into TCL commands to re-create it in Chip
Architect. And a TCL script was used to write out the cell
coordinates and orientations of all internal cells in LSI format.
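To give a feel for what that handoff script does, here is a rough sketch of
the idea in Python (Jon used Perl; the "name x y orientation" record layout
and the "set_cell_location" Tcl command name here are my invented
placeholders, not LSI's or Chip Architect's real formats):

```python
# Hypothetical sketch: translate pad-placement records from a layout
# database dump into Tcl commands that re-create the placement in a
# floorplanner. Record format and Tcl command name are illustrative only.

def pads_to_tcl(pad_lines):
    """Turn 'name x y orientation' pad records into Tcl commands."""
    tcl = []
    for line in pad_lines:
        name, x, y, orient = line.split()
        tcl.append(f"set_cell_location -fixed -orient {orient} "
                   f"-coordinates {{{x} {y}}} {name}")
    return tcl

cmds = pads_to_tcl(["pad_clk 0 1200 N", "pad_rst 0 1275 N"])
```

The reverse direction (writing cell coordinates back out in the vendor's
format) is the same string-mangling exercise run the other way.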
On this design, due to the small size, we decided not to use the
hierarchical features and just place it flat. This design did
not have difficult core timing, but had extremely tight I/O timing.
As a way to meet both setup and hold output constraints, this design
had large delay cells instantiated in the output paths which we would
later ECO-downsize as necessary.
Performing a timing-driven placement was simple, as we could pretty
much use the PrimeTime constraints already prepared for timing
analysis. On this design, placement took 1.2 hrs. (4 processors),
global routing 45 min., and timing calculation and analysis another
30 min. Our results were mostly good, as core timing was completely
met with +1.3ns of slack, but the I/O timing was off (as expected)
with -3.3ns of slack.
We then attempted to use the automatic IPO features of Chip Architect
to fix the timing problems -- with little luck. The promised Chip
Architect IPO features were cell upsizing and buffer insertion, but
the recommended flow for "best" results was to export Chip Architect
info to Synopsys Floorplan Manager (... more on this later).
Anyway, since the necessary corrections were obvious, it was easy
to use the (excellent) interactive Chip Architect TCL commands to
downsize the cells, legalize the placement, and re-time the design
(*incremental* in most cases, and very fast) ... and timing was met
in Chip Architect.
We then dumped the placement into LSI's tools, re-performed global
routing and estimated parasitic extraction, and generated SDF's using
LSI's delay calculator. After re-timing the design with PrimeTime,
the correlation was within ~5% on *most* paths. (The only real
exception to the outstanding correlation was output buffer timing.
For some reason, which at this point we haven't really researched or
explained, LSI's delay predictor shows ~900 ps slower paths than Chip
Architect. This anomaly was put on the back burner due to time
pressures and the simple work around of compensating with additional
output delay.)
2.) "Moe" - 1M gates, 100 SRAMs @ 100MHz
This was a design in progress. We were having trouble getting timing
closure. After well over a month of timing iterations -- which for us
consists of synthesis, test insertion, placement, scan reordering,
MOTIVE analysis, repeater insertion and IPO upsizing, and re-analysis
with MOTIVE -- we still had ~3K paths with as much as -2 ns of slack.
Since the design was being done flat with just careful floorplanning,
and it would have been way too much work to go back and re-implement
the design hierarchically, it was a "what the hell, let's try it" kind
of thing to throw Chip Architect totally flat (no floorplan) at it.
It took a little bit of work to port the ram placement, power routes,
and placeblocks from the LSI database and into Chip Architect, but
after that things ran smoothly. Chip Architect completed full timing
driven placement of ~300K instances in 6 hrs (8 processors) ... with
results better than all of our careful floorplanning and timing driven
placement in LSI's tools, including post placement repeater insertion
and IPO's. End result: 250 failing paths w/ -1ns of slack or better.
The only bad thing to say about these surprising results is that Chip
Architect IPO attempts to try and fix the remaining failures would
only repeatedly crash. And we haven't had the chance to port the
placement back into the LSI tools, re-calculate timing, and re-analyze
the results to make sure the timing really is this good (although we
did on "Curly" - see below).
3.) "Curly" - 750K gates and 50 SRAMs @ 100/155MHz
Here was a design just beginning in the planning and layout stages
where we could really use the capabilities of Chip Architect. It
consisted of ~20 large sub-systems, making it ideal for hierarchical
layout. Furthermore (without going into too much detail), at this
point we made an observation that got rid of one of the real pains
normally associated with hierarchical layout -- developing the lower
level timing constraints.
Our synthesis methodology consists of bottom-up compile, PrimeTime
budgeting, and re-compilation. With this in mind, we noticed that the
block-level budgeted scripts that PrimeTime outputs -- which until this
point we had used only in dcsh format for re-compilation -- are
*almost* perfect for direct use (in ptsh format) as block-level
placement constraints in Chip Architect. Only a little filtering was
needed to remove some unnecessary stuff.
So following this idea, we set up a top level floorplan, allocated
outlines for the lower blocks, and then kicked off runs that just
sourced the filtered ptsh scripts for constraints. We used GNU Make
and multiple Chip Architect licenses to run concurrently. And
runtimes for block level placement, global route, parasitic
extraction, and static timing were incredibly fast: 4 to 32 minutes
per block for blocks that ranged in size up to 90K gates.
The output of all this was placed/global-routed timing reports which
we compared side by side with the same Design Compiler reports.
And the results at the block level were very good: 14 met timing, and
6 with -3 ns or better of negative slack violations on the budgeted
I/O constraints. Then, using Chip Architect to global route the
inter-block nets, we generated a top level timing report ... and had
up to -11 ns of slack.
Analysis of the failing paths immediately showed the problem, long
inter-block wires. So again we tried the Chip Architect IPO features,
first at the block level, and then at the top. And again we had no
luck. Block level runs produced weird and inconsistent results,
sometimes ending up worse than they started (?). Top level
attempts would only produce crashes.
So, rolling up our sleeves (we were committed to Chip Architect now),
we crafted a TCL script that used the API to add repeaters along
the long wires ... and a week later had code that would produce a
design with only -0.75 ns of slack. Furthermore, since we had
overconstrained the synthesis and placement by 0.3ns, and had 0.8 ns
of clock uncertainty for skew and PLL jitter figured in, we now had
a placement which we felt was good enough to go into route with.
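For readers wondering what such a repeater script has to figure out, here
is a rough Python sketch of the core logic -- given a too-long net, decide
how many repeaters to drop in and where (the 2000u segment limit is an
invented number for illustration; Jon's real script worked through Chip
Architect's TCL API, which isn't shown):

```python
import math

def plan_repeaters(net_length_um, max_segment_um=2000.0):
    """Given a net's routed length, return the evenly spaced positions
    (distances along the net, in microns) at which to insert repeaters
    so that no segment exceeds max_segment_um."""
    if net_length_um <= max_segment_um:
        return []                       # short net: no repeaters needed
    n = math.ceil(net_length_um / max_segment_um) - 1
    step = net_length_um / (n + 1)
    return [round(step * (i + 1), 1) for i in range(n)]

# A 7000u net with 2000u max segments gets 3 evenly spaced repeaters.
positions = plan_repeaters(7000.0)
```

The hard parts in practice -- legalizing the new cells' placement and
re-hooking the netlist -- are exactly what the tool's API has to provide.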
In fact, some poking around showed that the failing paths probably
could have been fixed pretty easily with some interactive upsizing...
but this was a trial with RTL that wasn't frozen, so we moved on.
With some trepidation we took the placement into the LSI tools and
re-analyzed, and came up w/ -0.25 ns of slack (different path). This
we considered to be excellent correlation when we remembered that
a) two different global routers
b) two different parasitic extractors and
c) two different delay calculators were used.
If you have read this far, you have gathered that we ran into some problems
with Chip Architect. The worst is that its IPO features seem to
be pretty much useless in their current incarnation. The only good thing I
can say about them is that Synopsys seems to be aware of the situation and
has promised improvements in the next major release.
However, other than broken IPO, the few bugs we ran into have been minor and
had easy workarounds. I've been very impressed w/ the code stability, only
getting it to roll over & die when I really, really pushed it.
What has been a little disappointing, if not unexpected, is not bugs but
miscellaneous annoying behavior. The worst example of this is an extremely
awkward logical vs. physical hierarchy separation. To place a design in
Chip Architect there cannot be any hierarchy, so any logical hierarchy
must be flattened. Flattening our big 300K-instance
design took ~10 hrs, which if added to the placement duration makes the
runtime go from excellent to very poor. In addition, if you were to make
netlist changes to the design, there is no way to write out a logical
netlist, so you end up with a cumbersome flat netlist -- which I am still
testing to see if it breaks other tools. Also, if you happen to need to
apply attributes/constraints/etc. to internal design nets/pins/cells, you
must maintain two sets: one for the hierarchical design & one for the flat.
Finally, the most disappointing thing to me about the tool is that it
appears Synopsys might not *let* it be as good as it *could* be. They have
consciously integrated only about 70% of PrimeTime into Chip Architect,
leaving out budgeting and misc. other features. It appears as if they even
intentionally hamstrung some commands. In addition, one of their current
recommendations is to write files out of Chip Architect, use Floorplan
Manager for optimization, and import back into Chip Architect. The same
goes for budgeting and PrimeTime. Why!? I guess they need to protect their
revenue stream for PrimeTime and Floorplan Manager, but I for one (of course
I am not writing this check) would be happy to see Chip Architect take a PT
or FM license and not force me to write & read back in 100's of MBs of files.
Anyway, I hope this info helps someone else who is out trying to make tool
decisions. Despite my gripes, I really like the tool and plan on using it
on all new projects. Although we haven't taped out anything with Chip
Architect yet, "Curly" is on the fast track to go and is definitely moving
faster than it would be without something like Chip Architect.
And I should mention, John, that whenever I have run into the inevitable
problems, the local (Boston) apps. and corporate engineering support for
Chip Architect has been excellent.
- Jon Stahl, Principal Engineer
Avici Systems N. Billerica, MA
( ESNUG 338 Item 2 ) --------------------------------------------- [12/3/99]
From: John Cooley <jcooley@world.std.com>
Subject: Stupid Gotcha Found While Installing Simucad's Ver 99.1 Silos III
After reading the install notes, I figured it would be worthwhile to update
my Silos III license on my laptop to the new version 99.1 of the tool. And
after wasting a good hour chasing this, e-mailing that, reading this, I
discovered a gotcha during the stage where it has you edit the license file
to put in:
FEATURE Sse simucad 99.101 permanent uncounted 12A8B1B56A42 \
HOSTID=SIMUCAD=601037 ck=97
It's REALLY important you type in "99.101" even though all over the Silos
documentation they talk about version "99.1". Yes, it's minor, but I
figure if this saves 1,000 engineers an hour's hassle, it's worth
publishing.
- John Cooley
the ESNUG guy
( ESNUG 338 Item 3 ) --------------------------------------------- [12/3/99]
Subject: ( ESNUG 335 #9 ) The Verilog++ Discussion & Co-design's Superlog
> You mentioned a future Verilog++. Well, isn't that what the guys at
> Co-Design are thinking about with their Superlog language? It's a
> definite HW development language, but with some of the features out of
> C/C++. I don't have any experience personally w/ the language. But I've
> listened to the sales pitches of all the companies mentioned in your
> article as well as from Co-Design. Co-Design seems to have thought a
> little bit more closely about HW design issues, such as synthesis, etc.
> Right now you can synthesize Superlog only through a translation into
> Verilog, and I have no idea how that would work quality wise.
>
> - Anna Ekstrandh
> Extreme Packet Devices Inc. Ottawa, Canada
From: David Kelf <davek@co-design.com>
Hi John,
I see in ESNUG all this discussion about using C/C++ as a hardware design
language and thought you might want to see what Superlog looks like.
The Superlog language is based on Verilog but adds a bunch of additional
system- and C-like constructs -- sort of a Verilog++. So far we are getting
positive feedback from the people we talk to, but are always interested in
other opinions. Here is a chunk of Superlog code to provide a feel of how
it looks:
state {S0, S1, S2} cstate;   // state variable with enumeration
always @(posedge reset)
  transition (cstate) default: ->> S0; endtransition
always @(posedge clk iff !reset)
  transition (cstate)
    S0: if (inp == 0) ->> S2;             // change state
    S2: if (inp == 1) ->> S1; else ->> S0;
    S1: ->> S0;
  endtransition

typedef struct {string s; ref node left, right;} node;
ref node n, root;   // global data - pointers to nodes
int visited = 0;    // global data - number of nodes visited

function ref node treeFind(string str, ref node parent);
  if (parent == null) return null;
  visited++;
  if (str == parent->s) return parent;        // string compare
  if (str < parent->s) return treeFind(str, parent->left);
  else return treeFind(str, parent->right);   // recursion
endfunction

n = treeFind("shergar", root);   // example call
This shows how we have brought C and Verilog together, although it's a
little light on some of the interesting systems constructs that are also
present in the language.
Do you have any feedback on the approach we are taking? Do you think your
readers are interested in this approach?
- Dave Kelf, VP Marketingdroid
Co-Design Automation, Inc. Melrose, MA
( ESNUG 338 Item 4 ) --------------------------------------------- [12/3/99]
Subject: ( ESNUG 337 #1 ) ICCAD'99 Gurus Say "Flat P&R Is Where It's At!"
> I know it's very common for some companies to do layout as a process on
> one big flat design. We considered flat, but these five "hells" came up:
>
> - big flat designs are run-time hell
> - big flat designs are extraction hell
> - big flat designs are back-annotation hell
> - big flat designs are clock tree hell
> - big flat designs are timing closure hell
>
> In practical terms, with engineers here running around tweaking & pumping
> netlists out of Design Compiler every day, some way to compartmentalize
> their work is a MUST. So, we chose the hierarchical approach.
>
> - Sam Appleton
> SGI Mountain View, CA
From: Pat Eichenseer <patrick.eichenseer@amd.com>
Hi John,
I wanted to relate a few items concerning this discussion about hierarchical
vs. flat P&R.
First of all, in a tutorial at the recent ICCAD conference, Andrew Kahng and
Majid Sarrafzadeh concluded that flat is the future. They say that the
physical hierarchy created has no relation to high quality of results in
terms of timing and routability. Furthermore, they claim that the physical
hierarchy created is somewhat artificial and that the core region is
naturally homogeneous, i.e., leave things flat. Additionally, the logic
hierarchy is function driven, while the physical world is minimum wire
length driven. Thus, by imposing physical hierarchy, which is typically
based upon logic partitioning, you have forced the placement engines to
work on a subset of the problem rather than on the entire problem.
One other observation that I have made is that hierarchical designs consume
more die area. This is mostly due to over-design of the power/ground
topology.
On the pro-side for physical hierarchy, everyone knows that by having
physical hierarchy, blocks can be divided up into design teams. This also
allows for blocks to be worked on at different stages. Additionally, block
budgets can be assigned early accounting for the top-level interconnect.
Another benefit is engineering change orders (ECOs) are much easier to
implement on a block rather than the entire design.
> But as to doing flat layout on a 750k instance design - really? I am
> mystified by a couple of statements
>
> "Sam could have intelligently partitioned his 750 K instance design
> into four parts by hand and Avanti could take it from there."
>
> How do you "intelligently partition" a design into 4 pieces in the tool?
> Does it magically fit together? Does Avanti support this? Cadence
> doesn't. Do you arbitrarily break the design along a global placement, or
> do you BREAK THE DESIGN BY LOGICAL HIERARCHY? If you break the design by
> logical hierarchy, then it certainly sounds like you're doing hierarchical
> physical design, and you'll have to do pin assignment, interconnect
> routing etc between the pieces.
>
> - Sam Appleton
> SGI Mountain View, CA
Now, in regards to Sam Appleton -- and by the way, I do not want to get into
a flame war here -- I have pros and cons for hierarchical and flat. As for
750K instances flat, sure, this should be doable for Avanti, as I am placing
80K cells with 83 pre-placed macros in 9 minutes (I have done much larger
designs in reasonable times, I just don't have the numbers). If Avanti
can't do it, then Silicon Perspective can. As for doing the design
hierarchically, yes, you would partition the design by logical hierarchy,
do pin assignment, and then route the blocks together. There's no
magic to it. As for Sam's issues, clock insertion is handled via a
bottom-up approach, i.e., you CTS the blocks first, then CTS at the
top-level, though I am aware Avanti is working on a top-down CTS
algorithm. As for the 2D estimates, I agree with Sam that the estimates
used to be way off, but with the recent integration of part of Avanti's
extraction engine, I have found reasonable correlation with final
extraction.
As for signal integrity, this mostly has to do with obtaining a good
floorplan with very good pin assignments. This sounds trivial, but it can
make or break a design. With larger die, repeaters are required. You can
let Avanti's Jupiter program instantiate them, or you can implement buffer
block planning as suggested by Jason Cong, Tianming Kong & David Zhigang Pan
in their 'Buffer Block Planning for Interconnect-Driven Floorplanning'
paper. Finally, as for extraction, we use Star-RCXT. We are seeing
excellent run times for huge flat designs.
- Pat Eichenseer
AMD Austin, TX
( ESNUG 338 Item 5 ) --------------------------------------------- [12/3/99]
Subject: ( ESNUG 337 #1 ) Customer Q&A On The 1999 Dataquest ASIC Survey
> Our 1999 survey of hardware engineers in the U.S. market for ASIC design
> breaks out as:
>
> 24 % ######################## less than 100 k gates
> 22 % ###################### 100 to 249 k gates
> 4 % #### 250 to 499 k gates
> 16 % ################ 500 to 999 k gates
> 22 % ###################### 1,000 to 2,499 k gates
> 4 % #### 2,500 to 4,999 k gates
> 4 % #### 5,000 to 10,000 k gates
> 2 % ## 10,000 + k gates
>
> Worldwide, there are about 10,000 ASIC design starts.
>
> - Gary Smith, Senior EDA Analyst
> Dataquest San Jose, CA
From: Nick Summerville <nsummerv@ford.com>
To: Gary Smith <gary.smith@gartner.com>
Gary,
What constitutes a gate? When I ask around the industry, I get radically
different answers, from 'a gate is a transistor', to 'a gate is a
placeable standard cell'. Most people (based on what I am told at
conferences) take the number of transistors and divide by either 4 or 6,
and call this the number of gates.
What is your study based on?
- Nick Summerville
Ford Microelectronics Colorado Springs, CO
---- ---- ---- ---- ---- ---- ----
From: Gary Smith <gary.smith@gartner.com>
To: Nick Summerville <nsummerv@ford.com>
Nick,
It's amazing how much of design work is rule of thumb. IMI came up with
the standard CMOS gate in 1974. That was based on four transistors. Now
there are a lot of ways to actually make a gate, but four has stood up as
the rule of thumb. That means you take the number of transistors and
divide by four. These numbers are part of my yearly EDA User Wants & Needs
Report, which is based on a statistically correct sample of all North
American IC, ASIC, FPGA, PCB, and System Design Engineers. Been doing them
for years now.
- Gary Smith, Senior EDA Analyst
Dataquest San Jose, CA
---- ---- ---- ---- ---- ---- ----
From: Randy Schmidt <rschmidt@us.ibm.com>
To: Gary Smith <gary.smith@gartner.com>
Gary,
What is the typical method used to translate between "movable objects" and
"kgates" for an average ASIC design? How do you account for arrays in
kgates?
- Randy Schmidt
IBM Rochester, MN
---- ---- ---- ---- ---- ---- ----
From: Gary Smith <gary.smith@gartner.com>
To: Randy Schmidt <rschmidt@us.ibm.com>
Randy,
If you mean flip-flops, I always used eight gates as the rule of thumb.
A kgate is a thousand gates, so what we're seeing is that 2% of the designs
are over 10,000,000 gates. When you translate transistors into gates (when
you actually aren't doing gate-level design), I use four transistors per
gate.
- Gary Smith, Senior EDA Analyst
Dataquest San Jose, CA
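[ Editor's aside: Gary's rules of thumb reduce to trivial arithmetic. A
quick Python sketch, using his numbers (4 transistors per gate, 8 gates
per flop, a kgate = 1,000 gates); the helper names are mine: ]

```python
def transistors_to_gates(transistors):
    """Dataquest rule of thumb: 4 transistors per standard CMOS gate."""
    return transistors // 4

def flops_to_gates(flops):
    """Rule of thumb: a flip-flop counts as 8 gates."""
    return flops * 8

def gates_to_kgates(gates):
    """A kgate is 1,000 gates."""
    return gates / 1000.0

# 40M transistors -> 10M gates -> 10,000 kgates, i.e. the survey's
# "10,000 + k gates" bucket.
kg = gates_to_kgates(transistors_to_gates(40_000_000))
```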
( ESNUG 338 Item 6 ) --------------------------------------------- [12/3/99]
Subject: ( ESNUG 335 #2 ) 3-D Load-Dependent Lookup Tables In DC/PT 99.10
> I'd like to rehash an old issue now that it's possible to actually use it.
>
> Synopsys version 1999.10 supports 3-dimensional lookup tables for delay
> modeling. I hope to find out what other tools (e.g., Avanti, Cadence,
> etc.) already have or plan to have compatible timing modeling. What have
> you heard from your vendors? Are there characterization tools out there
> that can write .lib's with this syntax?
>
> Also, the Synopsys docs note that 3D modeling is useful for cells with
> load-dependent outputs, such as unbuffered flops or adders. My local
> Synopsys rep, however, related a customer complaint that even buffered
> outputs exhibit a slight degree of dependence on the other output's
> loading and that the inaccuracies increase with finer geometry processing.
> I'd like to ask the ESNUG readers to please share their experiences with
> this issue and enlighten us.
>
> - Andy Pagones
> Motorola Labs
From: [ The Synopsys Library Compiler CAE ]
Hi John,
I'm the Library Compiler CAE. Since Andy brought this up, I would like to
give a brief history of 3-D timing modeling plus some recommendations.
We've had a lot of experience at Synopsys modeling unbuffered outputs.
In the early days of DC, we modeled one output as dependent on the output
loading of another output, but with no output-to-output timing arcs. Many
people use this today, though fewer and fewer libraries heavily leverage
unbuffered outputs.
A few years ago we added setup & hold constraint 3-D tables. Some vendors
requested them, and 3-D constraint tables are generally used when we have
load-dependent constraints. Setup/hold depends on the loads on both Q and
Qbar.
During the time when we added 3-D constraint tables, we also added 3-D power
tables for mutually dependent outputs. This was driven by power tool needs.
Many ASIC vendors use this today. 3-D tables for power should only be used
when one output is the complement of the other output, since the
table/computation is based on just one toggle rate.
With DC/PT 99.10 we added 3-D output tables for mutually dependent outputs.
Our main goal was to enable easier SDF back-annotation in DC and PrimeTime
by replacing output-to-output delay arcs with 3-D input-to-output arcs. SDF
has no notion of output-to-output delay arcs, so in the past there was
always some black magic needed to back-annotate these load-dependent arcs.
We are aware that even buffered outputs exhibit a slight degree of
dependence on the other output's loading, and that the inaccuracies increase
with finer geometry processing. Some ASIC vendors have delay calculation
tools which can handle buffered outputs. Their libraries have characterized
data for 3-D modeling of buffered outputs. (I would welcome customers who
have used such libraries to share their experiences on ESNUG! We don't
recommend modeling buffered output interactions unless the effects are
fairly significant. Gate-level timing calculation is already an abstraction
of the transistor world. We use the basic assumption of a single toggle
case. This state dependency gives more detail, but without considering the
real loading/slew on pins outside the signal path!) The 3-D table is
significantly different from the 2-D table used for a non-buffered output.
It trades off performance (a 3-D calculation is more than an order of
magnitude more complex than a 2-D one) for accuracy. A buffered output
might not justify the cost.
- [ The Synopsys Library Compiler CAE ]
( ESNUG 338 Item 7 ) --------------------------------------------- [12/3/99]
Subject: ( ESNUG 337 #10 ) Yes!, I Found A SpeedChart Y2K FSPD Daemon!
> Some of you may still use the excellent, but dead & unsupported,
> SpeedChart Tool Environment on SunOS/Solaris and HP-UX. Actually we
> are trying to fix the Y2K problem found in the license daemon by a new
> daemon, which would replace the old fspd but still provide the same
> amount of functionality after the 31.12.1999. Any help appreciated.
>
> - Markus Meng
> Meng Engineering Baden, Germany
From: "Markus Meng" <meng.engineering@bluewin.ch>
Hi, John,
Thought I'd follow up on this. It turns out that you can get the Y2K daemon
patch for SpeedChart 3.4.X by sending an e-mail to "info@scentech.ch" with
a Subject: "More Info on Speed2000 Y2K Patch". For web users, go to my
site at http://mypage.bluewin.ch/speed2000 -- this page contains the Y2K
ready fspd license daemon for Solaris 2.5.
- Markus Meng
Meng Engineering Baden, Germany
( ESNUG 338 Item 8 ) --------------------------------------------- [12/3/99]
Subject: ( ESNUG 337 #3 ) X's, Optimism, Pessimism, Resets, & Verilog Case
> It is possible to code the RTL in a style which would intercept and
> process X-states more accurately, moderating both the pessimism and the
> optimism. For example, our case statement could be modified to intercept
> X-states and propagate their effect on the result more accurately.
>
> reg [1:0] d, e;
> ...
> begin
> case (d)
> 2'b00 : e = 2'b01;
> 2'b0X : e = 2'bX1;
> 2'b01 : e = 2'b11;
> 2'bX0 : e = 2'bXX;
> 2'bXX : e = 2'bXX;
> 2'bX1 : e = 2'bXX;
> 2'b10 : e = 2'b10;
> 2'b1X : e = 2'bX0;
> 2'b11 : e = 2'b00;
> endcase
> end
>
> This coding style quickly becomes unmanageable for all but the simplest
> examples and is prone to errors (incompleteness). In addition, we are
> very quickly losing the designer's clear functional intent to support an
> irritating consequence of RTL X-state simulation.
>
> - Harry Foster
> Hewlett-Packard Computer Technology Lab Richardson, TX
From: Lauren Carlson <carlson@starbridgetech.com>
Hi, John,
Here are two coding examples that attempt to overcome Verilog optimism with
respect to propagation of X's through case statements in simulation. We
were burned by not finding a reset problem until late in the schedule when
our gate level simulation propagated the X correctly.
These case statements are easier to read and use than Harry's examples in
ESNUG 337 #3.
reg [1:0] d, e;
...
begin
case (d)
2'b00: e = 2'b01;
2'b01: e = 2'b11;
2'b10: e = 2'b10;
default:
begin
// synopsys translate_off
if ((|d[1:0]) === 1'bx)
e = 2'bxx;
else
// synopsys translate_on
e = 2'b00;
end
endcase
end
For a fully specified case (no default statement) one could go as far as
coding it specifically to propagate the X's to find reset problems.
reg [1:0] d, e;
...
begin
case (d)
2'b00: e = 2'b01;
2'b01: e = 2'b11;
2'b10: e = 2'b10;
2'b11: e = 2'b00;
// synopsys translate_off
default: e = 2'bxx;
// synopsys translate_on
endcase
end
Our design team is currently using this methodology to find more
initialization problems earlier.
- Lauren Carlson
StarBridge Technologies, Inc. Marlboro, MA
( ESNUG 338 Item 9 ) --------------------------------------------- [12/3/99]
Subject: ( ESNUG 335 #9 ) OK, C/C++ HW Design Won't Work; Why Not SLDL?
> My belief is that trying to embed everything in one language is just part
> of the eternal quest for the silver bullet. Fred Brooks wrote an IEEE
> Computer article on silver bullets (with a cool werewolf picture) many
> years ago. One obvious issue that is not mentioned is that the software
> and hardware developers are rarely the same person. We don't expect the
> circuit and layout designers to use the same tools. What is needed are
> tools appropriate to each problem domain, but which enable integration,
> co-design, co-simulation, co-verification, etc. One language for
> everything is another way of saying lowest common denominator.
>
> - Duncan M. (Hank) Walker
> Texas A&M University College Station, TX
From: Steve Schulz <ses@ti.com>
Hi John,
Five industry workshops (each comprising 30-50 leading experts in the
field worldwide) arrived at the same conclusion Duncan Walker states
above, based on quite rigorous technical and usability analysis. An
analogy can be made with software systems. One
could write a full GUI-driven application entirely in assembly (or
low-level C), or one could write a device driver out of perl and a few
extra foreign interface bindings. Yet we use a blend of assembly, C, C++,
Perl, Tcl/Tk, MS-Excel Visual Basic macros, etc. The reason? It's the old
adage: "use the right tool for the right job". Once we truly understand
the greater complexity of the "100M-xtrs-and-embedded-sw-on-0.1u" design
task, it is clearer to see why we should not expect C or C++ to solve the
entire problem standalone.
That is not to say that C/C++/Java won't have a useful role. The trick is
that we need to find ways to describe, analyze, and "prototype" all aspects
of these systems before they are actually committed to expensive silicon.
A means to integrate the views, and different syntaxes, is needed, and must
be interoperable for the industry to leverage from any of them. The System
Level Design Language (SLDL) effort is now at work defining such an
integration system. I expect that we will see one or more variations on
C/C++ as an integral part of the SLDL environment with Rosetta, but it is
not realistic to expect any single software language to "be all things".
(see http://www.inmet.com/SLDL for more background.)
- Steve Schulz
SLDL Initiative Founder
Texas Instruments Dallas, TX