From: Bob Alverson <bob@cray.com>
Hey, John,
I just had to share with you that the Synopsys R&D Q&A in ESNUG 392 #5
was classified by our e-mail filters here at Cray as spam. The kicker
is it claims the Synopsys people were writing about penis enlargement!
X-Cray-SpamScore: 4.2
X-Cray-SpamSigns: PENIS_ENLARGE2
What triggered it was:
47. I've noticed that the Design Compiler budgeter uses 8G of
memory on a 400 K design, versus 1G with the PrimeTime
budgeter. Is this normal?
No. Design Compiler budgeter offers many advantages for synthesis
(faster, accuracy, RTL budgeting, and so on). However, it might
require more memory for some designs. For large designs a 50 percent
increase in memory might be normal, but an 8x increase is definitely
not normal and should be reported.
Our spam detector is primitive; two uses of the word "increase" in
the same line are assumed to be talking about penis enlargement. But it
also wouldn't surprise me to discover that Synopsys Marketing was using
subliminal techniques to sell more copies of DC. :)
- Bob Alverson
Cray Computers Seattle, WA
( ESNUG 393 Subjects ) ------------------------------------------- [04/25/02]
Item 1: Do You Prefer Synchronous or Asynchronous Resets In Your Designs?
Item 2: Does Your ASIC Vendor Make You Time For A 12% On-Chip Variation?
Item 3: ( ESNUG 390 #6 ) Ericsson Didn't Do A Synplicity ASIC Benchmark!
Item 4: ( ESNUG 387 #16 ) Missing #1 Delays In VCS Will Burn You In Debug
Item 5: Is Cadence Going To Buy Get2Chip And Scrap Its Ambit/BuildGates?
Item 6: Newbie Question -- How Do I Ungroup Automatically Inside DC?
Item 7: ( ESNUG 387 #4 ) PhysOpt Runtimes Are Shorter With "insert_scan"
Item 8: ( ESNUG 386 #15 ) Veritools Has A Novas DeBussy At 1/4 The Price
Item 9: A User Tape-out Of PhysOpt w/ The New Synopsys Clock Tree Compiler
Item 10: Three Engineers On Why Apollo Timing Differs From PrimeTime Timing
Item 11: ( ESNUG 388 #19 ) My 31% Speed-up By Hand-Tweaking DW Arithmetic
Item 12: LEF & Verilog Won't Do; Hard Macros In PKS Need A TLF Or ALF File
Item 13: ( ESNUG 388 #2 ) Handle General & External Obstructions In PhysOpt
The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com
( ESNUG 393 Item 1 ) --------------------------------------------- [04/25/02]
From: Jerry Yang <yangj@nortelnetworks.com>
Subject: Do You Prefer Synchronous or Asynchronous Resets In Your Designs?
Hi John,
We are starting a new ASIC design and have a new member who came from another
group. When discussing methodology, our new colleague suggested using
synchronous reset flip-flops instead of asynchronous reset flip-flops, which
was a shock to us. Our past design guideline has been to use async reset flops
only (because all the ASIC vendors we know of have only async reset flops in
their libraries, so if you code a sync reset flop the synthesis tool
will try to build some logic in front of your D pin).
Do your readers have any suggestions on this?
- Jerry Yang
Nortel Networks Nepean, Ontario, Canada
( ESNUG 393 Item 2 ) --------------------------------------------- [04/25/02]
From: Paul Zimmer <pzimmer@cisco.com>
Subject: Does Your ASIC Vendor Make You Time For A 12% On-Chip Variation?
Hi, John,
Our current ASIC vendor is having us do timing analysis with an on-chip
variation of 12%, applied to both cell delays and interconnect delays.
This makes certain kinds of timing very difficult to pass. For example,
using a PLL to zero out a 5 ns insertion delay requires a 5 ns feedback
path. But 12% variation between the two is 600 ps, which is a big chunk
of time these days. Source-synchronous interfaces have similar problems.
How realistic is this sort of thing? Has anyone done a paper on this?
- Paul Zimmer
Cisco Systems
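
The 600 ps figure in Paul's letter can be reproduced with a minimal arithmetic
sketch. It assumes the 12% variation is applied as a single worst-case spread
across the full 5 ns path; the function name is made up for illustration and
does not come from any vendor tool.

```python
def ocv_spread_ps(path_delay_ns: float, variation: float) -> float:
    """Worst-case mismatch, in picoseconds, between two nominally equal paths."""
    return path_delay_ns * variation * 1000.0

# 5 ns insertion delay matched by a 5 ns PLL feedback path, 12% on-chip variation
spread = ocv_spread_ps(5.0, 0.12)
print(f"{spread:.0f} ps of possible mismatch")
```

The same function can be reused for a source-synchronous interface by passing
the delay of the clock or data path that has to be matched.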
( ESNUG 393 Item 3 ) --------------------------------------------- [04/25/02]
Subject: ( ESNUG 390 #6 ) Ericsson Didn't Do A Synplicity ASIC Benchmark!
> Although my own stock in Synplicity got temporarily hammered when your
> latest article about them was published ...
>
> - Bill Cox
> VI ASIC
From: Gayatri Japa <gayatri.japa@indiatimes.com>
Dear John,
Mike Dini advertises his services on the Synplicity web page. Bill Cox
admits he owns Synplicity stock. You should screen your emails better.
- Gayatri Japa
India Times
---- ---- ---- ---- ---- ---- ----
From: Mike Djamoos <MDjamoos@WhiteRockNetworks.com>
Hi, John,
We use Synopsys Design Compiler for ASIC synthesis. While Synplicity has
done a great job of addressing the specific needs of the FPGA community,
we decided to stay with the tool with which we are most familiar when
designing ASICs.
- Mike D'Jamoos
White Rock Networks
---- ---- ---- ---- ---- ---- ----
> One of our designs is highly parameterized using generic and loop
> statements. We also use 3-dimensional arrays. The Synopsys software
> needs more than 30 hours for the synthesis! In contrast the Synplify
> ASIC software needs only 40 minutes and achieves results of the same
> quality as the Synopsys software. The setup of the Synopsys
> synthesis environment took more than a week. The results from Synplify
> ASIC were achieved within a day.
>
> So a clear statement from my side is that Synplicity software is far
> more easy to handle and more efficient than the Synopsys software!
>
> - Juergen Dennerlein
> Ericsson Eurolab Deutschland GmbH Nuremberg, Germany
From: Juergen Dennerlein <Juergen.Dennerlein@eed.ericsson.se>
Hi John,
After talking to several designers, it seems that my e-mail in
ESNUG 390 #6 has been misunderstood. The word "benchmark" I used implies
too much. It would have been better to say "we had some interesting results
after quick tests" because we didn't perform a benchmark. With respect to
the runtimes, we did experience 30 hours with Design Compiler. The 40
minute Synplify ASIC runtime was reported to us by the Synplicity Munich
office. We provided Synplicity the RTL code and they reported the runtimes
back to us. They reported that they achieved timing within a day! So far we
haven't checked the netlist results.
When I wrote, I didn't know that Synopsys had already taken action to tackle
the problems we encountered with our parameterized design. It was about 12
days ago when I got the info that Synopsys had already brought the
synthesis run time down to an acceptable 5 hours through coding style changes
and the use of special DC attributes. Furthermore, I didn't know that Synopsys
had already committed to fixing these problems in the next Design Compiler
release. Nonetheless, I would like to point out that it's not acceptable
that designers have to adapt to a tool-specific coding style or that they have
to know such special constraints. An easy-to-handle tool should take over
those tasks in order to free the designer for real design work!
I hope this email clarifies what people were questioning.
- Juergen Dennerlein
Ericsson Eurolab Deutschland GmbH Nuremberg, Germany
[ Editor's Note: I want to thank Juergen for setting the record straight
here. In my book, it's NOT a benchmark when the Synplicity people run
the tool and report back their "results". This is not an anti-
Synplicity sentiment. I distrust *any* benchmark data that comes from
*any* EDA vendor. I've been lied to too many times. In re-reading
ESNUG 390 #6, I also noticed that Frank de Bont of Arcobel had some
benchmark numbers in his letter. I will be investigating if Arcobel
had actually used a copy of Synplify ASIC in house or not. - John ]
( ESNUG 393 Item 4 ) --------------------------------------------- [04/25/02]
Subject: ( ESNUG 387 #16 ) Missing #1 Delays In VCS Will Burn You In Debug
> Blocking assigns with delays are not recommended:
>
> always @(posedge clk)
> a = #1 b;
>
> Blocking assignments do not ensure proper ordering of events in daisy-
> chained flip-flops, so they require a #1 on the RHS to avoid race
> conditions. Since this inhibits VCS cycle-based optimizations, this
> coding style is also not recommended.
>
> - Mark Warren
> Synopsys, Inc. Cupertino, CA
From: Darren Jones <dj@mips.com>
Hi John,
In ESNUG 387 #16, Mark Warren claims that the following flip-flop
coding style uses an unnecessary #1 delay:
always @(posedge clk)
a <= #1 b;
If you do not include the #1 delay above, then you are relying on the
intra-timetick event ordering rules of Verilog. The problem is that
the PLI interface does not guarantee these intra-timetick ordering
rules. Thus, if you use PLI to model circuit behavior, you may have
latent race conditions. (This applies to VMC models, too.)
In addition, using the #1 does in fact avoid races when you have code
which may not always use non-blocking assignments for flops. This can
happen if a designer accidentally uses a blocking assignment, or if
you are trying to do mixed gate+RTL simulations, or if you are using
code not produced by your team - for instance compiled RAM models, 3rd
party BIST controllers, IP cores, etc.
Furthermore, if you eliminate all #1 delays on flops and then have a
race condition, it is notoriously difficult to debug. Just turning on
waveform dumping may cause the simulation to pass...
If you don't use PLI, and have bug-free RTL and all of it follows the
given style recommendations, the #1 is not needed. However, using #1
does give your design some degree of resiliency to the many coding
styles in use today. I do not dispute that VCS may run faster, but it
doesn't help me to have a fast simulation that doesn't work. :)
- Darren Jones
MIPS Technologies Mountain View, CA
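
The evaluation-order dependence Darren describes can be shown with a toy model
of two daisy-chained flops. This is only a sketch in Python, not a Verilog
simulator: it ignores #1 delays and PLI entirely, and the block names flop_a
and flop_b are hypothetical.

```python
def clock_edge_blocking(state, d, order):
    """Each 'always' block reads and writes immediately, in the given order."""
    s = dict(state)
    blocks = {
        "flop_a": lambda: s.__setitem__("a", d),       # a = d
        "flop_b": lambda: s.__setitem__("b", s["a"]),  # b = a
    }
    for name in order:
        blocks[name]()
    return s

def clock_edge_nonblocking(state, d):
    """All right-hand sides are sampled first, then all updates are applied."""
    sampled = {"a": d, "b": state["a"]}
    return {**state, **sampled}

start = {"a": 0, "b": 0}
print(clock_edge_blocking(start, 1, ["flop_a", "flop_b"]))  # b picks up the NEW a
print(clock_edge_blocking(start, 1, ["flop_b", "flop_a"]))  # b picks up the OLD a
print(clock_edge_nonblocking(start, 1))                     # b always gets the OLD a
```

With blocking-style updates the value captured by the second flop depends on
which block happens to run first, which is the kind of latent race that a
stray blocking assignment or a third-party model can introduce.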
( ESNUG 393 Item 5 ) --------------------------------------------- [04/25/02]
From: Kusuma Arkalgud <karkalgud@silverbacksystems.com>
Subject: Is Cadence Going To Buy Get2Chip And Scrap Its Ambit/BuildGates?
Hi John,
I just heard about a seminar called 'Emerging Technologies Seminar'. It's
being hosted by Silicon Perspective, Get2Chip, Plato, and Verplex. (Have a
look at http://www.siperspective.com/seminar/). Why would SPC and Plato
(now owned by Cadence) work with Get2Chip on a seminar? Does this mean that
Cadence's Ambit/BuildGates is being de-emphasized or end-of-lifed? Is this
a sign that Cadence is thinking about buying Get2Chip? I'd appreciate
insights from you and other ESNUGers.
- Kusuma Arkalgud
Silverback Systems, Inc.
( ESNUG 393 Item 6 ) --------------------------------------------- [04/25/02]
Subject: Newbie Question -- How Do I Ungroup Automatically Inside DC?
> After technology mapping my design ends up with some additional cells
> (e.g. cell name: add_210, reference name: Pipeline_DW01_inc_8_0). "DW01"
> is always part of the reference name but the cell names are random.
>
> Finding these cells via their reference name is no problem: find
> reference *DW*. But how to ungroup these cells (ungroup does only
> accept cell names as argument)? How to derive the cell name from the
> reference name? Any hints? Thanks in advance!
>
> - Uwe Stange
> University of Heidelberg Germany
From: Ansgar Bambynek <a.bambynek@avm.de>
Hi Uwe, how about this?
   foreach (design_name, find (design,"*")) {
     current_design = design_name
     dw_cell_list = filter (find(cell,"*"), "@is_dw_subblock == true")
     if (dc_shell_status != {} ) {
       ungroup -flatten dw_cell_list -simple
     }
   }
Additionally, an "ungroup -all" ungroups everything including DW components.
- Ansgar Bambynek
AVM Germany
( ESNUG 393 Item 7 ) --------------------------------------------- [04/25/02]
Subject: ( ESNUG 387 #4 ) PhysOpt Runtimes Are Shorter With "insert_scan"
> So far we have not seen any such issues in-house or from other
> customers using "insert_dft -physical" concerning long run-times
> to fix DRCs or larger gate counts (as compared to using a combination
> of "insert_scan -physical" and "physopt -inc -eco" commands.)
>
> - Andrew Copper
> Synopsys, Inc. Mountain View, CA
From: Neel Das <neel.das@corrent.com>
Hello John,
This has been a somewhat long-standing discussion for us with Synopsys. I
first posted in ESNUG 385 #11 regarding long compile times with insert_dft,
wherein a PhysOpt run that followed it was taking significantly longer
than after insert_scan. Synopsys replied in ESNUG 387 #4 that they were
unable to replicate this. Subsequently, Synopsys sent me a series of
switches to try out. Based on their inputs, I set up and ran three
experiments:
Experiment #1:
insert_dft -physical -map_effort low
physopt -incremental -eco
Results:
a. Runtime(insertX) 20 min
b. Runtime(physopt) 4 hours 4 min
c. Orig/Final cellcounts from physopt : 23947/33164
d. Orig/Final Densities from physopt 21.1/42.2
e. Worst setup violation in 'clk' domain -1.35
f. Worst hold violation in 'clk' domain -0.04
Experiment #2:
insert_scan -physical -map_effort low
physopt -incremental -eco
Results:
a. Runtime(insertX) 8 min
b. Runtime(physopt) 15 min
c. Orig/Final cellcounts from physopt : 23416/26819
d. Orig/Final Densities from physopt 20.5/29.8
e. Worst setup violation in 'clk' domain NONE
f. Worst hold violation in 'clk' domain -0.05
Experiment #3:
insert_dft -physical -ignore_compiler_design_rules \
-dont_fix_constraint_violations
physopt -incremental -eco
Results:
a. Runtime(insertX) 15 min
b. Runtime(physopt) 3 hours 5 min
c. Orig/Final cellcounts from physopt : 23814/33105
d. Orig/Final Densities from physopt 20.8/42.1
e. Worst setup violation in 'clk' domain -2.11
f. Worst hold violation in 'clk' domain -0.03
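
For a quick side-by-side view, the runtimes and cell counts listed above can
be reduced to totals and ratios. This is only a sketch over the numbers
reported in the three experiments; the dictionary layout and the helper name
are made up for illustration.

```python
experiments = {
    "insert_dft (exp 1)":  {"insert_min": 20, "physopt_min": 4 * 60 + 4, "cells": (23947, 33164)},
    "insert_scan (exp 2)": {"insert_min": 8,  "physopt_min": 15,         "cells": (23416, 26819)},
    "insert_dft (exp 3)":  {"insert_min": 15, "physopt_min": 3 * 60 + 5, "cells": (23814, 33105)},
}

def summarize(exp):
    """Return (total runtime in minutes, cell-count growth in percent)."""
    total = exp["insert_min"] + exp["physopt_min"]
    before, after = exp["cells"]
    growth_pct = 100.0 * (after - before) / before
    return total, growth_pct

# Use the insert_scan flow as the baseline for the runtime ratio.
baseline_total, _ = summarize(experiments["insert_scan (exp 2)"])
for name, exp in experiments.items():
    total, growth = summarize(exp)
    print(f"{name}: {total} min total, {total / baseline_total:.1f}x baseline, "
          f"cell count +{growth:.1f}%")
```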
We're seeing significantly better runtimes and overall QOR with the
'old' insert_scan approach. After I sent my results to Synopsys,
they responded with more switches:
You may want to do the following before running insert_dft -phy
physopt
set_scan_element false <non-scan-flops>
set_dont_touch <non-scan-flops>
check_dft
report_test -state (the state reported should be test-ready)
insert_dft -physical
physopt -inc -eco
Unfortunately, I haven't had the time or the need to try these additional
switches, since based on my results, I see no reason to switch from a
perfectly usable and faster flow!
We've submitted this block as a testcase to Synopsys, and they've been
able to replicate the long runtime 'effect' with insert_dft. I'll
certainly keep you posted if there are further developments, but as far
as I'm concerned, for now, it's insert_scan!!
- Neel Das
Corrent Corp. Tempe, AZ
( ESNUG 393 Item 8 ) --------------------------------------------- [04/25/02]
Subject: ( ESNUG 386 #15 ) Veritools Has A Novas DeBussy At 1/4 The Price
> My company has been using Debussy from Novas ( http://www.novas.com ) for
> a couple of years now. The tool can trace through both RTL and gate
> netlists (we use Verilog, by the way). For gates, the tool can draw
> schematics which can be probed; nets can be expanded to show fan-in
> and/or fan-out cones. For RTL, the tool draws reasonably good schematics
> and finite state machine diagrams. The tool also has nice hooks into its
> waveform viewer. (This is not a paid advertisement, I swear!)
>
> I have heard that Undertow by Veritools offers similar capabilities, and
> it may be cheaper. ( http://www.veritools-web.com/products.htm )
>
> - Gene Sullivan
> Analog Devices, Inc.
From: Robert Schopmeyer <schop@veritools.com>
Hi, John,
Undertow Suite can do exactly what Novas does and for less than 1/4 the
cost. What a totally unnecessary and major financial burden it is to have
to purchase tools for every engineering team member that cost $25,000+,
just in order to trace signals back on your RTL or gate designs.
Undertow Suite will allow users to trace either gate or RTL schematics
backward or forward, and allow the user to expand the schematic with either
show all drivers and loads or show just the drivers and loads you are
interested in. The schematic window shows you the schematic at all
hierarchical levels instead of being limited just to a single scope. This
allows the user to see on a single schematic view, the name changes as the
signal goes into or out of each hierarchical level. A new no-cost feature
of the soon-to-be-released Undertow 9 Suite is the ability to view your
RTL design graphically in even lower-level RTL primitives such as Muxes, FFs,
Latches, Priority Encoders and Decoders, including the graphical display
of your completely decomposed expression trees.
Undertow Suite also comes with a wave form viewer and state diagram
viewer to use along with your source code window and schematic window.
Users can get the Undertow Suite for UNIX, Linux or the PC from
http://www.veritools-web.com and run these tools with no cost licenses
for evaluation.
- Bob Schopmeyer
Veritools, Inc.
( ESNUG 393 Item 9 ) --------------------------------------------- [04/25/02]
From: Jens Michelsen <jcm@vitesse.com>
Subject: A User Tape-out Of PhysOpt w/ The New Synopsys Clock Tree Compiler
Hi, John,
Before we used PhysOpt, we had a traditional Synopsys frontend to Avanti
backend COT flow. We still use VCS, TetraMax, Design Compiler, Formality,
PrimeTime (and now PrimeTime-SI) from Synopsys and all the Avanti tools.
We use Avanti Star-RC for extraction while our ASIC vendor does the
LVS/DRC checks.
Our main problem has been the iterations between gate-level netlists and
P&R. It was taking too long and becoming more difficult to achieve timing
closure. Our inserted clocks caused a lot of uncertainty before P&R. The
clock skew margins we had to use in our pre-placed-and-routed netlist
hindered our ability to optimize for area and power as well.
When we brought in PhysOpt, we also signed up to be an early evaluator of
their Clock Tree Compiler tool. We set up our PhysOpt / Clock Tree Compiler
flow to be fully hierarchical. Every top-level block is run through PhysOpt
and Clock Tree Compiler and then Avanti Planet is our block level floor
planner after that. We then took the resulting placement directly to Apollo
to complete the block level routing. The timing after routing had good
correlation to pre-routing estimates. No routability problems came up.
For the top-level, we then used the top level netlist together with the
extracted block-level models, which were taken through PhysOpt for top level
optimization and clock tree insertion. This new flow had a significant
upside over our traditional flow; our results were now predictable and
deterministic.
While we were implementing this new flow, we were asked to help on a block
from another group that was having timing closure problems. Their block was
part of a SoC being developed in Datacom Vitesse. The block consisted of
60 K instances of logic with 5 memories, and was targeted for TSMC 0.18 um
7-layer metal. The biggest issues were timing closure in the presence of
its complex clock tree, design congestion and the routability of the design
after back-end place and route. The RTL, area and port locations were fixed
and couldn't be changed. In addition the block had low utilization (35%)
due to the fixed area constraint. All the other flows within Vitesse failed
to get closure on this problem block. They gave us 2 weeks to get it
through PhysOpt / Clock Tree Compiler and tape-out.
Clock Tree Description:
- 4 sub clock trees driven by a top-level clock (the top level clock
is also driving 8800 FF's) plus 5 reset trees and one scan mode tree.
- Clocks specification 2.0 ns, 4.5 ns and 5.5 ns periods with 10%
uncertainty
Here is what we got.
   Clock tree    # of FF   Latency(ns)   Buffers   Levels   Skew (ps)
   -----------   -------   -----------   -------   ------   ---------
   Sub clock 1      2200           1.3       600        6          55
   Sub clock 2       320           0.8        24        2          40
   Sub clock 3      5000           2.4       340        7         200
   Sub clock 4       175           0.7        12        2          15
   Top clock        8800           2.1       560        7         200
   Reset 1          2200           1.2       120        3          60
   Reset 2           320           0.8        18        2          20
   Reset 3          5000           1.3       275        3          75
   Reset 4           170           0.6        14        2          12
   Top reset        8800           2.0       599        7         326
   Scan Mode       16400           2.1       916        6         160
In 5 days we taped out and met our clock skew spec on this block with room
to spare.
For our next tape-out we are hoping to include Power Compiler within the
flow, and hopefully to reduce the number of routing iterations required to
achieve timing closure. We also need to include signal integrity effects
and process antenna rules as part of the overall placement process.
Overall we were pleased with the introduction of the new Synopsys physical
synthesis, placement and clock tree synthesis tools into our COT flow.
- Jens Michelsen
Vitesse Denmark
[ Editor's Note: The scripts Jens used with Clock Tree Compiler are
in the "Downloads" section of http://www.DeepChip.com - John ]
( ESNUG 393 Item 10 ) -------------------------------------------- [04/25/02]
Subject: Three Engineers On Why Apollo Timing Differs From PrimeTime Timing
> I don't understand why the slack report that my ASIC vendor gives me via
> Apollo are different from what I get after analyzing it by Synopsys
> PrimeTime? I mean, Apollo does have all the information, how can it give
> results that are less accurate than PrimeTime?
>
> - Nahum Barnea
From: Lars Rzymianowicz <larsrzy@ti.uni-mannheim.de>
Well, if you have done a pre-layout synthesis with wire load models, your
synthesis tool is only guessing the length (and capacitance) of nets.
This can differ a lot from the actual placement/routing of the design.
"I mean, Apollo does have all the information, how can it give
results that are less accurate than PrimeTime?"
It's the other way 'round. Apollo's results are more accurate, since
it has all the logical and physical information. PrimeTime only knows the
netlist, not the placement.
That's the problem of timing closure the EDA industry is attacking with
tools for physical synthesis.
I've seen pre/post layout mismatches on timing of 300% on our latest design.
With the standard ASIC flow: DC with WLMs, Apollo with length-driven P&R.
The flow is simply outdated for today's technologies...
- Lars Rzymianowicz
University of Mannheim Mannheim, Germany
---- ---- ---- ---- ---- ---- ----
From: jcooley@TheWorld.com (John Cooley)
There are two reasons for this, Nahum. The first reason is that you're
probably using PrimeTime post-synthesis but pre-P&R (i.e. with front-
annotated delays), while Apollo is giving you timing reports post-P&R.
Front-annotated delays are essentially "best guesses" based on your block
size (if you're just using Design Compiler). You can get better "best
guesses" early on in your design cycle if you use physical synthesis tools
like PhysOpt/Magma/PKS -- but there will still be some timing differences
even using these tools because they use only placement info; not final
legal P&R.
The other reason why you're seeing timing differences between PrimeTime
and Apollo is that they use different timing algorithms within the tools
themselves. That is, even if you take post-P&R Apollo generated netlists
and time them in PrimeTime, you're going to find a small difference in
the timing delays each tool reports. (This difference is usually under
3%, so it's not something to lose sleep over, Nahum.)
- John Cooley
the ESNUG guy
---- ---- ---- ---- ---- ---- ----
From: [ The Man With One Red Shoe ]
John,
Keep me anonymous, please, if you decide to run this in ESNUG.
The major reasons for timing differences between Apollo and PrimeTime, in
addition to the ones that you described in your post, are:
1. Is the design latch or flop based? Flop based designs are trivial
from a timing analysis perspective and the differences between Apollo
and Primetime should be under 5% max (assuming that the same
parasitics etc are used).
On the other hand if the design is latch based, the devil that is
time-borrowing enters the picture. PrimeTime being a "pure" timing
analysis tool, does physical/greedy borrowing. On the other hand
from a design implementation tool's perspective such as Apollo, you
need to do some sort of time-borrow balancing to ensure that the tool
is not wasting cycles trying to optimize the snowballed slack at the
hard timing endpoint (output port or a flop data pin).
2. Propagation of constants across sequentials. Apollo HAD a "feature"
that would propagate constants across sequentials, while PrimeTime
does not. This could mean that a path reported by PrimeTime, is
not "seen" by Apollo.
3. Worst case slew propagation. PrimeTime used the worst case input slope
when reporting timing, not the actual slope. For example, if you have
a two input AND gate with inputs A and B and output O. For a moment
assume that the transition times at pins A and B are 5 and 10 units
respectively. PrimeTime used to (now they have a switch to control the
behaviour) always use the "10" units of trans time, when timing through
pin "A". So the timing analysis is kind of pessimistic. Apollo, I
think, uses the "5" units trans time for paths through pin "A" and "10"
for paths through "B". (I have some very preliminary indications that
the latest Apollo might have the same behaviour, but I have not fully
investigated this.)
4. For Apollo timing to match PrimeTime, a couple of other parameters
enter the picture. The Apollo timer does not use the ECS/Effective cap
model for computing cell delays (the old story of resistive shielding
etc) by default. For it to match PrimeTime, ensure that the "ntECSOn"
is set appropriately (the actual command might be slightly different.)
5. I think that Apollo uses AWE for interconnect delay while PrimeTime
uses something like Adaptive Arnoldi or something like that...
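
Point 3 above (worst case slew propagation) can be illustrated with a small
sketch. The linear delay model and its coefficients are purely hypothetical,
chosen only to show how the two policies diverge; neither tool's real delay
calculation is represented here.

```python
def cell_delay(input_slew, intrinsic=1.0, slew_factor=0.1):
    """Hypothetical linear delay model: delay grows with input transition time."""
    return intrinsic + slew_factor * input_slew

pin_slew = {"A": 5.0, "B": 10.0}   # transition times from the example, arbitrary units

def delay_worst_slew(pin):
    """Worst-case slew policy: every path uses the slowest input transition."""
    return cell_delay(max(pin_slew.values()))

def delay_path_slew(pin):
    """Path-specific policy: each path uses the transition at its own input pin."""
    return cell_delay(pin_slew[pin])

for pin in ("A", "B"):
    print(f"pin {pin}: worst-slew delay {delay_worst_slew(pin):.2f}, "
          f"path-slew delay {delay_path_slew(pin):.2f}")
```

Under these assumptions the path through pin A is timed more pessimistically
with the worst-case policy, while the path through pin B is identical under
both.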
One last issue that a lot of designers tend to overlook is characterization
range. For example, if you are operating your cell outside of its
characterization limits, all bets are off. In general, it's NOT a safe
assumption that the extrapolation is always linear.
I have seen Apollo behave kind of counter-intuitively.
- [ The Man With One Red Shoe ]
( ESNUG 393 Item 11 ) -------------------------------------------- [04/25/02]
Subject: ( ESNUG 388 #19 ) My 31% Speed-up By Hand-Tweaking DW Arithmetic
> Here's a simple datapath example and different timing/area results with
> different flows.
>
> module mux4 (m0,m1,x,b0,b1,z);
>
> parameter n=32;
>
> input [n-1:0] m0,m1,x;
> input [2*n-1:0] b0,b1;
> output [n:0] z ;
>
> wire [2*n:0] y1;
> wire [2*n:0] y0;
> wire [2*n:0] y2;
>
> assign y0 = m0*x + b0;
> assign y1 = m1*x + b1;
>
> assign y2 = (y1>y0) ? y1 : y0;
> assign z = y2[2*n:n] + y1[(n-1):0];
>
> endmodule
>
>
> All of these have been achieved using TSMC's 0.13 um technology and the
> 2001.08-SP2 release of DC.
>
> Flow Path Length Path Slack Design Area Compile Time
> -------------------- ----------- ---------- ----------- ------------
> DC-Expert + DW_Standard 13.74 -7.24 240760.28 2745.29
> DC-Expert + DW 7.33 -0.83 213677.06 3161.75
> DC-Ultra + DW + MCI + TCSA 6.63 -0.13 210615.31 2754.35
> DC-Ultra + DW + MCI + PD 6.50 0.00 174409.92 952.84
>
> DW_Standard is the standard library shipped with DC. DW is the full
> DesignWare library. "DW + MCI + TCSA" means DesignWare, with
> dw_prefer_mc_inside set to true and transform_csa command. "DW + MCI
> + PD" means DesignWare, with dw_prefer_mc_inside set to true and
> partition_dp command
>
> - Oliver Meisel
> Synopsys, Inc. Mountain View, CA
From: [ Papa Smurf ]
John, anon pls.
I have quite a bit of experience building arithmetic units for graphics
chips, mainframe processors and DSPs. I ran Oliver Meisel's example using
a "DC-Ultra + DW + MCI + TCSA flow" and found that it could meet 6.0 nsec
using a similar library which also targets TSMC's 0.13 micron process.
I set max_fanout to 20, ungrouped all and set the operating conditions to
worst case military. I used -map_effort high and max_area 0. I'm using
2001.08-SP1. (Be sure to use SP1 or later if you're using the transform_csa
command as *bad* logic can result otherwise.)
I then synthesized a version which instantiated hand optimized multipliers
and adders. I call these results in the chart below "RTL-1". This design
made 5.5 nsec and was smaller than the DW implementations.
To further improve performance I reorganized/re-architected the code,
duplicating an adder so that the magnitude compare and last addition were
performed in parallel. These results are listed as "RTL-2" in my chart.
I went from:
assign y2 = (y1>y0) ? y1 : y0;
assign z = y2[2*n:n] + y1[(n-1):0];
to:
assign y3 = y0[2*n:n] + y1[(n-1):0];
assign y4 = y1[2*n:n] + y1[(n-1):0];
assign z = (y1>y0) ? y4 : y3;
This design (also using my hand optimized multiply-accumulators and
adders) achieved a 4.5 nsec timing.
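
As a sanity check on the re-architected code, the two versions can be compared
on random inputs. The sketch below models the Verilog part selects with
unsigned Python integers and assumes y3 and y4 are declared n+1 bits wide like
z (the snippet above does not show their declarations); the helper names are
made up.

```python
import random

def bits(value, hi, lo):
    """Verilog-style part select value[hi:lo] on an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def z_original(y0, y1, n):
    """Compare first, then add: the adder waits for the magnitude compare."""
    y2 = y1 if y1 > y0 else y0
    return (bits(y2, 2 * n, n) + bits(y1, n - 1, 0)) & ((1 << (n + 1)) - 1)

def z_rearchitected(y0, y1, n):
    """Add both candidates in parallel with the compare, then select."""
    mask = (1 << (n + 1)) - 1
    y3 = (bits(y0, 2 * n, n) + bits(y1, n - 1, 0)) & mask
    y4 = (bits(y1, 2 * n, n) + bits(y1, n - 1, 0)) & mask
    return y4 if y1 > y0 else y3

def check(n=32, trials=10_000, seed=0):
    """Return True if both versions agree on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        y0 = rng.getrandbits(2 * n + 1)
        y1 = rng.getrandbits(2 * n + 1)
        if z_original(y0, y1, n) != z_rearchitected(y0, y1, n):
            return False
    return True

print("versions agree:", check())
```

Random testing is not a proof of equivalence; a formal equivalence checker
would be the stronger option for a real design.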
   Flow                                          Path Length   Area
   -------------------------------------------   -----------   -------
   DC-Ultra + DW + MCI + TCSA (original RTL)     6.0 ns        150,805
   DC-Expert "RTL-1" (hand optimized arith)      5.5 ns        118,097
   DC-Ultra "RTL-2" DW + MCI + TCSA (re-arch)    5.0 ns        157,753
   DC-Expert "RTL-2" (re-arch) (hand op arith)   4.5 ns        129,329
Note that the big gains are from architectural changes. The transform_csa
command is a powerful architectural tool which saves significant area and
delay. My hand optimized multipliers and adders also utilized a carry-save
architecture and saved about 0.5 ns and significant area over the DW
implementation, but reorganizing the code had an even larger impact. I went
from Oliver's 6.0 nsec down to 4.5 nsec overall. That's a 25% speed-up.
All of these runs met the path length timing constraint listed.
- [ Papa Smurf ]
( ESNUG 393 Item 12 ) -------------------------------------------- [04/25/02]
Subject: LEF & Verilog Won't Do; Hard Macros In PKS Need A TLF Or ALF File
> I'm trying to use PKS to synthesize and P&R a semi-custom design. I have
> datapath hard-macros which I have created LEF abstracts of. The control
> logic is in Verilog. The design hierarchy is also in Verilog. What I
> want to do is to manually place all the hard-macros and then do a flat
> synthesis/P&R from the top-level. Has anybody done this before and have
> any insight into this?
>
> One problem I'm getting is port ordering. When a macro is instantiated in
> Verilog, how does it know which port is which? I tried created a dummy
> Verilog block for the datapath macro with only the ports defined. Didn't
> work. I think it thought the block was empty. Will port connection
> by name solve the problem?
>
> PKS doesn't seem to be doing anything with the LEF files I'm importing.
> It keeps complaining that there is no physical information about the
> blocks. Am I doing something wrong?
>
> - Albert Ma
> MIT Cambridge, MA
From: Christopher Van Beek <cv74215@attbi.com>
I'm unemployed and haven't touched PKS in months, but here is what I
remember. By the way, if anyone is hiring in the Portland, Oregon area,
let me know.
I think the problem you are having is during the technology mapping phase.
Do you get warnings about black boxes being created, or that the design
has black boxes?
The only time I have had port ordering problems is when I let Ambit/PKS
create a black box for a macro and then saved the database (.db). Then in
a new session, I loaded the real library file (.alf) for the hard-macros
and reloaded the saved database. This gave me errors. The pin order for
the auto-created library abstract did not match the "real" one I loaded
later.
I think all you have to do is read in the library abstract (.alf) for the
hard-macro when you first synthesize. I used to read in the standard cell
library and then a bunch of abstracts for RAMs before any synthesizing.
Also, make sure you set the global variable which tells which libraries to
use during technology mapping to include all the libraries. You should not
get any warnings about black boxes.
There might be ways to auto-generate an alf/lib file in SE, but I have never
done that. I know our RAM compiler would spit out a .lib file which
'libcompile'-d to .alf quite nicely. There might be a way to do it with
.tlf files, but I have not done that, either. If you are stuck, I guess you
could manually create a .lib file. I don't think the timing info is
required, unless you want to do timing analysis in PKS or do timing driven
placement.
To control the placement of the hard-macros, you just have to pre-place them
in the floorplan DEF file you read into PKS. The only trick is to make sure
the instance names match exactly what the names will be when qplace is run.
It is easy to get caught double instantiating these cells if you change the
flattening of the design (i.e., design has "top/middle/bottom/cell_instA" and
DEF file has "top_middle_bottom_cell_instA" - these don't match, so qplace
will place another instance and ignore the pre-placed one.)
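The name mismatch described above can be caught before placement with a simple exact-match check. This is only a minimal sketch: the function names are hypothetical, and the flattening rule (replace "/" with "_") is an assumption taken from the example names above, not the tool's documented behavior.

```python
def flatten_name(hier_name, sep="_"):
    """Hypothetical flattening rule: swap '/' hierarchy separators for sep."""
    return hier_name.replace("/", sep)


def unmatched_preplaced(def_instances, netlist_instances):
    """Return pre-placed DEF instance names with no exact netlist match."""
    netlist = set(netlist_instances)
    return sorted(name for name in def_instances if name not in netlist)


netlist_names = ["top/middle/bottom/cell_instA"]
def_names = ["top_middle_bottom_cell_instA"]

# The names differ, so the pre-placed instance would be ignored.
print(unmatched_preplaced(def_names, netlist_names))

# After applying the same (assumed) flattening to the netlist, they match.
flat_netlist = [flatten_name(n) for n in netlist_names]
print(unmatched_preplaced(def_names, flat_netlist))
```

Any name reported by the first call is a candidate for double instantiation.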
- Christopher Van Beek Portland, OR
---- ---- ---- ---- ---- ---- ----
From: Albert Ma <ama@cag.lcs.mit.edu>
Thanks Chris,
Yup, I get black box warnings (when I don't read in a Verilog shell). So
PKS will not be happy until it gets a tlf or alf? I guess that's the
problem. I've been trying to get away without doing that. I was hoping
that it would intuit the info from a DEF and/or Verilog shell.
We have most of the Synopsys, EPIC, and Cadence tools. Anybody know if
there's a tool in there to autogenerate .lib or .tlf?
- Albert Ma
MIT Cambridge, MA
---- ---- ---- ---- ---- ---- ----
> One problem I'm getting is port ordering. When a macro is instantiated in
> verilog, how does it know which port is which? I tried creating a dummy
> verilog block for the datapath macro with only the ports defined. This
> didn't work. I think it thought the block was empty. Will port
> connection by name solve the problem?
From: Robert Szczygiel <Robert.Szczygiel@cern.ch>
The instantiated macro connectivity can be defined in two ways:
1.) By port order
nand n1 (net1,net2,net3);
The connectivity is determined by the port order. If the nand is
defined:
module nand(y,a,b)
then net1 is connected to y, net2 -> a, net3->b.
2.) By port name
nand n2 (.a(net1),.b(net2),.y(net3));
This is explicit, and does not depend on the port sequence in the
module definition.
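To see why connection by name is the safer choice when a macro's declared pin order may change, here is a small Python sketch of how the two styles resolve. The helper names are hypothetical and this only models the mapping; it is not how any particular tool implements it.

```python
def connect_by_order(port_order, nets):
    """Pair nets with ports positionally, as in: nand n1 (net1,net2,net3);"""
    if len(nets) != len(port_order):
        raise ValueError("net count does not match port count")
    return dict(zip(port_order, nets))


def connect_by_name(port_order, named_nets):
    """Pair nets with ports explicitly, as in: nand n2 (.a(net1), ...);"""
    unknown = set(named_nets) - set(port_order)
    if unknown:
        raise ValueError(f"unknown ports: {sorted(unknown)}")
    return dict(named_nets)


nets = ["net1", "net2", "net3"]
named = {"a": "net1", "b": "net2", "y": "net3"}

# Two abstracts of the same macro that declare the pins in different orders.
for ports in (["y", "a", "b"], ["a", "b", "y"]):
    print(ports, connect_by_order(ports, nets), connect_by_name(ports, named))
```

The positional result changes with the declared order; the named result does not.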
Usually the tool which generates the netlist has some switches to choose
between the two modes. I have never used PKS, but Silicon Ensemble needs
to have empty Verilog models for the macro blocks (and TLF, of course.)
- Robert Szczygiel
CERN Switzerland
( ESNUG 393 Item 13 ) -------------------------------------------- [04/25/02]
Subject: ( ESNUG 388 #2 ) Handle General & External Obstructions In PhysOpt
> The tone of my postings has been to gather as much useful information as
> possible directly related to PhysOpt 'obstructions' and put it in one
> place (or a few postings) in ESNUG.
>
> - Cyrus Malek
> Synopsys, Inc. Austin, TX
From: "Cyrus Malek" <cyrusm@synopsys.com>
Hi John,
OK, this should round out my notes on obstructions in PhysOpt. Today's note
will cover the final two categories: General and External Obstructions. Let
us begin...
General Obstruction Objects
General Obstructions
--------------------
General obstructions (non-PNET and non-fixed-cell) can be created with the
command:
create_obstruction
[-name cluster_name] [-parent cluster]
[-layer layer_name] [-placement]
-coordinate {X1 Y1 X2 Y2}
In PhysOpt 2001.08, we completed the triplet of obstruction-manipulating
commands by adding:
report_obstruction
remove_obstruction
so any obstructions that are created with the create_obstruction command
can subsequently be reported and/or removed within the shell... a great
tool for experimenting with minor floorplan modifications WITHOUT having
to go back to a floorplanner.
In addition, obstructions can be read in through PDEF, such as
(DISTANCE_UNIT 1.000000)
(LAYER_DEF
(LAYER "METAL1" 10)
(LAYER "METAL2" 12)
(LAYER "METAL3" 13)
)
(CLUSTER "psyn_obs_place"
(OBSTRUCTION 0)
(RECT 0.00 0.00 50.00 50.00)
(X_BOUNDS 0.00 50.00)
(Y_BOUNDS 0.00 50.00)
)
(CLUSTER "psyn_obs_route"
(OBSTRUCTION 10)
(RECT 50.00 50.00 100.00 100.00)
(X_BOUNDS 50.00 100.00)
(Y_BOUNDS 50.00 100.00)
)
In the above PDEF example, two obstructions have been defined, one on layer
ID 0 (undefined layer) with coordinates {0 0 50 50}, and one on layer ID 10
(METAL1 layer) with coordinates {50 50 100 100}. When an undefined layer
is used to create an obstruction, that obstruction will become a PLACEMENT
obstruction. It will affect the placement of cells, however it will NOT
affect the congestion or delay calculations (due to nets crossing the
obstruction). When a previously defined layer (as shown above in the
LAYER_DEF section) is specified in an obstruction, that obstruction will
reside solely on the specified layer. This type of (layer-specific)
obstruction WILL affect delay and congestion calculations since routing
tracks on the specified layer have been obstructed.
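The layer-ID rule above can be restated as a small Python sketch. The function and dictionary names are hypothetical; the layer IDs are copied from the PDEF example, and the logic only paraphrases the behavior described in this note.

```python
# Layer name -> layer ID, as declared in the LAYER_DEF section of the example.
LAYER_DEF = {"METAL1": 10, "METAL2": 12, "METAL3": 13}


def classify_obstruction(layer_id, layer_def):
    """Return (kind, layer_name) for an OBSTRUCTION layer ID.

    A declared ID gives a routing obstruction on that layer; an undeclared
    ID (such as 0 in the example) gives a placement obstruction.
    """
    for name, declared_id in layer_def.items():
        if declared_id == layer_id:
            return ("routing", name)
    return ("placement", None)


print(classify_obstruction(0, LAYER_DEF))   # psyn_obs_place
print(classify_obstruction(10, LAYER_DEF))  # psyn_obs_route
```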
To illustrate the above, let's look at a pathological case, with some
fuzzy math:
Suppose your design uses a 3-layer metal technology, and suppose
(as is typically the case) your physical technology library [.pdb]
has all 3 layers specified.
Metals 1 and 3 route vertically, while metal 2 routes horizontally - all
have same width and spacing. Further, suppose your integration
team has decreed that all block designers are limited to just
metal 1 and 2! Metal 3 is reserved for top-level routing and/or power.
Since PhysOpt sees all 3 metal layers in the technology library, it will
assume it can utilize all 3 layers for timing & congestion calculations.
To make your design adhere to the integration team's decree, a full
obstruction should be placed on metal 3, either via the floorplan
PDEF or with PhysOpt's create_obstruction command.
Suppose this design is highly congested - say it uses 120% of all metal
1 and metal 2 routing tracks (which means it needs 20% more metal).
If the metal 3 obstruction IS created before PC is run, then PhysOpt will
'see' this congestion, and appropriately determine that the design is
120% utilized for routing. PhysOpt's physical viewer and the
report_congestion command can be used to analyze this congestion. If
the obstruction is NOT created before PhysOpt is run, then PhysOpt will
'see' 50% more routing tracks than it really has (it now has 3 full
layers instead of 2). To make the fuzzy math simple, if every layer has
100 routing tracks, PhysOpt now thinks it has 300 routing tracks instead
of 200. Remember that the design required 20% more tracks than metal 1
and 2 provide, so the grand total comes out to (1.2 * 200) =
240 required tracks. The congestion that PhysOpt would see would only
be 240/300 = 80% !
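The same fuzzy math as a few lines of Python, for anyone who wants to plug in their own numbers. The function name is hypothetical, and the model (equal track counts per layer, utilization = required / available) is the simplification used in the example above, not PhysOpt's actual congestion engine.

```python
def routing_utilization(required_tracks, tracks_per_layer, usable_layers):
    """Return required tracks as a fraction of the tracks the tool can see."""
    available = tracks_per_layer * usable_layers
    return required_tracks / available


TRACKS_PER_LAYER = 100
required = 240  # 120% of the 200 tracks on metal 1 and metal 2

with_obstruction = routing_utilization(required, TRACKS_PER_LAYER, 2)
without_obstruction = routing_utilization(required, TRACKS_PER_LAYER, 3)

print(f"metal 3 obstructed:   {with_obstruction:.0%}")
print(f"metal 3 unobstructed: {without_obstruction:.0%}")
```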
I hope the above case illustrates why it is important to understand *ALL*
constraints on a design, both logical and physical, as well as understanding
PhysOpt's default behavior with respect to these constraints.
General Routing Obstructions
----------------------------
To create a routing obstruction, use either the create_obstruction -layer
command or read in a PDEF that contains the obstruction declared on a
valid routing layer ID. Valid routing layer IDs are defined in the
"LAYER_DEF" section of the PDEF file, typically near the beginning of
the file.
In the PDEF example earlier, the obstruction named "psyn_obs_route" is
treated as a routing obstruction since the OBSTRUCTION construct declares
a valid layer (layer id 10 = METAL1, according to the LAYER_DEF section).
The following command can be used to create this exact routing obstruction
within Physical Compiler:
create_obstruction -name psyn_obs_route -layer METAL1 \
-coordinate {50 50 100 100}
This obstruction will be factored in during congestion analysis and net
delay calculations.
The physopt_pnet_*_blockage_layer_names have NO effect on cell placement
with respect to these general routing obstructions -- they are strictly
routing obstructions.
General Placement Obstructions
------------------------------
General placement obstructions can be created either by reading in a PDEF
that contains the placement obstruction (an obstruction declared on a
non-existent layer ID) or using the create_obstruction -placement command.
In the PDEF example above, the obstruction named "psyn_obs_place" is
treated as a placement obstruction since the OBSTRUCTION construct
declares an invalid layer (0 = undeclared, according to the LAYER_DEF
section). The following command can be used to create this exact
placement obstruction within Physical Compiler:
create_obstruction -name psyn_obs_place -placement \
-coordinate {0 0 50 50}
Any placement site that is even partially covered by this obstruction will
be treated as an obstructed site, where PC will not place a cell.
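The "even partially covered" rule amounts to a rectangle-overlap test. Below is a minimal Python sketch; the function name and site coordinates are made up for illustration, and treating a site that only touches the obstruction edge as unobstructed is an assumption, not something confirmed by the tool documentation.

```python
def site_is_obstructed(site, obstruction):
    """Return True if the site rectangle overlaps the obstruction at all.

    Rectangles are (x1, y1, x2, y2) with x1 < x2 and y1 < y2. Rectangles
    that merely share an edge are treated as non-overlapping (assumption).
    """
    sx1, sy1, sx2, sy2 = site
    ox1, oy1, ox2, oy2 = obstruction
    return sx1 < ox2 and ox1 < sx2 and sy1 < oy2 and oy1 < sy2


psyn_obs_place = (0, 0, 50, 50)  # coordinates from the example above

print(site_is_obstructed((10, 10, 12, 20), psyn_obs_place))  # fully inside
print(site_is_obstructed((49, 0, 51, 10), psyn_obs_place))   # partly covered
print(site_is_obstructed((50, 0, 52, 10), psyn_obs_place))   # only touches edge
```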
I want to reiterate: this obstruction will NOT be factored in during
congestion analysis and net delay calculations since it does not affect
routing layer availability.
External Obstruction Objects
External Obstructions
---------------------
Obstructions can also occur in cover macros. I have seen users specifying
cover macros to either define obstructions that pre-exist at a higher level
of the physical hierarchy, or to define reserved regions for future routing
at the top level. While today PhysOpt does not automatically create and
write out physical net shapes that could short with the layer obstructions
in the cover macro, it WILL take these 'external' layer obstructions into
account when calculating congestion and timing. Be careful when using cover
macros; there are some clearly defined caveats in the LEF/DEF Reference
Manual.
A sample cover macro instantiation in DEF format looks like:
COMPONENTS 1;
- instance_cover_macro_1 cover_libcell_reference + COVER ( -90 -90 ) N ;
END COMPONENTS
The library cell that this instantiation calls out must be defined in a
linked physical library. Here is an LEF example :
SITE COVER
CLASS PAD ;
SIZE 0.9 BY 0.9 ;
END COVER
MACRO cover_libcell_reference
CLASS COVER ;
FOREIGN cover_libcell_reference -0.9 -0.9 ;
ORIGIN 0.9 0.9 ;
SIZE 1468.8 BY 1342.8 ;
SITE COVER ;
OBS
LAYER OVERLAP ;
RECT -0.9 -0.9 0.9 0.9 ;
LAYER m5 ;
RECT 182.95 535.75 196.85 538.85 ;
END
END cover_libcell_reference
The corresponding code in .plib format would be:
resource ( "std_cell" ) {
site ( "COVER" ) {
site_class : pad;
size ( 0.900, 0.900 );
} /* end site */
} /* end resource */
macro ( "cover_libcell_reference" ) {
cell_type : cover;
source : user;
in_site : COVER;
foreign ( "cover_libcell_reference" ) {
origin ( -0.900, -0.900 );
} /* end foreign */
origin ( 0.900, 0.900 );
size ( 1468.800, 1342.800 );
obs () {
geometry ( "OVERLAP" ) {
rectangle ( -0.900, -0.900, 0.900, 0.900 );
}
geometry ( "m5" ) {
rectangle ( 182.950, 535.750, 196.850, 538.850 );
}
}
}
Both routing and placement obstructions can exist in cover macros (among
other things). These obstructions are treated just like the general
obstructions defined in the previous sections.
- Cyrus Malek
Synopsys, Inc. Austin, TX
( ESNUG 393 Networking Section ) --------------------------------- [04/25/02]
Petaluma, CA - Calix Networks, a pre-IPO start up seeks an ASIC Manager.
Requires Synopsys, PrimeTime, Apollo and/or Saturn. "thang.le@calix.com"
============================================================================
Trying to figure out a Synopsys bug? Want to hear how 13,958 other users
dealt with it? Then join the E-Mail Synopsys Users Group (ESNUG)!
!!! "It's not a BUG, jcooley@TheWorld.com
/o o\ / it's a FEATURE!" (508) 429-4357
( > )
\ - / - John Cooley, EDA & ASIC Design Consultant in Synopsys,
_] [_ Verilog, VHDL and numerous Design Methodologies.
Holliston Poor Farm, P.O. Box 6222, Holliston, MA 01746-6222
Legal Disclaimer: "As always, anything said here is only opinion."
The complete, searchable ESNUG Archive Site is at http://www.DeepChip.com