

( SNUG 03 Item 17 ) ---------------------------------------------- [05/14/03]


Subject: PhysOpt, DC, PhysOpt-MPC, Magma, Monterey

NOT SWAPPING, BUT ADDING ON INSTEAD:  One of the weird things I've found in
this survey is that practically all of the PhysOpt users are still also DC
users.  That is, users are not swapping out DC licenses for PhysOpt licenses
as Synopsys marketing said they would; instead it's quite common to use both
DC *and* PhysOpt differently on different blocks in the same chip.  This is
because a PhysOpt RTL-to-placed-gates run chows *serious* runtime (as in 5
day runs sometimes) while a DC-to-gates-then-to-placed-gates-in-PhysOpt is
many times the quick & dirty technique that gets the easier blocks out the
door in hours instead.  To put demographics on this, 55% of PhysOpt users
do a DC-to-gates-to-placed-gates, 23% do pure RTL-to-placed-gates type of
PhysOpt runs, and the remaining 22% do a mix of both styles.  In terms of
licenses, this means a minimum of 77% of PhysOpt users are also required to
be DC users, too!  I don't care what Synopsys marketing may tell you about
physical synthesis; it looks like DC ain't going to be going away any time
soon even if the entire EDA buying public goes 100% PhysOpt.
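
For anyone who wants to check the arithmetic, here is a minimal Python
sketch of where the 77% floor comes from.  The variable names are
illustrative, and it assumes the three survey groups are disjoint and
cover all PhysOpt respondents.  It is a floor because pure
RTL-to-placed-gates users may well hold DC licenses too.

```python
# Survey split of PhysOpt users by flow (percentages from the text above).
dc_then_physopt = 55   # DC-to-gates, then gates-to-placed-gates in PhysOpt
pure_rtl_to_pg = 23    # pure RTL-to-placed-gates runs
mixed = 22             # a mix of both styles

# Assumption: the three groups are disjoint and cover all respondents.
assert dc_then_physopt + pure_rtl_to_pg + mixed == 100

# Anyone who runs a DC-to-gates step at least some of the time needs DC.
min_dc_users = dc_then_physopt + mixed
print(f"At least {min_dc_users}% of PhysOpt users also need DC")
```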


     Dataquest FY 2001 ASIC Physical Synthesis Market (in $ Millions)

                 Synopsys/Avanti  ##################### $41.2 (41%)
                           Magma  ################ $31.1 (31%)
                         Cadence  ############ $24.1 (24%)
                        Monterey  ## $4.0 (4%)


    "We use the PhysOpt gates-to-placed-gates options.  The biggest reason
     is speed.  With wireloads and DC, we get a good beginning and we have
     no problems with licenses.  We have lots of DC licenses so I can run
     jobs in parallel.  Unfortunately, since PhysOpt is expensive, there
     aren't as many licenses and this would require me to run jobs serially.
     During the design phase there are just too many things to do to run
     things in series."

         - Mark Weaver of Texas Instruments


    "The best

     - I could verify that despite everybody's claims to close timing with
       PhysOpt, nobody has ever closed timing 100% with PhysOpt.
     - The Synopsys presenter for CTS in PhysOpt-Expert.  Finally a
       knowledgeable person who did answer all the tough questions.
 
     The worst

     - Uncertainty continues about what is going to happen to Synopsys'
       backend flow.  Not clear they even know themselves.
     - A Synopsys consultant creating 22 placement regions in a mostly
       square floorplan to guide PhysOpt out of congestion.  Wow.
     - The PhysOpt tutorials had poor experiments in them.  They reminded
       me of the famous "cockroach experiments": A scientist placed a
       bunch of cockroaches on the table.  Hit the table with his flat
       hand, and observed all cockroaches running away.  He removed all
       the legs of every cockroach, placed them again on the table and hit
       the table with his hand.  No cockroach moved.  Conclusion: When
       cockroaches are deprived of their legs they become "deaf"!
     - New PhysOpt 2003.03 doesn't have scan-chain un-stitch / re-stitch
       commands.  It doesn't seem to have improved in memory management
       either.

     We use gates-to-placed-gates.  Even for small chips, PhysOpt is slow
     doing RTL-to-placed-gates.  We can synthesize a small chip in about 1.5
     hours with 2 DC licenses.  It takes much longer if we feed the design
     to PhysOpt.  We don't even consider feeding a million-gates design to
     PhysOpt.  I tried once to do a top-level optimization, but I decided to
     abort it after 2 hours in the 1st phase.

     Memory usage seems to be a bit of a problem with PhysOpt.  Trying to do
     many things usually results in very slow performance, or
     "out-of-memory" problems.

     Methodology is important.  A good front-end synthesis methodology
     (with DC), renders excellent results in gates-to-placed-gates.  Once
     designers understand the specific requirements of Physical Design,
     they can generate netlists of excellent quality, that run very fast
     through PhysOpt meeting post-placement timing.  "Never trust a tool
     with a man's job".  This is especially important for meeting schedule.
     If you trust PhysOpt to meet timing in RTL-to-placed-gates you may
     find, very late in the design flow, that PhysOpt will not be able to
     fix those architectural problems that nobody thought about because
     they were expecting PhysOpt to do the job..."

         - Santiago Fernandez-Gomez of Pixim, Inc.


    "We run PhysOpt only gates to placed gates."

         - [ An Anon Engineer ]


    "We have been a PhysOpt user for about 3 years with several tape-outs
     (0.35um, 0.25um & 0.18um).  We've always used a gates-to-placed-gates
     flow due to a combination of poor performance in PhysOpt when trying
     RTL-to-placed-gates and Synopsys' inept DFT solutions.  Recently we
     have been focused on moving to Astro from SE as our mainstream router
     so we haven't tried an RTL-to-placed-gates flow for the last 8 months
     (it is something we will look at over the next 6 months), however, I
     am truly skeptical about the quality of the results."

         - [ An Anon Engineer ]


    "I'm in the buffet camp, rather than the a la carte one.  At Corrent,
     we found that on some designs a zero-wire-load DC synthesis followed by
     a gates-to-placed-gates (g2pg) PhysOpt run produced better results.
     However, we found a few cases (notably the largest blocks) where
     PhysOpt's RTL-to-placed-gates (r2pg) produced a faster turnaround time.
     Also, contrary to Synopsys' claims, I've come across a couple of
     cases where (for 400 K+ instance designs) DC was not able to run to
     completion.  There definitely seems to be some capacity issue.  For
     those of our designs that have macros, we use an r2pg run to arrive at
     macro placement suggestions.  Subsequently we clean up the macro
     placements by hand and restart an r2pg run.  Except, of course, if you
     have a *large* number of macros.  Then, you run the risk of the PhysOpt
     macro placer timing out!  I may have a paper on this at the Boston SNUG
     this year.  For physical feasibility (size/utilization/congestion), we
     found the r2pg flow to work better.  For timing feasibility, we found
     nothing really beats a g2pg flow!

     Bottom line, the best method is the old, tried-and-tested method of
     experimenting until something works!

     Surely you were not expecting a push-button answer.  :)"

         - Neel Das of Corrent Corp.


    "A year ago we taped out a 25 M transistor design (0.25 um) with some
     15 blocks at top level, all but one went through PhysOpt.  About half
     were synthesized with an RTL-to-placed-gates script and the other
     half with a gates-to-placed-gates script.

     My original plan was to run all blocks through RTL-to-placed-gates
     but eventually it wasn't always the best choice. 

     One block was too large to be synthesized in RTL-to-placed-gates mode
     in a reasonable amount of time (it took 5 days on a SunBlade 1000). 
     Note that the PhysOpt block size issue had more to do with the number
     of designs in the block hierarchy once uniquified than with the raw
     cell count (about 210K instances; a 160K instances block went through
     RTL-to-placed-gates in 36 hours).

     Other blocks that didn't make it successfully with the RTL-to-pg script
     had high utilization (and hence high congestion) issues.  The RTL-to-pg
     placed netlist was just unroutable and we had to iterate over
     RTL-to-pg (using custom WLM extracted from the PhysOpt run, so that
     DC would have "better" estimates and could do a better job of area
     recovery -- so as to lower the utilization ratio in the PhysOpt step)
     and gates-to-placed-gates in high-congestion mode, which eventually
     yielded a routable design.

     It is worth noting though, that if the RTL-to-placed-gates flow had an
     issue with congested designs, it always yielded better timing results
     (not by a large margin, mind you: but it was systematically better),
     which is why the blocks that went through RTL-to-placed-gates without
     hitch eventually did make it to the tapeout."

         - [ An Anon Engineer ]


    "Session 3 - User presentations
     
     13:30 - 14:15 - Fixing Hold violations in DSM Era
     
     Describing several experiments from 0.18 um down to 90 nm on hold-time
     fixing, and hold-time analysis.  Interesting graphs.  He leaves many
     issues unexplained.  It seems that most of their flow is script-based
     (as opposed to tool-based).
     
     Q. 11 iterations to close hold. What is an iteration?
     A. Manual iterations (ECO)
     Q. Did they use RC corners?
     A. Yes
     Q. Have you checked area vs using a 150 ps margin for hold-time?
     A. No answer.
     Q. Clock skew number: min or max corner?  Clock skew for modules with
        different number of flops?
     Q. OCV.  Breaks setup (multicycles now...).
     A. It's not clear what they did here.  It seems to me that they just
        tweak everything manually
     Q. What do you use to fix hold-time?  Not PhysOpt...
     A. Manually
     
     14:45 - 15:00 - RTL to Layout Implementation of OMAP Platform
     
     Full Physical Design methodology for an embedded core in a big ASIC.
     All steps described.  Basically DC, PhysOpt, Astro, then backannotation
     to PhysOpt.  Very conservative synthesis, in two passes.  1st pass
     synthesis.  2nd pass flatten netlist, tighter constraints.
     
     Q. PhysOpt didn't close timing.  How did you close timing?  Did you
        ever close timing with PhysOpt in any chip?
     A. Manual Timing ECOs.  They claim it's not manual because they have
        scripts (ha!)  The presenter has taped-out 9 large chips as a
        Synopsys consultant using PhysOpt.  He never fully closed timing
        with PhysOpt.  Timing ECOs were always required.
     Q. They had to create 22 regions to guide placement.  That looks like
        a lot of work.  How many iterations until they got a clean
        placement?  How many days?
     A. They claim this is part of the "pipe cleaning" part of the flow.
        So it takes time, but they don't count it as real time.
     Q. IO Buffer manually? What's the tool for then?
     A. They claim this is not manually, but done by DC.
     Q. Develop methodology for clock tree? The tool doesn't work?
     A. No answer.

     15:00 - 15:45 - Logic and Physical Synthesis Methodology for a
     VLIW/SIMD DSP Core
     
     History of the development of their DSP core, from specs to physical
     design scripts.  They provide scripts to customers.  The spec DSP
     processor has to be able to be implemented in silicon, so they had to
     involve the PD group from the beginning, to make sure that anything
     the compiler spits out is going to be feasible in silicon.
     
     Q. 2-3 days turnaround time.  PhysOpt 12-16 hours.  How big is the
        design?
     A. 200 K instances (definitely they have some problem with their
        scripts... our chip is 250K and turnaround time is 16 hours)
     Q. Did you run routing? Any feedback from silicon?
     A. Yes they did. And CTS too.
     Q. How do you solve formal verification between spec and RTL?
     A. No solution."

         - Santiago Fernandez-Gomez of Pixim, Inc.


    "PhysOpt is good, but Magma tools are definitely catching up quickly.
     They provide a lot of capabilities that are very desirable and PhysOpt
     doesn't have them yet.  Being able to work with one tool and one
     Volcano database to resolve everything is extremely desirable to have."

         - [ An Anon Engineer ]


    "I've used RTL-to-placed-gates, but for no particular reason.  I'm quite
     new to the physical synthesis world, since I've only recently moved
     back on to design work after spending a few years in chip-level TA.
     I've only been using PhysOpt (mainly in MPC mode) to provide more
     accurate timing results, since the real layout of my block is being
     done by another team in Astro.

     I've found PhysOpt to be a bit unstable (lots of the infamous Synopsys
     fatal errors), and I gave up trying to work out the cause of a problem
     which stopped the initial placement from working properly; I just went
     back to good old DC."

         - David Smith of STMicroelectronics


    "We do PhysOpt in gates-to-placed-gates mode only.  Our Synopsys AE
     told us not to attempt RTL-to-placed-gates.  Too slow.  He hasn't
     lied to us so far."

         - [ An Anon Engineer ]


    "We use PhysOpt RTL-to-placed gates because of an analysis I did a few
     years ago on some of our blocks.  It showed that the PhysOpt runtimes
     were worse than going through DC then PhysOpt gate-to-gate, but I could
     get better results in some cases.  I have not gone back and redone the
     experiments recently.

     From what I remember the results were very RLM dependent so others may
     see different results.  I do remember that results at the time were at
     least as good on all the RLMs."

         - Chris Gorzek of Cray


    "I've done extensive studies into this.  In almost ALL cases going

              RTL (in DC) -> gates -> placed-gates (PhysOpt)

     gives better results.  Far better in many cases.  This is counter-
     intuitive.  Synopsys originally told me to use the RTL -> placed gates
     flow for PhysOpt.  Soon after, as I kept getting poor results, they
     changed their recommendations.  They have observed from MANY customers
     that RTL (DC) -> gates -> placed gates (PhysOpt) almost always gives the
     best results.  I've heard this from many Synopsys FAEs.
 
     I have found some cases where the latest 2003.03 with the Presto reader
     outperforms the RTL (DC) -> gates -> placed-gates (PhysOpt) flow.  When
     this works, it does just a percent or two better than the
     RTL (DC) -> gates -> placed gates (PhysOpt) flow."

         - [ An Anon Engineer ]


    "My last chip we used gates-to-placed-gates and it worked relatively
     painlessly.  Much easier timing closure compared to just DC days.
     However, we didn't stress the density at all.  The next release coming
     up we are going to try RTL-to-placed-gates.  This makes sense because
     we are going to pack more gates in the exact same bounding box.  I use
     a team in India to do most of the backend.  I think the reason we
     haven't used RTL-to-placed-gates before is because they seem to be very
     conservative on adoption of new tools.  If it closes timing I guess I
     don't care.  In a couple months I can tell you how the story ends for
     this release.  However, I assume eventually we will be going PhysOpt
     from RTL-to-placed-gates always."

         - Layne Lisenbee of Texas Instruments


    "I've had really good success w/ SoC Encounter.  Synopsys tried PhysOpt
     on our exact same design, and the results took way longer, and were
     worse.  We subsequently taped out 3 more designs with SoC Encounter."

         - [ An Anon Engineer ]


    "I have been using PhysOpt for about two years now.  It seems to do a
     very good job for everything I have thrown at it.  I have always
     preferred the RTL-to-placed-gates.  My reason is that I believe if you
     give the synthesis a solution, gate level structure, it will use that
     and solve the timing but it will not drastically change the structure.
     If it starts from RTL, I believe it has the opportunity to come up with
     a better solution.  I see no reason to do a dc_shell compile and then
     move to PhysOpt; it is a waste of time.  I do like the MPC, but it
     still has some problems when it comes to insert_dft, if new ports
     are created.

     The one thing I have always found strange with PhysOpt is that the
     check_legality command does not view unplaced gates as illegally
     placed.  For some lame reason they think that because the cell is
     not placed that it can't be illegally placed.  I can almost see their
     reason but I still believe that it should report the cells that are
     not placed so that you could issue the commands to fix the problem
     without having to write Tcl scripts to solve what they already know.

     I believe that the problem is Synopsys has too many software people and
     not enough designers to know what is important and what is not for
     automation of a design flow."

         - Paul Fletcher of Motorola


    "We use both.  The Synopsys AE's often tell us that RTL2PG gives better
     results than PhysOpt because it starts with the RTL, and that the
     resource allocation step is able to use better estimates of wire delays
     this way.  We have observed some exceptions where PhysOpt gives better
     results.  Although we don't know the cause, our hunch is that the wire
     load models were pessimistic for the netlist that we ran PhysOpt on. 
     For blocks with large memories, the wireloads tend to be pessimistic
     (due to the area of the memory blocks) when DC-only synthesis is run.
     This probably makes DC choose a faster implementation to meet timing
     and when the same block is run through PhysOpt, we get better results
     than RTL2PG.

     PhysOpt RTL2PG is really limited in capacity, and we can't run it at
     the full-chip level.  PhysOpt, on the other hand, is able to handle
     full chip placement and is the backbone of our company's backend flow.

     One more thing, we have almost stopped using DC-only for synthesis.
     Most of our blocks use compile_physical -mpc (RTL2PG).  We have found
     that this method gives us better estimates for timing when a netlist is
     taken into the backend flow.  It is good to see many timing violations
     upfront after the synthesis process rather than having the backend
     engineers discover these much later in the flow.  We also use some
     limited floorplanning commands available in PhysOpt to set the macro
     orientations, locations, etc. before running MPC."

         - [ An Anon Engineer ]


    "We use gates-to-placed-gates most of the time.  It gives consistently
     good results in general.  RTL2Gates shows good results on some designs,
     but not all the time.  Another factor is that our project team is
     structured so that the front end does synthesis and hands the netlist
     to the backend for placement, so not much effort goes into the
     RTL2Gates front."

         - [ An Anon Engineer ]


    "I've released 5 designs with PhysOpt, all gates-to-placed-gates.  The
     first 4 were because the customer provided the gate-level netlist.
     The last one, though we were given the RTL, was gates-to-placed-gates
     because we needed to utilize ACS for schedule and performance reasons.
     If we did a traditional top-down synthesis it would have taken 24-36
     hours (depending on the Linux platform).  ACS reduced the job down to
     8 hours on a dual-CPU machine, and achieved timing closure on a block
     that had issues when compiled with a traditional top-down synthesis
     methodology.

     I'm anxious to try out RTL-to-placed-gates when ACS works with PhysOpt."

         - Mark Johnson of Agilent Technologies


    "Our use of PhysOpt is mostly centered on the gates-to-placed gates."

         - [ An Anon Engineer ]


    "PhysOpt is clearly a gates-to-gates engine.  PhysOpt provides its best
     results with a gate netlist from DC or DC-Ultra.  This can be confirmed
     with a quick run on any DSP module (ARM, MIPS, ZSP)."

         - Benjamin Mbouombouo of LSI Logic


    "I use RTL-to-placed-gates PhysOpt whenever I can.  I have used
     gates-to-placed-gates when absolutely necessary.  But comparing the
     two, I have seen better congestion and area with the
     RTL-to-placed-gates method.  The drawback I have seen is that
     RTL-to-placed-gates has a definite capacity limit and a minimally
     worse timing result.

     I had a design of 90 K instances that would not meet timing or
     synthesize in a "reasonable" amount of time with RTL-to-placed-gates
     in 2001.08-SP2.  I did have other designs up to 75 K instances that
     did just fine with the RTL-to-placed-gates.  When I did use the
     gates-to-placed-gates, Synopsys did not appear to remove all the
     unnecessary logic it created when synthesized using DC with
     the wireload models and over-constrained.  I ended up having to do a
     congestion-based placement with create_placement then performing
     PhysOpt incrementals for timing in order to keep the congestion
     routable using the gates-to-placed-gates and the DC netlist!  The
     design was about 75% utilized with a lot of macros.  I have not
     done the comparison with any of the 2002 versions.  I just stuck with
     RTL-to-placed-gates for our most recent chip."

         - [ An Anon Engineer ]


    "For PhysOpt, only gates-to-placed-gates is used in my company.  IMHO,
     RTL-to-placed-gates is only a concept for now.  We have tried some
     designs and found that the overall quality of the layout is not as
     good as we expected.  Not to mention there are still scan insertion,
     ECO, and runtime issues."

         - [ An Anon Engineer ]


    "2:30 - 3:15 - Accuracy of Timing Predictability using Minimal Physical
                   Constraints (PhysOpt-MPC)
     
     Running PhysOpt with very few constraints to get a feeling for timing
     after routing.  They don't even give a floorplan to PhysOpt.  They
     compare a reference design (PhysOpt + floorplan + powerplan) with the
     PhysOpt-MPC run.  Timing comes out similar.  Floorplan very different.
     Runtime is shorter for MPC, but not much shorter (10%).
     
     Q. Did you route the design? Is it routable?
     A. No, it's not routable.
     
     3:15 - 4:00 - Re-Shape's Astro-Based High Performance SoC Design
     Environment
     
     Reshape's marketing presentation.  :)
     
     They despise estimation technologies. They are pushing their flow.
     It's amazing how good their flow can be in a set of slides. Of course,
     he didn't say anything about the time it takes to do simple tasks,
     like ECOs, moving a PAD 1 micron, or changing the floorplan...
     
     Wednesday March 19
     
     Session 1 - 9:00 - 12:15 - PhysOpt Highlights in 2003.03
     
     List of bug fixes disguised as "new features"... Not much to say here.
     The presenter went through a long list of changes, and an even longer
     list of new variables to tweak/control the behavior of PhysOpt.  See
     the proceedings.
     
     Q. Scan-chain removal and re-stitching
     A. Not yet
     Q. Memory management in PhysOpt. Max design size?
     A. No answer. They say they have improved, but it's not clear
     Q. Buffer tree creation. Max size? runtime?
     A. No answer.
     Q. 20% speed-up. How much is because of better WNS?
     A. No answer.
     Q. remove_high_fanout_nets none/medium/high is not consistent
        notation. It's confusing.
     A. No answer.
     
     Very disappointing first part of the session.  There were plenty of
     interesting questions, but Synopsys was not able to answer most of
     them.
     
     Second part of the session, CTS in PhysOpt Expert.  Much better.  Very
     knowledgeable presenter. Good overview, and good answers to questions.
     Basically CTS does everything CT-Gen does, but within PhysOpt.  It also
     has a few more features.  Unfortunately PhysOpt fails once more in
     efficiency and runtime.  With the numbers Synopsys provided, I estimate
     that CT-Gen is at least 4 times faster.  One person in the audience
     actually points out that they don't use PhysOpt because Astro's CTS
     engine is much faster.
     
     Q&A.  Way too many questions in this session, from me and from the
     audience."

         - Santiago Fernandez-Gomez of Pixim, Inc.


    "We did two designs with PhysOpt last year, both RTL-to-PG.  On first
     go-around, tried G2PG flow for comparison purposes.  Also to work
     around the very amateurish tool bugs, like non-functioning VHDL read-in
     in early 64-bit versions of PhysOpt, intermediate file inconsistency
     w/ the supposedly new and improved Presto Verilog preprocessor, etc.
     Noticed no real difference in performance or otherwise, then again,
     I suspect our designs weren't exceeding the 350 K instance limit
     anyway.  Biggest headaches were with interoperability and PDEF file
     exchange between Cadence and PhysOpt.  Not much of a surprise there."

         - [ An Anon Engineer ]


    "Our small design group has used PhysOpt RTL-to-placed-gates for some
     time now.  We found that it could meet timing and solve route
     congestion on some designs that were just impossible when starting with
     a dc_shell-generated netlist, even with a variety of wireload model
     approaches.  I think it's because non-placement aware synthesis makes
     too many bad buffering decisions for the Astro optimization routines
     to overcome.  We found that the post-layout timing was very close to
     what PhysOpt predicted, and that timing closure could then be completed
     inside Astro without too much trouble.  Since the PhysOpt timing is
     reliable, we can try different floorplan approaches and judge them by
     their PhysOpt congestion map and timing reports.  This is considerably
     quicker than place and route iterations."

         - [ An Anon Engineer ]


    "The gates to placed-gates flow worked out very well in our past design.
     We did not use PhysOpt at the top-level, but PhysOpt seems to do a very
     good job with regard to congestion and transition fixes at the module
     level.  We had no need to fix transition after back-annotation in
     PrimeTime.  We did, however, find a bug in insert_dft -physical
     which tied the output pin of a register to si pins of two registers
     in parallel."

         - Dhivakaran Santhanam of Analog Devices


    "We actually use both commands.

         1) compile_physical directly from RTL

     and

         2) PhysOpt for gates-to-gates after compile.

     It depends on the block features.  Timing/placement/run time differ on
     the blocks depending on the flow.  We started out with PhysOpt for all
     blocks on the chip, but compile_physical got better results on some of
     the blocks eventually.  Also flattening or not flattening and at what
     stage to flatten make a big difference on the results.  I would say 50%
     of the blocks use PhysOpt, 50% use compile_physical.
 
     The most important determining factor on what flow to use is the timing
     results.  It is good to try both flows and pick the one that gives the
     best timing (obviously, if it does not break your goals severely.)" 

         - Gun Unsal of Intel


    "Here's my top 3 wishes for PhysOpt / PhysOpt+Astro:

       1) better cap and RC reporting.  This would help users to find
          outliers, to compare estimators vs. later reality, and to verify
          libraries.  I want a function (TCL or builtin) that puts out
          one long line per net (for sort / grep), with the fields:

              fanout
              total cap, routing cap, couple cap
              net size (bounding box will do)
              nr vias (main RC component in lv130c with 20-ohm vias)
              max RC delay, max transition, max to-pin
              driver celltype
              driver name
              net

          sorted, and filtered with field x > mylimit.  (In Astro we have
          Scheme functions muBigNetInfo / muIONetInfo for {fanout size
          driver net}, and that's useful, but we don't have caps nor RC
          in Scheme.)  Such cap-RC reports can then be plotted and diffed
          through PhysOpt -- Astro phase xx -- SPEF.

          For extra credit, do this for 'nets-with-buffer-trees', too.

       2) separate estimators for Xcap and Gndcap, with multipliers;
          Xcap is noisy and hard to predict, so I want to be able to say
          Xcapestimator *= 1.5 both to reduce Xcap early and to compare
          predicted vs later phases.  (For that matter all the various cap
          and res multipliers need to be cleaned up and verified; changing
          the techfile is not clean.)

       3) (longterm): a glass top into the optimization engine:

                 engine <--> data stream <--> text viewer
                                         <--> graphic viewer.

     To improve any complex multi-objective box, you have to *see* into it
     to understand and tune it.  Consider the data streaming between a
     Ferrari and the pit: lots of data, with lots of people looking at it in
     different ways.  Imagine Ferrari engineers scrolling through megaline
     text files."

         - [ An Anon Engineer ]
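
The one-line-per-net report asked for in wish 1 could look roughly like
the Python sketch below.  It is only an illustration of the sort/filter
idea: the NetRecord class, the net_report function, the units, and all
of the data are made up here, and none of it is a PhysOpt or Astro API.
The "max to-pin" field from the wish list is left out for brevity.

```python
from dataclasses import dataclass, astuple

@dataclass
class NetRecord:
    """One net's worth of report fields (hypothetical layout and units)."""
    fanout: int
    total_cap: float       # pF (assumed unit)
    routing_cap: float     # pF
    couple_cap: float      # pF
    bbox_size: float       # um, bounding-box half-perimeter
    nr_vias: int
    max_rc_delay: float    # ns
    max_transition: float  # ns
    driver_celltype: str
    driver_name: str
    net: str

def net_report(records, field, limit):
    """Return one line per net whose `field` exceeds `limit`, largest first."""
    picked = [r for r in records if getattr(r, field) > limit]
    picked.sort(key=lambda r: getattr(r, field), reverse=True)
    # One long space-separated line per net, friendly to sort / grep.
    return [" ".join(str(v) for v in astuple(r)) for r in picked]

# Made-up example data, not taken from any real design.
records = [
    NetRecord(3, 0.12, 0.08, 0.02, 150.0, 6, 0.05, 0.20, "BUFX4", "u1/b0", "n1"),
    NetRecord(48, 1.90, 1.40, 0.35, 2200.0, 120, 0.85, 1.10, "INVX8", "u2/i7", "clk_en"),
    NetRecord(12, 0.60, 0.41, 0.10, 800.0, 30, 0.30, 0.55, "NAND2X2", "u3/g4", "n42"),
]

lines = net_report(records, "total_cap", 0.5)
for line in lines:
    print(line)
```

Running the same report at each stage (PhysOpt, each Astro phase, SPEF)
and diffing the outputs would give the estimator-vs-reality comparison
the letter describes.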


    "I use both compile_physical (from RTL to placed gates) and PhysOpt
     (from gates to placed gates).

     They sometimes (depending on design architecture and constraints)
     achieve quite different QoRs in terms of run time, area, speed etc.
     Thus I normally run both flows and pick the best.

     However, the gates-to-placed-gates flow is generally the preferred way
     when MPC prototyping, since one can refine the physical constraints
     (but possibly also timing, area and design rule constraints) and
     start a PhysOpt session without resynthesizing the whole design
     starting from the RTL.

     Indeed when the design contains a lot of datapath operators or very
     big random logic, the run time saved can be as much as several hours
     for each iteration.  Of course the PhysOpt gate-to-placed gates flow
     requires a gate-netlist, that can be obtained through an RTL-to-gates
     logic synthesis (i.e. compile).  This latter logic synthesis may
     employ different kinds of wire load models (standard, custom, none at
     all) depending on design, process, constraints, QoR expected and run
     time.  But this is another story."

         - Marcello Vena of Xignal Technologies


    "We use PhysOpt RTL-to-gates.  It closely correlates pre-routed timing
     with post-routed timing.  However, during crunch times we found that
     there are exceptions which give us headaches as we have to manually
     upsize certain gates to improve timing."

         - David Fong of S3 Graphics


    "In parallel we also partially evaluated Monterey's tools.  Monterey
     took our design and were close to closing it in approx 4 weeks with
     30 emails/calls taking place.  Monterey also shrank our die size by
     approx 10%.  We were very impressed by this.  Unfortunately we did not
     have time to drive the tools ourselves before starting our next
     project, but we learned enough to see some conceptual benefits in the
     tool suite.

     To be fair most of the Monterey area saving was due to changing our
     conservative power grid (we did not have a power tool at the time; we
     were able to do the same with Apollo when we evaluated AstroRail).

     Monterey tools have a Tcl based command line interpreter, allowing much
     quicker scripting than Scheme.  In addition, their STA commands are the
     same as PrimeTime and the structural querying commands are the same as
     Design Compiler.  You get a familiar Tcl interface for querying the
     Monterey database including the timing results.  You can write scripts
     to query their database and act upon the results.  With the Apollo
     and Astro flow we had to write scripts parsing the timing reports and
     netlists and generating ECOs which obviously adds significantly to the
     turnaround time.

     The Monterey Tcl interface also allows you to write nice DRC (zero
     fanin/fanout etc) scripts to check your netlist within their single
     framework.  The tools run in memory like Astro and you get all the same
     run time benefits from this.  The extraction tool within Monterey's
     tools suite does a full extraction in approx 3 minutes (compared to 60
     minutes with Star-RCXT.)

     The only problems we found with Monterey were the speed of the database
     load and the speed and memory efficiency of the GUI.  Both of these
     issues are being addressed in the next release in June.  One of the
     side effects of working with the Monterey rep was that he also solved
     some of our outstanding Apollo issues, too.  Cruel irony."

         - Craig Farnsworth of Cogency Semiconductors
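
As a rough idea of the kind of netlist DRC mentioned above (zero
fanin/fanout), here is a minimal Python sketch.  It is not Monterey's
Tcl interface: the dictionary netlist format, the function name, and
the sample nets are all assumptions made purely for illustration.

```python
def find_dangling_nets(netlist):
    """Flag nets with no driver (zero fanin) or no load (zero fanout).

    `netlist` maps net name -> (list of driver pins, list of load pins).
    This in-memory format is hypothetical.
    """
    undriven = sorted(n for n, (drivers, loads) in netlist.items() if not drivers)
    unloaded = sorted(n for n, (drivers, loads) in netlist.items() if not loads)
    return undriven, unloaded

# Made-up three-net example.
netlist = {
    "n1": (["u1/Z"], ["u2/A", "u3/A"]),  # healthy net
    "n2": ([], ["u4/B"]),                # zero fanin: no driver
    "n3": (["u5/Z"], []),                # zero fanout: no load
}

undriven, unloaded = find_dangling_nets(netlist)
print("zero fanin: ", undriven)
print("zero fanout:", unloaded)
```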
Copyright 1991-2024 John Cooley. All Rights Reserved.

