

( ESNUG 408 Item 6 ) -------------------------------------------- [03/13/03]

Subject: ( ESNUG 407 #13 ) Paul Says Power Rings Are An EDA Software Crutch

> Notching power rings: with the old tools (CDN DP) we were using, putting
> down a rectangular power ring around a core was easy; getting the ring
> to have notches around cells in the corner like PLLs took a great deal
> of hand work.  Eventually, we wrote a script to drive SE sroute to do
> this, but it took a while to get it written right, and it must be
> rewritten for each new chip.  Now I see CDN FE Ultra can do this
> automatically if the user selects to "exclude selected blocks" when
> routing the ring.  Or at least Cadence says it can.  Can it?
>
>     - Mark Wroblewski
>       ex-Cirrus and looking                      Lafayette, CO


From: Paul Rodman <france=reshape naught awn paris=rodman>

Hi John,

At my company we decided to develop our own tools for power routing.  The
reason: all the commercial EDA tools we used PISSED US OFF and had a lowest
common denominator approach that didn't let us have a repeatable, automatic
or even semi-automatic solution.

There had to be a better way... but what?


> Multiple layers on rings: to reduce the area required for supply rings,
> we used multiple layers.  We also intermingled the nodes in these ring
> stacks, so for example the outer of two rings would be stacked as VDD,
> GND, VDD on 3 of 5 routing layers, and the inner of the two rings would
> be stacked as GND, VDD, GND on some other 3 of 5 routing layers.  Strips
> across the middle on two layers vertically and one layer horizontally
> would tie everything together and deliver the supplies to the row metal.
> We were able to work this by hand, but the old tools (CDN DP, SE Sroute
> in "automatic" usage scenarios) couldn't cope.  Does FE Ultra do any of
> this effectively?


Our teams had the usual pitched battles about how to do the rings and
notching.  It's a surprisingly complex problem, after all.  Finally,
a compromise solution emerged: have a set of nested rings using only the top
two layers, i.e. NO layer stacking.

The reason we avoid stacking is that in six-or-more metal layer designs
(6LM+) we can easily place and route stdcells under the rings.  There is not
much congestion at the edges of blocks placed at the edge of the chip, so m4
and below are sufficient for routing there, and the rings are free: they
consume no cell area.

If we have a lot of current flowing from the padcells into the ring, we
might need a bigger ring (now slotted or replicated) but this is OK, since
it costs no cell area.
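As a back-of-envelope sketch of "a bigger ring (now slotted or replicated)":
size the total ring width from a current-per-unit-width limit, then split it
into equal parallel strips whenever it exceeds a maximum wide-metal width.
The function name, the limit, and the max-width value are hypothetical
placeholders, not numbers from any real rule deck.

```python
import math

def size_ring(current_ma, jmax_ma_per_um, max_width_um):
    """Return (strip_count, strip_width_um) for a ring carrying current_ma.

    Total width comes from an assumed current-per-unit-width limit. If that
    width exceeds the assumed maximum wide-metal width, the ring is split
    into equal parallel strips (slotted/replicated).
    """
    total_width_um = current_ma / jmax_ma_per_um
    strips = max(1, math.ceil(total_width_um / max_width_um))
    return strips, total_width_um / strips

# Hypothetical: 120 mA into the ring, 1 mA/um limit, 30 um max width.
print(size_ring(120, 1.0, 30))  # (4, 30.0)
```

Either way the extra width sits over the stdcells, which is why it costs no
cell area in a 6LM+ flow.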

HOWEVER, lately, after staring at various IR maps, we realized that we lose
more IR drop than we'd like in the stubs that go from the power/gnd pads
into the rings.  We don't see why we should lose anything there if there is
still room for more metal (which there is).  Obviously this is a very
high-current connection; it's at the top of the food chain, after all.

So, now I ask:

WHAT ARE THE RINGS FOR?

Are they some kind of Anti Satanic Sealer Ring of Safety?  Are they the
salt-thrown-over-the-shoulder of ASIC power distribution?

ASSERTION #1: The ring metal isn't doing anything except acting as a simple
way to connect the outputs of power pads into the mesh.  The actual
transverse-direction current carrying aspects of the ring are minimal.

ASSERTION #2: Rings are a CRUTCH for LAZY, OLD EDA software: a paradigm
from the days before pervasive full-chip fine-grain meshes and good IR
tools existed.

In our 6LM+ methodology we have a full, "fine grain" mesh covering the core
of the chip.  I'll define fine grain as any grid with at least one +/- pair
every 30 um or so in a 130 nm process.


We think a better goal is to use any and all available area and layers to
make something I tentatively call a "dagger" or "pitchfork" of metal.  It
would expand out as wide as possible, as soon as possible, after exiting the
padcell's power/gnd pin.  This metal would then merge into the core mesh
over enough distance that the mesh picks up the job of transverse
dispersal.  The result is lower IR drop, since more metal is applied where
it is needed.
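A first-order way to see why widening early helps: a metal run's resistance
is the sheet resistance times its number of squares (length/width), so
spending the available width right at the pad pin cuts the squares sharply.
This sketch assumes the full pad current flows through every segment in
series, which overstates the drop in the wide part where the mesh starts
sharing the load. The current, sheet resistance, and geometry are
hypothetical.

```python
def stub_ir_drop_v(current_a, sheet_res_ohm_sq, segments):
    """IR drop across series metal segments.

    segments: list of (length_um, width_um). Each segment contributes
    sheet_res * length / width ohms (its square count times Rs). Assumes
    the whole pad current flows through every segment.
    """
    r_ohm = sum(sheet_res_ohm_sq * length / width for length, width in segments)
    return current_a * r_ohm

# Hypothetical: 200 mA pad, 0.02 ohm/sq thick top metal, 60 um run to the mesh.
narrow = stub_ir_drop_v(0.2, 0.02, [(60, 10)])            # 6 squares
dagger = stub_ir_drop_v(0.2, 0.02, [(10, 10), (50, 40)])  # 1 + 1.25 squares
print(f"narrow stub: {narrow*1e3:.1f} mV, dagger: {dagger*1e3:.1f} mV")
# narrow stub: 24.0 mV, dagger: 9.0 mV
```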

Of course, such layout is not easy to generate.  You need to worry about
all sorts of design rule issues and you don't want to block routability to
the next-door I/O cell's signal pins, etc.  It may require undoing and
tweaking the mesh in that area as well.  We do want our stdcells under that
area too, so it's getting complex.  Also remember the pad cell pitches and
the mesh strap pitches are not related, so some interesting "beat frequency"
relative placement cases come up.
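The "beat frequency" effect can be made concrete: the distance from each
pad's power pin to the nearest mesh strap cycles with a period of
lcm(pad pitch, strap pitch) / pad pitch pads. This is a hypothetical helper
working in integer nanometers so the modular arithmetic stays exact; the
80 um and 30 um pitches are made-up examples.

```python
from math import gcd

def nearest_strap_offsets(pad_pitch_nm, strap_pitch_nm, pad_origin_nm,
                          strap_origin_nm, n_pads):
    """Distance from each pad's power pin to the nearest mesh strap (nm)."""
    offsets = []
    for k in range(n_pads):
        x = pad_origin_nm + k * pad_pitch_nm
        past = (x - strap_origin_nm) % strap_pitch_nm  # distance past previous strap
        offsets.append(min(past, strap_pitch_nm - past))
    return offsets

def repeat_period_pads(pad_pitch_nm, strap_pitch_nm):
    """Pads per repeat of the relative placement: lcm(P, S) / P = S / gcd(P, S)."""
    return strap_pitch_nm // gcd(pad_pitch_nm, strap_pitch_nm)

# Hypothetical: 80 um pad pitch, 30 um strap pitch, both starting at 0.
print(nearest_strap_offsets(80_000, 30_000, 0, 0, 6))  # [0, 10000, 10000, 0, 10000, 10000]
print(repeat_period_pads(80_000, 30_000))              # 3
```

A dagger generator would then need one layout case per distinct offset in a
period, rather than one per pad.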

I think if you look at IR maps of how meshes work you will see why I feel
strongly.  The segments of ring between padcells aren't really doing much
for you... the mesh itself is so strongly connected in the same direction
as the ring you put in.  (Assuming your mesh isn't totally undersized.)


> Ring macros and other special cases: SE Sroute does a decent job of
> connecting row metal to ring macros (e.g., RAMs, register files) in most
> cases but coughs sometimes where high congestion exists.  (For example,
> where a via was dropped to get from the macro's internal supply to the
> ring around the macro.)  Unfortunately, this happened often enough that
> we couldn't ignore it, so more hand fixing.  How is FE with this today, as
> I understand it uses a new version of SE's Sroute for most heavy lifting?


Mark is obviously dealing with special IP that requires an external ring.
My condolences.  At least the Artisan and Virage RAMs come with a set of
"ring pins" whose size you can set, and those work a lot better.

But, such pre-built macro rings are another example of burning layout
resources to make software easier.  In fact, for "advanced" users of the
same RAMs, vias are dropped directly into the RAM core metal shapes to power
them and the internal "ring pins" can be dispensed with.  That is, all those
nasty m4 obstructions in the abstract are really power and ground pins, too.
Why not present them to the tools as pins and get more connection "meat"
into the macro?


Alas, here the problem isn't just software, it's also the problem of
providing iron-clad rules-for-use from the RAM vendor to the user.  With a
ring, they can present a simple spec for how to give the RAM proper care
and feeding.  However, if you just let a user punch vias down internally,
and you get rid of the ring, you need a spec for "how much is good enough",
or you need good EDA tools to confirm for you that you are OK.  And the
"must-join" issues could be hard to check.  Not an easy thing to do, and
some users will surely get it wrong and gum things up...
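One simple form a "how much is good enough" spec could take is a minimum via
count per macro supply: the macro current divided by a derated per-via
current limit, plus a check on the drop across the via array. The function
names, the 1 mA via limit, the 80% derating, and the 2 ohm via resistance
are all hypothetical, and this ignores the harder "must-join" checking.

```python
def vias_needed(macro_current_ua, via_limit_ua, derate_pct):
    """Minimum via count so each via stays under a derated current limit.

    Integer ceiling division keeps the arithmetic exact.
    """
    numerator = macro_current_ua * 100
    denominator = via_limit_ua * derate_pct
    return (numerator + denominator - 1) // denominator

def via_array_drop_mv(macro_current_ua, via_res_ohm, n_vias):
    """IR drop across n identical vias in parallel, assuming equal sharing."""
    return macro_current_ua * 1e-3 * via_res_ohm / n_vias  # mA * ohm = mV

n = vias_needed(50_000, 1_000, 80)  # hypothetical 50 mA macro
print(n)                                              # 63
print(f"{via_array_drop_mv(50_000, 2.0, n):.2f} mV")  # 1.59 mV
```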

Anyway, I'm not a fan of complexity for the hell of it, but I think it is
worth noting that there are some optimizations out there to be had.


After saying all the above, we actually leave the rings on the commercial
compiled RAMs and punch vias down from our meshes onto them.  It's fast and
reliable and gives a really good connection, so the compiled RAM macro rings
can be pretty damn small, and the savings from going ringless are small here.
We can handle both rotated and normal orientations, too.  We do this connection
before stdcell placement, by the way, so that the virtual router understands
the implications of any track-blocking vias that get dropped...

To summarize, at my company we think:

      1) Little or no cell area should be lost to power routing
         (analog IP power is the major exception).

      2) We should be able to shave a few mV of drop with the "dagger"
         idea.  Every millivolt we can get "for free" by being smart is
         worth it.

Meshes are in COMMON USE NOW.  So it's time to rethink the point-tools we
use to connect them up!


> Power supply design and analysis: Our old way of design and analysis for
> the power supply metal was an MS Excel spreadsheet.  What I really was
> looking for was a tool that studied the placed netlist and helped me beef
> up or trim down the power supply grid.  FE claims to do this.  What's the
> truth?  And what kind of clock trees does it assume?  Zero skew?  Useful
> skew?  Or does it use a netlist with clock trees inserted?


I haven't had a chance to try First Encounter for this, (want to!) but I do
know a bit about how Avanti's Mars Rail and now Astro Rail seem to work,
and I suspect they are all pretty similar.

They are doing time-averaged power only, so the skew issues aren't relevant.
They can provide pretty reasonable estimates and give you basic warm fuzzies
about your power metal.

You provide clock frequencies, and per-cycle estimated "switch factors" (aka
"switching probabilities") for the nets in your design.  You do this with
regular expressions on the netnames, etc.
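Roughly, the setup described above turns into a time-averaged power number
per net: P = 0.5 * alpha * C * V^2 * f, with alpha taken from the first
regex that matches the net name. This is a hypothetical sketch, not the
actual input format of Mars Rail, Astro Rail, or First Encounter; it uses
the toggles-per-cycle convention for alpha, and the patterns, capacitances,
voltage, and frequency are made up. Mapping the power onto grid locations
for the IR solve is not shown.

```python
import re

# Hypothetical pattern table: first matching regex wins, so order matters.
SWITCH_FACTORS = [
    (re.compile(r"^clk"), 2.0),    # clock nets: two toggles per cycle
    (re.compile(r"^scan_"), 0.0),  # scan nets: idle in functional mode
    (re.compile(r".*"), 0.1),      # default for every other net
]

def switch_factor(net_name):
    for pattern, factor in SWITCH_FACTORS:
        if pattern.search(net_name):
            return factor
    return 0.0

def net_power_w(cap_f, vdd_v, freq_hz, factor):
    # Toggles-per-cycle convention: on average each toggle dissipates C*V^2/2.
    return 0.5 * factor * cap_f * vdd_v ** 2 * freq_hz

def total_power_w(net_caps_f, vdd_v, freq_hz):
    return sum(net_power_w(cap, vdd_v, freq_hz, switch_factor(name))
               for name, cap in net_caps_f.items())

nets = {"clk_core": 100e-15, "data[3]": 20e-15, "scan_en": 50e-15}
print(f"{total_power_w(nets, 1.2, 200e6) * 1e6:.2f} uW")  # 29.09 uW
```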

If you have very VERY extensive simulation results you can use those, too.
(Most people don't... chips too damn big, vectors too lousy.)  Another tack
we've seen: folks doing DSP kinds of things can sometimes use numerical
analysis to get a better idea of the switch factors.
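If simulation results are available, a net's switch factor can be estimated
as toggles per cycle from its value sampled once per clock. This
hypothetical helper is meant for data nets; sampling once per cycle cannot
see glitches or multiple toggles within a cycle, so it tends to undercount.
For uniformly random data bits the expected result is about 0.5, which is
the kind of figure a numerical argument might give a DSP datapath.

```python
def switch_factor_from_trace(samples):
    """Toggles per cycle from a net's value sampled once per clock cycle."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    toggles = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    return toggles / (len(samples) - 1)

print(switch_factor_from_trace([0, 1, 1, 0, 1]))  # 0.75
```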

Given the lack of accuracy in the switch factors, you would need to be
careful in doing per-stdcell voltage value (aka "in context") timing based
on the IR map -- you need to model the errors in the IR map itself to
be safe.
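One simple way to model the IR map's own error: in a linear resistive grid
each node's drop is a non-negative weighted sum of the load currents, so if
every real current can exceed its estimate by at most a fraction e, every
drop is at most (1 + e) times what the map shows. The helper name and the
30% error figure below are hypothetical.

```python
def guardbanded_voltage(vdd_v, ir_drop_v, rel_current_error):
    """Worst-case local supply for in-context timing.

    Assumes a linear grid and that real currents exceed the estimates by
    at most rel_current_error, so drops scale by at most (1 + error).
    """
    return vdd_v - ir_drop_v * (1.0 + rel_current_error)

print(f"{guardbanded_voltage(1.2, 0.05, 0.3):.3f} V")  # 1.135 V
```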

What really helps you sleep at night is the EM output from these tools,
since it is getting easier and easier to create EM problems without any IR
drop problems on a chip.

Gross EM problems due to buggy layout (missing vias, etc.) are found with
the EM tools, but we've also found cases where simple padcell stubs were EM
violators, or where layout inside the padcell itself had a problem (e.g. too
much power being drawn in one area because all-layer blockages had punched
out the meshes too much, putting a large load on one or a few power
padcells).
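A stripped-down version of that kind of check flags any segment, padcell
stubs included, whose time-averaged current per unit width exceeds a limit.
A single width-based limit is a simplification: real EM rules also depend on
layer, temperature, and DC versus AC current. The segment names, currents,
and the 1 mA/um limit are hypothetical.

```python
def em_violations(segments, limit_ma_per_um):
    """Names of segments whose current per unit width exceeds the limit.

    segments: iterable of (name, current_ma, width_um).
    """
    return [name for name, current_ma, width_um in segments
            if current_ma / width_um > limit_ma_per_um]

# Hypothetical segment currents from a time-averaged power run.
segs = [
    ("pad17_vdd_stub", 30.0, 20.0),  # 1.50 mA/um
    ("ring_seg3", 10.0, 40.0),       # 0.25 mA/um
    ("m6_strap_12", 3.0, 4.0),       # 0.75 mA/um
]
print(em_violations(segs, limit_ma_per_um=1.0))  # ['pad17_vdd_stub']
```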

It's good to find these things *before* you freeze the padring and launch
the package and board design.  :)


And relevant to Mark's "skew assumed by power tools" question, there is an
announcement for a new tool called "CoolTime" (from Sequence), that *claims*
to do what I, personally, have been wishing for in my Xmas stocking: a tool
doing "switching windows" (a la PrimeTime-SI) but presenting the data per
unit area.  However, I am deeply concerned that CoolTime is going a bit off
the deep end in its claims.  The problem with modeling actual dynamic power
switching is that the transition times are really small in the cases that
are nasty, AND the set of all possible chips that are going to be
manufactured can have a zillion relative net delay differences due to
transistor and (independently) metal variation.  Therefore, the windows have
to be LARGE, or if not, the tool has to spread the switching noise of one
output across multiple small windows.
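To illustrate the concern, here is a hypothetical sketch (not CoolTime's or
PrimeTime-SI's algorithm) that counts the peak number of switching windows
open at once among cells in one area bin, using a sweep over window
open/close events. Widening every window by a variation margin drives the
count toward "everything switches together", which is the worst-case
creep described above. The window values are made up, in ps.

```python
def peak_overlap(windows, margin_ps=0):
    """Maximum number of switching windows open at the same instant.

    windows: list of (early_ps, late_ps) for nets in one area bin.
    margin_ps widens every window on both sides to cover variation.
    Touching windows count as overlapping.
    """
    events = []
    for early, late in windows:
        events.append((early - margin_ps, 1))   # window opens
        events.append((late + margin_ps, -1))   # window closes
    # At equal times, opens sort before closes (key -1 before +1).
    events.sort(key=lambda e: (e[0], -e[1]))
    open_now = peak = 0
    for _, delta in events:
        open_now += delta
        peak = max(peak, open_now)
    return peak

bin_windows = [(100, 120), (130, 150), (160, 180)]
for m in (0, 10, 40):
    print(m, peak_overlap(bin_windows, m))
# prints: "0 1", "10 2", "40 3"
```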

I wonder if the whole thing becomes too worst case to be as accurate as they
claim?

I would like to see an e-beam trace showing the power drop matching the
tool's results for a complex chip running some repeating test pattern.  Dare
I hope for such a thing?  No?  Well, failing that I want to know exactly how
the results are calculated... might trust it then, but not before.

    - Paul Rodman, CTO
      ReShape, Inc.                              Mountain View, CA

"Relax. This is a discussion. Anything said here is just one engineer's opinion. Email in your dissenting letter and it'll be published, too."
Copyright 1991-2024 John Cooley. All Rights Reserved.

