( ESNUG 343 Item 13 ) -------------------------------------------- [2/16/00]
Subject: 8 Engineers Discussing 7 Types Of Adder Hardware Implementations
> What are the specifications of a "Carry-Look-Ahead-Adder"? Is this the
> same as a "Carry-Save-Adder"?
>
> - Reto Amherd
> HSR Hochschule Rapperswil Wald, Switzerland
From: S.D. Lew <s_d_lew@my-deja.com>
They're not the same. It's perhaps easier to explain by looking at several
different types of basic adders...
Bit Serial Adder:
Adds 1 bit at a time. The carry out of the least significant bit
is computed and input to the next more significant bit, and so on
up to the most significant bit. Unfortunately this means the sum
for a bit can't be computed before the lower bit's carry is
computed. Just like it's done by hand.
Carry Ripple Adder:
Same as a BSA except the intermediate results aren't saved. As
a result, the carry ripples up from the lsb to the msb, but still
the sum for a bit can't be computed before the lower bit's carry
is computed.
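The full-adder chain described above can be modelled in a few lines of Python (an illustrative sketch, not hardware code; bit lists are LSB first and all names are made up here):

```python
def ripple_carry_add(a_bits, b_bits):
    """Add two equal-length bit lists (LSB first); return sum bits, carry-out."""
    carry = 0
    s = []
    for a, b in zip(a_bits, b_bits):
        s.append(a ^ b ^ carry)              # sum output of one full adder
        carry = (a & b) | (carry & (a ^ b))  # carry-out of that full adder
    return s, carry

# 6 + 7 = 13, bits LSB first
print(ripple_carry_add([0, 1, 1, 0], [1, 1, 1, 0]))  # ([1, 0, 1, 1], 0)
```

Note how `carry` is both produced and consumed inside the loop: that data dependence is exactly why the hardware has to wait for the lower bits.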
Carry Kill/Generate/Propagate Adder (Manchester Adder):
Similar to CRA. For each bit, the carry out is computed to be either
always 0 (kill), always 1 (generate), or whatever the carry in is
(propagate). This is statistically faster than the ripple adder
since kill/generate signals can start in the middle bits instead
of always at the lsb. Although the worst case is the same as the CRA,
when implemented in silicon it's sometimes faster since pass-gate
logic for the propagate can often be faster than going through
all the carry logic from the lower bits...
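The kill/generate/propagate classification can be sketched as a small Python model (illustrative names, not from the post; the point is that only the 'propagate' case depends on the carry-in):

```python
def classify(a_bits, b_bits):
    """Tag each bit position (LSB first) as kill, generate, or propagate."""
    tags = []
    for a, b in zip(a_bits, b_bits):
        if a & b:
            tags.append('g')   # generate: carry out is always 1
        elif a | b:
            tags.append('p')   # propagate: carry out equals carry in
        else:
            tags.append('k')   # kill: carry out is always 0
    return tags

def manchester_carries(a_bits, b_bits, cin=0):
    """Carry out of each stage; any 'g' or 'k' restarts the chain mid-word."""
    carries, c = [], cin
    for tag in classify(a_bits, b_bits):
        c = 1 if tag == 'g' else 0 if tag == 'k' else c
        carries.append(c)
    return carries
```

A run of 'p' bits is where the pass-gate chain pays off in silicon; a 'g' or 'k' anywhere cuts the chain short, which is why the average case beats the ripple adder.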
Carry Lookahead Adder:
Instead of waiting for the carry bit to propagate up, extra logic
(lookahead logic) is put in to compute each of the carry bits (in
parallel) by looking at all the lower bits. This is almost always
faster than a CRA or MA since the logic to compute a carry for a
specific bit is faster than computing the intermediate carry bits
and waiting for the result.
Carry Save Adder:
A carry save adder is a different thing altogether. Instead of
trying to solve the addition problem, it solves a different problem.
All a CSA does is convert the problem of adding three numbers
together into a problem of adding two numbers together. If you
want to add 9 numbers together, you can use 3 CSAs to reduce it to 6
numbers, then 2 more to reduce 6 numbers to 4 numbers, and so on.
The advantage of a CSA is that it's fast (no carries, since it
SAVEs them for later). Multipliers and DSP accumulators tend to use
CSAs so they save all the carries from all the adds until the last
stage and do one CLA at the end...
To see how it works, let's look at the truth table for adding 3 bits
together...
 x  y  z | s  c     (x + y + z = s + 2c)
 0  0  0 | 0  0
 0  0  1 | 1  0
 0  1  0 | 1  0
 0  1  1 | 0  1
 1  0  0 | 1  0
 1  0  1 | 0  1
 1  1  0 | 0  1
 1  1  1 | 1  1
(notice you can never get a 'double' carry)
You can treat all the bits put together {s[i]} as a number and {c[i]}
as another number so you've just converted the problem of computing
X+Y+Z to S+2C without waiting for any carries...
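The truth table above is just a per-bit majority function, so on whole
integers the S + 2C trick is a one-liner per output (a Python sketch, not
hardware code):

```python
def csa(x, y, z):
    """Reduce x + y + z to two numbers s + c with the same total."""
    s = x ^ y ^ z                        # per-bit sum, carries ignored
    c = (x & y) | (y & z) | (x & z)      # per-bit carry (majority function)
    return s, c << 1                     # carries count double: shift up

s, c = csa(9, 5, 3)     # 9 + 5 + 3 = 17
print(s, c, s + c)      # 15 2 17
```

No bit of `s` or `c` depends on any other bit, which is why the reduction takes constant time regardless of word width.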
A CSA is a basic example of a computation technique called redundant digit
representation of which there have been countless papers...
The basic motivation for redundant digit representation is:
1. Computation is often easier in different representations of a
number (that are not compact).
2. Using binary representation for intermediate results requires extra
logic to make the representation compact.
Also, if we aren't going to look at the intermediate results anyway, why
bother converting them to binary? Save the logic (which slows things down
and makes the adder bigger) and just convert back to binary once at the end.
- S.D. Lew
---- ---- ---- ---- ---- ---- ----
From: Phil Carmody <carmody@cpd.ntc.nokia.com>
Thanks, this has been most informative, there were probably two I'd never
heard of before. Two follow-up points.
When an old, delirious work-mate was teaching me about CSAs several
years back, he aberrantly remembered the acronym as standing for 'Carry
Shift Adder', which he explained made perfect sense as the carry bits are
shifted as they are reinserted into the sum later.
Incidentally, is it true or not that the prevalence of the MAC/MADD
(multiply-accumulate or multiply-and-add) instructions is because
multipliers have traditionally been implemented using CSAs, and that the
very first stage has a 'spare' input? So one might as well get an extra add
in there for free. (so not 32 but 33 sums ->22->16->12->8->6->...)
Has anyone ever considered higher order compressors, ones which take, say,
7 inputs and output 3? I (from ignorance) assume these would be as easy to
cascade as traditional CSAs. What would the tranny count be for such an
operation, relative to a normal CSA? (33->15->7->3 so three stages gets
further than 6 of the above). If they are not used - why not, what's the
drawback? As ever, eager to learn...
- Phil Carmody
Nokia Telecommunications Cambridge, UK
---- ---- ---- ---- ---- ---- ----
From: S.D. Lew <s_d_lew@my-deja.com>
Just to be extra confusing, there's yet another type of adder called a carry
"select" adder (which is sort of a generalized Manchester adder popular in
FPGA implementations). This makes a C{Save|Shift|Select}A ambiguous...
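For the curious, the select idea can be sketched in Python (an illustrative model; the 8-bit width, 4-bit blocks, and names are made up here): each upper block is computed twice, once per possible carry-in, and the real carry simply picks a result instead of rippling through the block.

```python
def carry_select_add(a, b, width=8, block=4):
    """Add two unsigned ints, block by block, selecting precomputed sums."""
    mask = (1 << block) - 1
    result, cin, shift = 0, 0, 0
    for _ in range(width // block):
        lo_a, lo_b = a & mask, b & mask
        sum0 = lo_a + lo_b            # block sum assuming carry-in = 0
        sum1 = lo_a + lo_b + 1        # block sum assuming carry-in = 1
        s = sum1 if cin else sum0     # select once the real carry arrives
        result |= (s & mask) << shift
        cin = s >> block              # carry out of this block
        a >>= block
        b >>= block
        shift += block
    return result                     # final carry-out is dropped (mod 2^width)
```

The area cost is one extra adder per block; the payoff is that the block sums all start at once.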
> Incidentally, is it true or not that the prevalence of the MAC/MADD
> (multiply-accumulate or multiply-and-add) instructions is because
> multipliers have traditionally been implemented using CSAs, and that the
> very first stage has a 'spare' input? So one might as well get an extra
> add in there for free. (so not 32 but 33 sums ->22->16->12->8->6->...)
CSA (the save/shift kind) is probably a popular implementation; however,
since most DSP MAC units have the accumulator at much higher precision (more
bits), it sometimes makes more sense to insert the accumulation near the
end of the tree rather than the beginning... (unless the multiplier is
internally rounded).
In some implementations the accumulator is never really there until you read
it. The last two carry save numbers are never added together before being
fed back into the adder tree so you don't need the CLA until you actually
read the accumulator to do something else (or at least until later in the
pipeline)... Sometimes it's really hard to know what tricks go on inside
those DSP blocks ;-)
> Has anyone ever considered higher order compressors, ones which take,
> say, 7 inputs and output 3? I (from ignorance) assume these would be as
> easy to cascade as traditional CSAs. What would the tranny count be for
> such an operation, relative to a normal CSA? (33->15->7->3 so three
> stages gets further than 6 of the above). If they are not used - why
> not, what's the drawback?
The biggest drawback of CSAs is routing (you have to get 3 numbers in on
the right and only 2 come out on the left, and you have to bring another one
in from another branch to do the next stage). 7-to-3 would just make this
routing situation worse. Having said that, I know of a hand-designed
carry save accumulator that added more than 3 bits together in a stage
and was more compact than the standard CSA implementation, but the
cells were custom cells, not standard ASIC/FPGA library cells...
- S.D. Lew
---- ---- ---- ---- ---- ---- ----
> Carry Save Adder:
> A carry save adder is a different thing altogether. Instead of
> trying to solve the addition problem, it solves a different problem.
> All a CSA does is convert the problem of adding three numbers
> together into a problem of adding two numbers together. If you
> want to add 9 numbers together, you can use 3 CSAs to reduce it to 6
> numbers, then 2 more to reduce 6 numbers to 4 numbers, and so on.
> The advantage of a CSA is that it's fast (no carries, since it
> SAVEs them for later). Multipliers and DSP accumulators tend to use
> CSAs so they save all the carries from all the adds until the last
> stage and do one CLA at the end...
From: Terje Mathisen <Terje.Mathisen@hda.hydro.com>
This is exactly how you'll have to implement fast bignum (arbitrary
precision) code on IA64 (Itanium?), by using a pair of register arrays
to hold intermediate data in carry save format.
By doing this you can delay the final carry ripple operation until the
end, which results in greatly improved performance. (I'd guess something
like 3-5 times faster for a bignum multiply operation, which is still
small enough that it doesn't make sense to use FFT or other tricks.)
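One way to picture the register-pair scheme in software (a Python sketch with 32-bit limbs; this is an illustration of the idea, not IA64 code): keep the running sums and the deferred carries in two separate limb arrays, and do the carry ripple only once, when the result is read.

```python
BASE = 1 << 32   # 32-bit limbs for illustration

def csa_accumulate(rows):
    """Sum many equal-length limb vectors (LSB limb first),
    deferring all carry propagation to one final pass."""
    n = len(rows[0])
    sums  = [0] * n
    carry = [0] * (n + 1)
    for row in rows:
        for i, limb in enumerate(row):
            t = sums[i] + limb
            sums[i] = t % BASE            # keep the limb's low part
            carry[i + 1] += t // BASE     # save the carry, don't ripple it
    out, c = [], 0                        # single carry-ripple pass at the end
    for i in range(n + 1):
        t = (sums[i] if i < n else 0) + carry[i] + c
        out.append(t % BASE)
        c = t // BASE
    return out                            # little-endian limbs
```

The inner loop has no loop-carried carry dependence between limbs, which is what lets a wide machine keep all its adders busy.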
- Terje Mathisen
---- ---- ---- ---- ---- ---- ----
From: Ray Andraka <randraka@ids.net>
There is a little bit of discussion about this on the multiplier page in my
website. In FPGAs, the dedicated ripple carry is so much faster than the
general routing resources that a ripple carry adder will generally
outperform carry save and carry look-ahead architectures.
- Ray Andraka
Andraka Consulting http://users.ids.net/~randraka
---- ---- ---- ---- ---- ---- ----
> What are the specifications of a "Carry-Look-Ahead-Adder"? Is this the
> same as a "Carry-Save-Adder"?
From: gah@ugcs.caltech.edu (Glen Herrmannsfeldt)
They are not very similar. The simplest adder adds the way you do on paper,
one digit at a time and then onto the next digit. This is slow.
Carry lookahead decides in advance whether there will be a carry, without
waiting. It takes more logic, but not that much more, and is a lot faster.
Carry Save Adder is used when adding more than two numbers. You add the
numbers bit by bit and generate a sum (that doesn't include the carry) and
the carries to be added in later. If you are adding many numbers at once,
you can wait to combine the carries until near the end.
- Glen Herrmannsfeldt
California Institute of Technology Pasadena, CA
---- ---- ---- ---- ---- ---- ----
From: George Russell <ger@informatik.uni-bremen.de>
One really can't leave the subject of carry technology without mentioning
the "anticipating carriage" mechanism Charles Babbage planned for the
Analytical Engine. Unfortunately I can't find a picture or details on the
Internet (I've come across it several times in books), but you could try
poking around http://www.fourmilab.ch/babbage/contents.html . I suppose the
anticipating carriage mechanism is closest to a "Carry-Look-Ahead-Adder".
- George Russell
Universitaet Bremen Bremen, Germany