[Ffmpeg-devel] Re: about mmx instructions
Pascal Massimino
skal
Sat Sep 3 09:09:43 CEST 2005
Howdy everybody!
(hey don't you guys never take holidays?)
(or: do you spare funny questions *for* holidays? ;)
On Thu, 2005-09-01 at 16:07, Michael Niedermayer wrote:
> anyone got a nicer derivation/proof?
Well, for a more generic approach, one should remember
that it's all about emulating the propagation of
carry/borrow.
Let's take for example the addition of two 8-bits values,
followed by a shift by 2: (x+y)>>2
The intermediate value (x+y) doesn't fit in 8bits (because
of the carry), but one can split the addition in two parts:
the 2 lowest bits are added, and the resulting carry is reported
to the result of adding the highest 6 bits. Namely:
(x+y)>>2 = [ (x>>2) + (y>>2) ] + [((x&3) + (y&3)) >> 2)]
The first term is the carry-less addition of the 6 hi bits, the
second terms adds the 2 lowest bits and only keeps the carry.
Now, note that you can also retrieve the carry from the
result itself, noting that:
(x+y)>>2 = (x>>2) + (y>>2) + { ((x+y)>>2) ^ (x>>2) ^ (y>>2) }
(x-y)>>2 = (x>>2) - (y>>2) - { ((x-y)>>2) ^ (x>>2) ^ (y>>2) }
These relations may sound silly, since we're using the result
(x+y)>>2 ... to find the result (x+y)>>2. But recall that the
problem is to make all the intermediate values fit into 8 bits.
Ok, same goes on for difference, and one can also replace the
arithmetic ops by logical ones, if one recalls that, in
boolean algebra, the '&' logical operator is equivalent to the
arithmetic multiplication, and the '^' operator equivalent to
the addition:
(x+y)>>1 = (x>>1) + (y>>1) + { x&y }
(x+y+1)>>1 = (x>>1) + (y>>1) + { x|y }
(x-y)>>1 = (x>>1) - (y>>1) - { y&~x }
(x-y+1)>>1 = (x>>1) - (y>>1) + { x&~y }
where, for short, {a} means 'lsb of a', that is: {a} = a&1.
Note that: {a+b} = {a}^{b}, {a*b} = {a}^{b}, {a+a}={0}, etc..
Ok, now we can stir the bit soup, using : x=(p0+(q1>>2)+1)
and y=(q0+(p1>>2)+1.
First we have:
(p1-q1)>>2 = (p1>>2) - (q1>>2) - Epsilon
Where Epsilon={0,1} is { ((p1-q1)>>2) ^ (p1>>2) ^ (q1>>2) }
So that:
(q0-p0+(p1-q1)>>2 + 1) = [q0+(p1>>2)+1] - [p0+(q1>>2)+1] + 1-Epsilon
Now, you can use the generic relation:
(x-y + e)>>1 = [ (x>>1) + { x&~y&e } ] - [ (y>>1) + { y&~x&~e } ]
for e = 0 or 1.
Replacing e by 1-Epsilon = 1^ { ((p1-q1)>>2) ^ (p1>>2) ^ (q1>>2) }
will lead to the result after some boolean stirring, if you recall,
for instance, that: { p0+(q1>>2) } = {p0}^{(q1>>2)} as far as the
LSB is concerned.
> or even a faster implementation?
Hint: (d&a) and (d&~a) can be simplified to not use 'a'
twice. Left as exercise to the reader...
Hope it helps,
bye!
Skal
More information about the ffmpeg-devel
mailing list