[MPlayer-dev-eng] [PATCH] (new version) AltiVec: dct64 for mp3lib, IMDCT for liba52, detection code

Sun Jan 19 19:44:10 CET 2003

> Oh yes, but it's still beaten a lot by the MC functions. Especially when
> having the altivec idct in place the granularity of the profile improves a
> lot and one can see the memory bottlenecks.
> 
> As I said; I had the MC working a year ago but it broke badly due to
> misalignment which was introduced later. I tried to compensate by two
> different methods: Generally aligning all read data - this really blew
> performance, and special casing - this really introduced some nasty bugs
> and bloated code...
> 
> I don't see any light here before we get some reasonable alignment (at
> least 64-bit) back or do some nasty surgery on the interfaces. Needless to
> say I really lack time; if you're interested in looking at some of the
> incorrect or old code I'd be glad to send it over....

The MC functions are the pair of "*pixels8_xy2_c", right?

I've just sent a patch to ffmpeg for the first one
("put_pixels8_xy2_c"). The main problem is that the C version is totally
unreadable; I had to guess what I was supposed to do.
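
For readers following along, a minimal scalar sketch of what an "xy2"
put is generally understood to compute: each output pixel is the rounded
average of a 2x2 neighbourhood of source pixels. The name
"put_pixels8_xy2_ref" is hypothetical, the code is not taken from ffmpeg,
and the "+ 2" rounding is an assumption about the rounding variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical readable reference, NOT the ffmpeg source.
 * Writes 8 pixels per line for h lines; each output pixel is the
 * rounded average of the source pixel, its right neighbour, and the
 * two pixels directly below them (assumed rounding: +2, then >> 2).
 * Reads 9 bytes from each of h + 1 source lines. */
static void put_pixels8_xy2_ref(uint8_t *block, const uint8_t *pixels,
                                ptrdiff_t line_size, int h)
{
    for (int y = 0; y < h; y++) {
        const uint8_t *top = pixels + y * line_size;
        const uint8_t *bot = top + line_size;
        uint8_t *out = block + y * line_size;

        for (int x = 0; x < 8; x++) {
            /* Operands promote to int, so the sum cannot overflow. */
            out[x] = (uint8_t)((top[x] + top[x + 1] +
                                bot[x] + bot[x + 1] + 2) >> 2);
        }
    }
}
```

Note the 9 bytes read per source line: that is where the "9 of them"
figure for the input row comes from.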

Sure, the alignment is wrong, but it's mostly for reading so it's less of
a problem. The output block is 8-byte aligned, so it's almost OK. OTOH
I tried an AltiVec "put_pixels8_c" but it was slower than the C code.

For the xy2, maybe I should check if ((address & 0x1F) < 8): in that
case I have both 8-pixel blocks in a single vector and I can avoid the
second load from "pixels". What do you think? My code seems faster than
the C code, but I'm not sure that's always true: for fully out-of-cache
data, the 32 bytes loaded per line for "pixels" may be too costly (you
only really need 9 of them).
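
The single-vector idea can be sketched as a plain address test. This is
only an illustration, assuming 16-byte AltiVec vectors and a 9-byte
source row; the helper name "row_fits_one_vector" is hypothetical. Under
those assumptions the row fits whenever the offset inside the 16-byte
block is below 8, so the (address & 0x1F) < 8 test above would be a
stricter, still safe, variant of the same check.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when the 9 source bytes starting at
 * addr all lie inside one 16-byte aligned block, so that a single
 * aligned vector load would cover the whole row.
 * Assumes 16-byte vectors; equivalent to (addr & 0xF) < 8. */
static int row_fits_one_vector(uintptr_t addr)
{
    uintptr_t offset = addr & 0xF;   /* position inside the 16-byte block */
    return offset + 9 <= 16;
}
```

When this returns 0, the row straddles two aligned blocks and the second
load is still needed.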

-- 
DOLBEAU Romain               |
ENS Cachan / Ker Lann        | l'histoire est entierement vraie, puisque
Thesard IRISA / CAPS         |     je l'ai imaginee d'un bout a l'autre
dolbeaur at club-internet.fr    |           -- Boris Vian