[MPlayer-dev-eng] [PATCH] (new version) AltiVec: dct64 for mp3lib, IMDCT for liba52, detection code
Daniel Egger
degger at fhm.edu
Sun Jan 19 22:05:03 CET 2003
On Sun, 2003-01-19 at 19:44, Romain Dolbeau wrote:
> The MC functions are the pair of "*pixels8_xy2_c", right?
Almost all of the functions in dsputil.c.
[me looks...]
Jesus, they have changed almost the whole damn thing....
> I've just sent a patch to ffmpeg for the first one
> ("put_pixels8_xy2_c"). The main problem is the C version is totally
> unreadable, I had to guess what I was supposed to do.
Well, I figured out what they do by preprocessing the source and using
the expanded (and reformatted) source for ideas.
> Sure the alignment is wrong, but it's mostly for reading so it's less of
> a problem.
Huh? You either need to align it (costly, especially if you need to read
2x16 bytes, which is often the case) or you'll end up with wrong pixels
on the screen.
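To make the alignment cost concrete: AltiVec's vec_ld ignores the low four address bits, so an unaligned 16-byte read is normally done as two aligned loads merged by vec_perm with a vec_lvsl mask. Below is a portable scalar model of that sequence, not AltiVec code. The helper names load_aligned16 and load_unaligned16 are hypothetical, and the model assumes both enclosing 16-byte blocks are readable memory.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of vec_ld: the low four address bits are ignored,
   so this always fetches the aligned 16-byte block containing addr.
   Assumes that whole block is readable. */
static void load_aligned16(const uint8_t *addr, uint8_t out[16])
{
    const uint8_t *base =
        (const uint8_t *) ((uintptr_t) addr & ~(uintptr_t) 15);
    memcpy(out, base, 16);
}

/* Hypothetical model of the vec_ld + vec_ld + vec_perm(vec_lvsl) idiom:
   two aligned loads, then select 16 consecutive bytes starting at the
   misalignment offset within the 32-byte concatenation. */
static void load_unaligned16(const uint8_t *addr, uint8_t out[16])
{
    uint8_t both[32];
    unsigned off = (unsigned) ((uintptr_t) addr & 15);

    load_aligned16(addr, both);            /* like vec_ld(0, addr)  */
    load_aligned16(addr + 15, both + 16);  /* like vec_ld(15, addr); same
                                              block again when aligned */
    memcpy(out, both + off, 16);           /* like vec_perm with vec_lvsl */
}
```

Reading 2x16 unaligned bytes this way takes three aligned loads and two permutes per row, compared with two plain loads for aligned data.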
> The output block is 8-byte aligned, so it's almost OK. OTOH
> I tried "put_pixels8_c" but it was slower than the C code.
Yes, this is the only one that doesn't benefit from AltiVec, AFAIR.
Although writing it as (assuming this is the correct function, as it
has been renamed and I haven't been tracking the changes):
/* Copy one 8-byte row as two 32-bit words, then advance both pointers
   by line_size bytes. The casts are written out explicitly because a
   cast expression is not an lvalue in ISO C. */
#define COPY_ROW()                                                 \
    do {                                                           \
        p[0] = pix[0];                                             \
        p[1] = pix[1];                                             \
        p   = (UINT32 *) ((UINT8 *) p + line_size);                \
        pix = (const UINT32 *) ((const UINT8 *) pix + line_size);  \
    } while (0)

    UINT32 *p = (UINT32 *) block;
    const UINT32 *pix = (const UINT32 *) pixels;

    /* rows 0-7, fully unrolled */
    COPY_ROW(); COPY_ROW(); COPY_ROW(); COPY_ROW();
    COPY_ROW(); COPY_ROW(); COPY_ROW(); COPY_ROW();
    if (h == 8)
        return;
    /* rows 8-15, for h == 16 */
    COPY_ROW(); COPY_ROW(); COPY_ROW(); COPY_ROW();
    COPY_ROW(); COPY_ROW(); COPY_ROW(); COPY_ROW();

#undef COPY_ROW
improves the code a bit by taking advantage of the many registers.
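For reference, the operation the unrolled version implements can be written as a plain loop: copy h rows of 8 bytes, with both pointers stepping by line_size. This is a minimal sketch to check the unrolled code against, using C99 uint8_t instead of UINT8. The name put_pixels8_ref is hypothetical, and memcpy is used so the reference makes no alignment or aliasing assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical readable reference: copy an 8-pixel-wide block of h rows
   from pixels to block; source and destination share the stride line_size. */
static void put_pixels8_ref(uint8_t *block, const uint8_t *pixels,
                            int line_size, int h)
{
    for (int y = 0; y < h; y++) {
        memcpy(block, pixels, 8);   /* one 8-byte row */
        block  += line_size;
        pixels += line_size;
    }
}
```

The unrolled version trades code size for the loop counter and branch, and as written it assumes h is either 8 or 16.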
> For the xy2, maybe I should check if ((address & 0X1F) < 8), in that
> case I have both 8-pixel blocks in a single vector and I can avoid the
> second load from "pixels". What do you think? My code seems faster than
> the C code, but I'm not sure it's always true - for fully out-of-cache
> data, the 32 bytes loaded per line for "pixels" may be too costly (you
> only really need 9 of them).
Nice idea, unfortunately I have to go right now; I'll send my more
detailed reply to this later.
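On the xy2 question, a scalar reference plus the single-vector test might look like the sketch below. It assumes the rounding of the put_ (as opposed to put_no_rnd_) variant, (a + b + c + d + 2) >> 2, and 16-byte AltiVec vectors, for which the 9 bytes a row needs fit in one aligned vector exactly when the address offset within its 16-byte block is below 8. Both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical reference for put_pixels8_xy2: each output pixel is the
   rounded average of its 2x2 source neighbourhood, so every output row
   reads 9 bytes from each of two consecutive source rows. */
static void put_pixels8_xy2_ref(uint8_t *block, const uint8_t *pixels,
                                int line_size, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 8; x++) {
            int sum = pixels[x] + pixels[x + 1]
                    + pixels[x + line_size] + pixels[x + line_size + 1];
            block[x] = (uint8_t) ((sum + 2) >> 2);
        }
        block  += line_size;
        pixels += line_size;
    }
}

/* Hypothetical helper, assuming 16-byte vectors: pixels[0..8] lie in a
   single aligned vector (one vec_ld is enough) iff the offset is 0..7. */
static int row_fits_one_vector(const uint8_t *pixels)
{
    return ((uintptr_t) pixels & 0x0F) < 8;
}
```

Because every row advances by the same line_size, when line_size is a multiple of 16 the check gives the same answer for all rows and can be hoisted out of the row loop.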
--
Servus,
Daniel