[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try #2
Michael Niedermayer
michaelni
Wed Aug 22 03:47:52 CEST 2007
Hi
On Tue, Aug 21, 2007 at 11:37:38PM +0200, Balatoni Denes wrote:
> Hi!
>
> Ah, one DECLARE_ALIGNED_8 was missed. Updated patch attached.
>
> bye
> Denes
>
> On Tuesday 21 August 2007 at 22:35, Balatoni Denes wrote:
> > Hi!
> >
> > Here is a patch to add sparc vis optimized simple_idct. Speedwise it is
> > about halfway between the C and the mlib version, slightly on the faster
> > side. Although the mlib version is faster, this is more accurate - and I
> > honestly don't know why it is not faster on the sparc than it is, it should
> > be according to my estimates, but it is not.
it would be very interesting to find out why it's not faster ...
> > Also, although it's the same algorithm as simple_idct, it might be less
> > accurate on rare occasions, because of the primitive slow split 8+8 bit
> > multiply operation in VIS. I am hoping the idct won't overflow, too.
what happens with the regression tests if it's forced to be used where the
normal C simple idct normally is?
and what does dct-test.c output for the idct?
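As a rough illustration of why a split 8+8 bit multiply can lose accuracy: VIS composes 16x16 partitioned products from two 8x16 multiplies (fmul8sux16 and fmul8ulx16), each of which rounds its own partial result. The sketch below is a simplified C model of that idea, not the exact instruction semantics; the function names and the rounding used are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by a positive divisor (C's '/' truncates toward zero). */
static int32_t floor_div(int32_t n, int32_t d)
{
    int32_t q = n / d;
    if (n % d < 0)
        q--;
    return q;
}

/* Round-to-nearest right shift, independent of how '>>' treats negatives. */
static int32_t shr_round(int32_t v, int s)
{
    assert(s > 0 && s < 31);
    return floor_div(v + ((int32_t)1 << (s - 1)), (int32_t)1 << s);
}

/* Reference: full 16x16 product, rounded once, scaled by 2^-16. */
static int32_t exact_mul16(int16_t a, int16_t b)
{
    return shr_round((int32_t)a * b, 16);
}

/* Assumed model of a split multiply: the signed upper byte and the unsigned
 * lower byte of 'a' are multiplied separately, and each partial product is
 * rounded before the two are added. */
static int32_t split_mul16(int16_t a, int16_t b)
{
    int32_t hi = floor_div(a, 256);   /* signed upper 8 bits   */
    int32_t lo = a - hi * 256;        /* unsigned lower 8 bits */
    return shr_round(hi * b, 8) + shr_round(lo * b, 16);
}
```

In this model the two rounding steps can make the result differ from the singly rounded product by one, for example for a = 511 and b = 127, which is the kind of rare deviation the patch description warns about.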
[...]
> if (accel & ACCEL_SPARC_VIS) {
> + if(avctx->idct_algo==FF_IDCT_AUTO || avctx->idct_algo==FF_IDCT_SIMPLEVIS){
> + c->idct_put = ff_simple_idct_put_vis;
> + c->idct_add = ff_simple_idct_add_vis;
> + c->idct = ff_simple_idct_vis;
> + c->idct_permutation_type = FF_NO_IDCT_PERM;
[...]
> +static DECLARE_ALIGNED_8(int16_t, coeffs[28]) = {
> + 32138, 32138, 32138, 32138,
> + 30274, 30274, 30274, 30274,
> + 27246, 27246, 27246, 27246,
> + 23170, 23170, 23170, 23170,
> + 18205, 18205, 18205, 18205,
> + 12540, 12540, 12540, 12540,
> + 6393, 6393, 6393, 6393
> +};
this should be static const
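For reference, the seven distinct table values match cos(k*pi/16) scaled by 2^15 and rounded to nearest, for k = 1..7. The sketch below recomputes them and declares a read-only copy as static const. The names cos_taylor, idct_coeff and patch_coeffs are hypothetical, and the Taylor-series cosine is used only to keep the example free of libm.

```c
#include <assert.h>
#include <stdint.h>

/* Cosine by Taylor series; ample precision for |x| < pi/2. */
static double cos_taylor(double x)
{
    double term = 1.0, sum = 1.0;
    for (int n = 1; n <= 12; n++) {
        term *= -x * x / ((2.0 * n - 1.0) * (2.0 * n));
        sum  += term;
    }
    return sum;
}

/* Hypothetical helper: k-th coefficient as round(cos(k*pi/16) * 2^15). */
static int16_t idct_coeff(int k)
{
    const double pi = 3.14159265358979323846;
    assert(k >= 1 && k <= 7);
    return (int16_t)(cos_taylor(k * pi / 16.0) * 32768.0 + 0.5);
}

/* One entry per group of four in the quoted table. Being static const, it
 * can be placed in read-only storage and cannot be modified by accident. */
static const int16_t patch_coeffs[7] = {
    32138, 30274, 27246, 23170, 18205, 12540, 6393
};
```

Calling idct_coeff(k) for k = 1..7 and comparing against patch_coeffs[k - 1] is a quick way to double-check the constants.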
[...]
> +#define IDCT4ROWS(in, shift, label, s1, s2, bi, ma) \
> + /* order input */\
> + "ld [" in "], %%f0 \n\t"\
> + "ld [" in "+4], %%f4 \n\t"\
> + "ld [" in "+8], %%f8 \n\t"\
> + "ld [" in "+12], %%f12 \n\t"\
> + "ld [" in "+16], %%f1 \n\t"\
> + "ld [" in "+4+16], %%f5 \n\t"\
> + "ld [" in "+8+16], %%f9 \n\t"\
> + "ld [" in "+12+16], %%f13 \n\t"\
> + "ld [" in "+32], %%f2 \n\t"\
> + "ld [" in "+4+32], %%f6 \n\t"\
> + "ld [" in "+8+32], %%f10 \n\t"\
> + "ld [" in "+12+32], %%f14 \n\t"\
> + "ld [" in "+48], %%f3 \n\t"\
> + "ld [" in "+4+48], %%f7 \n\t"\
> + "ld [" in "+8+48], %%f11 \n\t"\
> + "ld [" in "+12+48], %%f15 \n\t"\
> + "ldd [%0], %%f60 \n\t"\
> + "ldd [%0" ma "], %%f62 \n\t"\
> + "fzero %%f30 \n\t"\
> + "wr %%g0," s1 ", %%gsr \n\t"\
> + label "1: \n\t"\
> + "fand %%f0,%%f60, %%f32 \n\t"\
> + "fand %%f2,%%f60, %%f34 \n\t"\
> + "fand %%f4,%%f60, %%f36 \n\t"\
> + "fand %%f6,%%f60, %%f38 \n\t"\
> + "fand %%f8,%%f60, %%f40 \n\t"\
> + "fand %%f10,%%f60, %%f42 \n\t"\
> + "fand %%f12,%%f60, %%f44 \n\t"\
> + "fand %%f14,%%f60, %%f46 \n\t"\
> + "fand %%f0,%%f62, %%f48 \n\t"\
> + "fand %%f2,%%f62, %%f50 \n\t"\
> + "fand %%f4,%%f62, %%f52 \n\t"\
> + "fand %%f6,%%f62, %%f54 \n\t"\
> + "fand %%f8,%%f62, %%f56 \n\t"\
> + "fand %%f10,%%f62, %%f58 \n\t"\
> + "fand %%f12,%%f62, %%f60 \n\t"\
> + "fand %%f14,%%f62, %%f62 \n\t"\
> + "fpackfix %%f32, %%f0 \n\t"\
> + "fpackfix %%f34, %%f1 \n\t"\
> + "fpackfix %%f36, %%f4 \n\t"\
> + "fpackfix %%f38, %%f5 \n\t"\
> + "fpackfix %%f40, %%f8 \n\t"\
> + "fpackfix %%f42, %%f9 \n\t"\
> + "fpackfix %%f44, %%f12 \n\t"\
> + "fpackfix %%f46, %%f13 \n\t"\
well, i don't know sparc asm at all, but don't you read a few things in at the top
and then just overwrite these registers?
also, you permute the input explicitly instead of setting idct_permutation_type
properly
please don't submit trash
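To illustrate the point about idct_permutation_type: if an IDCT wants its input in a permuted order, the permutation can be folded into the scan table once at init time instead of being applied to every block. The sketch below uses a transposing permutation as an example; all function names are hypothetical, and this is not a claim about which permutation the patch actually needs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Transposing permutation: the coefficient at (row, col) moves to (col, row). */
static void build_transpose_perm(uint8_t perm[64])
{
    for (int i = 0; i < 64; i++)
        perm[i] = (uint8_t)(((i & 7) << 3) | (i >> 3));
}

/* Done once at init: fold the permutation into the scan table. */
static void permute_scantable(uint8_t dst[64], const uint8_t src[64],
                              const uint8_t perm[64])
{
    for (int i = 0; i < 64; i++)
        dst[i] = perm[src[i]];
}

/* Per-block work in a decoder: scatter n decoded levels through a scan table. */
static void scatter(int16_t block[64], const int16_t *levels, int n,
                    const uint8_t scan[64])
{
    assert(n >= 0 && n <= 64);
    memset(block, 0, 64 * sizeof(*block));
    for (int i = 0; i < n; i++)
        block[scan[i]] = levels[i];
}

/* The explicit per-block permute that the permuted scan table makes unnecessary. */
static void permute_block(int16_t dst[64], const int16_t src[64],
                          const uint8_t perm[64])
{
    for (int i = 0; i < 64; i++)
        dst[perm[i]] = src[i];
}
```

Scattering through the permuted scan table yields the same block as scattering through the plain table and then calling permute_block, but the extra pass over all 64 coefficients is paid once at init rather than per block.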
[...]
> +void ff_simple_idct_put_vis(uint8_t *dest, int line_size, DCTELEM *data) {
> + ff_simple_idct_vis(data);
> + ff_put_pixels_clamped_vis(data, dest, line_size);
> +}
> +
> +void ff_simple_idct_add_vis(uint8_t *dest, int line_size, DCTELEM *data) {
> + ff_simple_idct_vis(data);
> + ff_add_pixels_clamped_vis(data, dest, line_size);
> +}
check that gcc inlines these 4 calls; if not, do something so it does. they
should be inlined
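One way to make sure such wrappers get their callees inlined is to define the callees in the same translation unit as static inline and, with GCC, add the always_inline attribute. The sketch below shows the pattern with toy stand-ins; FORCE_INLINE, toy_idct, toy_put_pixels_clamped and toy_idct_put are hypothetical names, not the functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang: request inlining even where the optimizer's heuristics would
 * decline; other compilers fall back to a plain static inline hint. */
#if defined(__GNUC__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Toy stand-in for the transform: halves every coefficient in place. */
FORCE_INLINE void toy_idct(int16_t *data)
{
    for (int i = 0; i < 64; i++)
        data[i] = (int16_t)(data[i] / 2);
}

/* Toy stand-in for the store step: clamp each value to 0..255 and write it. */
FORCE_INLINE void toy_put_pixels_clamped(const int16_t *data, uint8_t *dest,
                                         int line_size)
{
    assert(line_size >= 8);
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            int v = data[y * 8 + x];
            dest[y * line_size + x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
}

/* Externally visible wrapper; both helpers are expanded into its body. */
void toy_idct_put(uint8_t *dest, int line_size, int16_t *data)
{
    toy_idct(data);
    toy_put_pixels_clamped(data, dest, line_size);
}
```

Whether the calls were actually inlined can be checked by compiling with gcc -O2 -S and looking for call instructions in the wrapper, or by building with -Winline so that gcc warns when an inline function is not inlined.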
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras