[FFmpeg-devel] [PATCH] VC-1 MMX DSP functions
Michael Niedermayer
michaelni
Thu Nov 15 03:43:20 CET 2007
On Tue, Nov 13, 2007 at 11:17:41PM +0100, Christophe GISQUET wrote:
> Michael Niedermayer wrote:
> > the code which is overall (whole decoder) fastest
> > and for cases where two are indistinguishable, the simpler one
>
> Sorry for the delay in replying but it was somewhat worth it: testing on
> a P4 showed that at least one optimization was in fact degrading
> performance (special case in vc1_put_shift2_mmx when stride == offset).
>
> Therefore, final (as far as I see) patch attached.
>
> Summary:
> MMX version for VC-1 subpel motion compensation functions. 30% faster
> decoding.
>
[...]
> +/**
> + * Data is already unpacked, so some operations can directly be made from
> + * memory.
> + */
> +static void vc1_put_hor_16b_shift2_mmx(uint8_t *dst, long int stride,
> + const int16_t *src, int rnd)
> +{
> + int h = 8;
> + src -= 1;
> +
> + asm volatile(
> + LOAD_ROUNDER_MMX("%4")
> + "1: \n\t"
> + "movq 2*0+0(%1), %%mm1 \n\t"
> + "movq 2*0+8(%1), %%mm2 \n\t"
> + "movq 2*1+0(%1), %%mm3 \n\t"
> + "movq 2*1+8(%1), %%mm4 \n\t"
> + "paddsw 2*3+0(%1), %%mm1 \n\t"
> + "paddsw 2*3+8(%1), %%mm2 \n\t"
> + "paddsw 2*2+0(%1), %%mm3 \n\t"
> + "paddsw 2*2+8(%1), %%mm4 \n\t"
> + "psubsw %%mm3, %%mm1 \n\t"
> + "psubsw %%mm4, %%mm2 \n\t"
> + /* Multiplying by 9 here overflows */
> + "psllw $3, %%mm3 \n\t"
> + "psllw $3, %%mm4 \n\t"
> + "psubsw %%mm1, %%mm3 \n\t"
> + "psubsw %%mm2, %%mm4 \n\t"
What overflows here?
Also, please replace all p*sw with p*w; if saturation happens, then your code
is buggy.
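
To illustrate the point about saturation, here is a minimal scalar sketch in C
of the quoted instruction sequence, once with wrapping arithmetic (the p*w
forms) and once with saturating arithmetic (the p*sw forms). The helper names
and the sample values are assumptions made up for this illustration; they are
not taken from the patch, and the sample values say nothing about the ranges
that actually occur in the decoder.

```c
#include <stdint.h>

/* Wrapping 16-bit ops, scalar analogues of paddw / psubw / psllw.
 * The final cast to int16_t assumes the usual two's complement
 * conversion (implementation-defined in C, true on common compilers). */
static int16_t add_wrap(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

static int16_t sub_wrap(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((uint16_t)a - (uint16_t)b);
}

static int16_t shl3_wrap(int16_t a)
{
    return (int16_t)(uint16_t)((uint16_t)a << 3);
}

/* Saturating 16-bit ops, scalar analogues of paddsw / psubsw. */
static int16_t clamp16(int32_t v)
{
    return v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : (int16_t)v;
}

static int16_t add_sat(int16_t a, int16_t b) { return clamp16((int32_t)a + b); }
static int16_t sub_sat(int16_t a, int16_t b) { return clamp16((int32_t)a - b); }

/* 9*(s1+s2) - (s0+s3), computed as 8*(s1+s2) - ((s0+s3) - (s1+s2)),
 * following the order of operations in the quoted asm. */
static int16_t filter_wrap(int16_t s0, int16_t s1, int16_t s2, int16_t s3)
{
    int16_t outer = add_wrap(s0, s3);
    int16_t inner = add_wrap(s1, s2);
    outer = sub_wrap(outer, inner);
    inner = shl3_wrap(inner);
    return sub_wrap(inner, outer);
}

/* Same sequence with saturating add/sub; the shift has no saturating form. */
static int16_t filter_sat(int16_t s0, int16_t s1, int16_t s2, int16_t s3)
{
    int16_t outer = add_sat(s0, s3);
    int16_t inner = add_sat(s1, s2);
    outer = sub_sat(outer, inner);
    inner = shl3_wrap(inner);
    return sub_sat(inner, outer);
}
```

With the hypothetical inputs (3000, 2100, 2100, 3000) the exact result is
9*4200 - 6000 = 31800, which fits in 16 bits although 8*4200 does not. The
wrapping version returns 31800 because the intermediate overflow cancels out
modulo 2^16, while the saturating version clamps to -32768. When no
intermediate value overflows, both versions agree, so saturation either
changes nothing or produces a wrong result.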
[...]
> + return;
> + }
> + else { /* No horizontal filter, output 8 lines to dst */
> + vc1_put_shift_8bits[vmode](dst, src, stride, 1-rnd, stride);
> + return;
> + }
The return can be factored out of the if/else.
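
A minimal sketch of that refactoring in C, assuming both branches end in a
plain return and nothing follows the if/else. The function and counter names
are hypothetical stand-ins, since the body of the first branch is elided in
the quote above.

```c
/* Hypothetical stand-ins for the two filter paths; they only count calls. */
static int hor_calls, ver_calls;

static void put_with_hor_filter(void) { hor_calls++; }
static void put_without_hor_filter(void) { ver_calls++; }

static void put_mspel_sketch(int hmode)
{
    if (hmode) {
        put_with_hor_filter();
    } else {
        /* No horizontal filter */
        put_without_hor_filter();
    }
    /* Both branches fall through to here; in a void function the single
     * trailing return can then be dropped altogether. */
}
```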
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I count him braver who overcomes his desires than him who conquers his
enemies for the hardest victory is over self. -- Aristotle