[Ffmpeg-devel] [PATCH] MMX optimization for get_amv() in libavcodec/h263.c
Michael Niedermayer
michaelni
Thu Apr 19 22:09:45 CEST 2007
Hi
On Thu, Apr 19, 2007 at 05:26:45PM +0400, Andrew Savchenko wrote:
> Hello,
>
> I optimized one FIXME in h263.c in get_amv().
> Unfortunately, I failed to find or create video material where this
> function is used during decoding, so synthetic tests were used. If
xvid and the reference MPEG-4 encoder should support global motion estimation
(it's a useless feature quality-wise, but people occasionally use it
nonetheless)
> someone can provide a link to such a video or point me to a way to
> create one, it would be great.
>
> The changes made for the synthetic test benchmarks are in
> h263_syntetic.diff.
>
> The first patch (h263_mmx_16bit.diff) uses 16 bits for the sum
> "variables", so operations such as shifts and additions can be
> performed on 4 values with a single instruction. But I'm afraid that
> in real decoding the sum may overflow, so I made a second patch
> (h263_mmx_32bit.diff) to eliminate this problem. It is obviously
> slower, because MMX instructions can process only two 32-bit values
> at a time.
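To make the overflow concern concrete, here is a portable C sketch that emulates packed adds: a 64-bit MMX register holds four 16-bit lanes for paddw but only two 32-bit lanes for paddd. The helper names are hypothetical, and the term value 600 and count 64 are illustrative only (e.g. a 256-term sum split across four lanes gives 64 terms per lane); they are not taken from real get_amv() data.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates one "paddw": four 16-bit lanes added at once, each
 * wrapping modulo 2^16 like the MMX instruction. */
static void paddw(uint16_t acc[4], const uint16_t in[4])
{
    for (int i = 0; i < 4; i++)
        acc[i] = (uint16_t)(acc[i] + in[i]);
}

/* Emulates one "paddd": the same register holds only two 32-bit lanes. */
static void paddd(uint32_t acc[2], const uint32_t in[2])
{
    for (int i = 0; i < 2; i++)
        acc[i] = acc[i] + in[i];
}

/* Reinterpret a lane's bits as a two's-complement signed value. */
static int32_t lane16_to_int(uint16_t v)
{
    return v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v;
}

static int64_t lane32_to_int(uint32_t v)
{
    return v >= 0x80000000u ? (int64_t)v - 0x100000000LL : (int64_t)v;
}

/* Add `term` to every 16-bit lane `count` times; return lane 0. */
static int32_t accumulate16(int16_t term, int count)
{
    uint16_t acc[4] = {0, 0, 0, 0};
    const uint16_t t = (uint16_t)term;
    const uint16_t in[4] = {t, t, t, t};
    for (int k = 0; k < count; k++)
        paddw(acc, in);
    return lane16_to_int(acc[0]);
}

/* Same accumulation in 32-bit lanes; return lane 0. */
static int64_t accumulate32(int32_t term, int count)
{
    uint32_t acc[2] = {0, 0};
    const uint32_t t = (uint32_t)term;
    const uint32_t in[2] = {t, t};
    for (int k = 0; k < count; k++)
        paddd(acc, in);
    return lane32_to_int(acc[0]);
}
```

With a term of 600, fifty additions (30000) still fit in a 16-bit lane, but 64 additions (38400) wrap to -27136, while the 32-bit lane holds 38400. This is the trade-off between the two patches: twice the lanes per instruction versus headroom.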
>
> Testing was done on an Athlon XP. The inner loop in the 1st patch is
> fully unrolled, because this gave the best performance compared to
> the untouched and partially unrolled loops (probably due to better
> pipeline utilization).
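The unrolling idea can be sketched in scalar C. This is not the elided MMX code: it is a hypothetical row sum of `v >> shift` with `v` stepping by `dx` (names assumed for illustration), unrolled by four with independent accumulators. One plausible reason unrolling helps is that consecutive additions no longer all wait on a single register. Full unrolling, as in the 1st patch, would repeat the body for all 16 iterations. The sketch assumes arithmetic right shift of negative ints, as gcc provides.

```c
#include <assert.h>

/* Rolled loop: every iteration depends on the previous value of sum. */
static int row_sum_rolled(int v, int dx, int shift)
{
    int sum = 0;
    for (int x = 0; x < 16; x++) {
        sum += v >> shift;
        v += dx;
    }
    return sum;
}

/* Unrolled by 4: four independent accumulators, combined at the end. */
static int row_sum_unrolled4(int v, int dx, int shift)
{
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int x = 0; x < 16; x += 4) {
        s0 += v >> shift;
        s1 += (v + dx) >> shift;
        s2 += (v + 2 * dx) >> shift;
        s3 += (v + 3 * dx) >> shift;
        v += 4 * dx;
    }
    return s0 + s1 + s2 + s3;
}
```

Both versions return the same sum; only the dependency structure differs. Whether this pays off depends on the CPU and compiler, which matches the measurement-driven choice described above.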
>
> In the 2nd patch the inner loop is only partially unrolled; further
> unrolling brought no additional performance within measurement
> error. Also, %%eax was used for multiplication, because MMX can
> multiply only 16-bit values and can't unpack a *signed* value from
> word to double word.
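For reference, MMX has no single sign-extending word-to-dword unpack, but a common two-instruction idiom achieves it: punpcklwd a register with itself, so each word lands in both halves of a dword, then psrad by 16 to shift the copy down while replicating the sign bit. A minimal sketch using the standard mmintrin.h intrinsics; the function name is hypothetical, and targets without MMX fall back to plain C.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Sign-extend two signed 16-bit words to 32-bit values. */
static void sext_words(int16_t w0, int16_t w1, int32_t out[2])
{
#if defined(__MMX__)
    __m64 v = _mm_set_pi16(0, 0, w1, w0);  /* words, high..low: 0 0 w1 w0 */
    v = _mm_unpacklo_pi16(v, v);           /* punpcklwd: w1 w1 w0 w0 */
    v = _mm_srai_pi32(v, 16);              /* psrad 16: sign-extended w1, w0 */
    out[0] = _mm_cvtsi64_si32(v);                    /* low dword */
    out[1] = _mm_cvtsi64_si32(_mm_srli_si64(v, 32)); /* high dword */
    _mm_empty();                           /* emms before any x87 code */
#else
    out[0] = w0;  /* plain C sign extension on non-MMX targets */
    out[1] = w1;
#endif
}
```

This only addresses the unpack half of the problem; 32-bit products from 16-bit MMX multiplies need extra work, so using %%eax may still be the simpler choice.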
>
> Here is a summary of the benchmark results (oprofile was used as the
> profiler):
>
>           mean value   standard deviation
> C:        38591        322
> mmx_16:   5790         38
> mmx_32:   10836        66
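Assuming the table lists oprofile sample counts for the same synthetic workload (the unit is not stated), the relative speedups follow from the mean values:

```latex
\frac{38591}{5790} \approx 6.67 \;(\text{mmx\_16 vs.\ C}), \qquad
\frac{38591}{10836} \approx 3.56 \;(\text{mmx\_32 vs.\ C}), \qquad
\frac{10836}{5790} \approx 1.87 \;(\text{mmx\_32 vs.\ mmx\_16})
```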
>
> So, if the sum is known to fit in 16 bits (in fact it can be slightly
> larger, up to 17 bits, but it is hard to set an exact threshold), the
> 1st patch is strongly preferred.
the 16-bit code could be used with a simple check whether the values
would fit, and a fallback to the C version otherwise
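The check-plus-fallback suggestion can be sketched in plain C. Everything here is hypothetical: amv_sum_c() mirrors the shape of the FIXME-marked double loop (summing v >> shift over a 16x16 block, as I understand the C get_amv() to do), amv_sum_16bit() merely stands in for the MMX routine by using an int16_t accumulator, and fits_16bit() is one possible conservative check. It assumes arithmetic right shift of negative values, as gcc provides.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* v = mb_v + dx*x + dy*y is affine and v >> shift is monotonic, so
 * |term| over the block peaks at one of the four corners. If 256 terms
 * of that size fit in int16_t, no partial sum can overflow. */
static int fits_16bit(int mb_v, int dx, int dy, int shift)
{
    int max_abs = 0;
    for (int cy = 0; cy <= 15; cy += 15) {
        for (int cx = 0; cx <= 15; cx += 15) {
            int t = abs((mb_v + dx * cx + dy * cy) >> shift);
            if (t > max_abs)
                max_abs = t;
        }
    }
    return max_abs <= INT16_MAX / 256;
}

/* Reference C path with a 32-bit accumulator. */
static int amv_sum_c(int mb_v, int dx, int dy, int shift)
{
    int sum = 0;
    for (int y = 0; y < 16; y++) {
        int v = mb_v + dy * y;
        for (int x = 0; x < 16; x++) {
            sum += v >> shift;
            v += dx;
        }
    }
    return sum;
}

/* Stand-in for the 16-bit SIMD path; only valid if fits_16bit() holds. */
static int amv_sum_16bit(int mb_v, int dx, int dy, int shift)
{
    int16_t sum = 0;
    assert(fits_16bit(mb_v, dx, dy, shift));
    for (int y = 0; y < 16; y++) {
        int v = mb_v + dy * y;
        for (int x = 0; x < 16; x++) {
            sum = (int16_t)(sum + (v >> shift));
            v += dx;
        }
    }
    return sum;
}

/* Take the fast path when it is provably safe, otherwise fall back. */
static int amv_sum(int mb_v, int dx, int dy, int shift)
{
    if (fits_16bit(mb_v, dx, dy, shift))
        return amv_sum_16bit(mb_v, dx, dy, shift);
    return amv_sum_c(mb_v, dx, dy, shift);
}
```

The check is conservative: it rejects some blocks whose sum would in fact fit, e.g. when positive and negative terms cancel, and those simply take the C path. If the SIMD code also keeps v itself in 16-bit lanes, v would need a range check of its own.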
>
> P.S. Although this is unrelated to the patch, I'd like to ask some
> development-related questions.
>
> Can someone point me to an SSE instruction set guide from AMD? Does
> one even exist? I'm not sure that Intel's descriptions and
> performance recommendations for SSE apply to AMD processors.
> Currently I only have AMD's guides for the MMX, 3DNow!, and
> extended MMX/3DNow! instruction sets, and the Athlon optimization
> guide (pub. 20726, 21928, 22466 and 22007 respectively).
try http://www.agner.org/optimize/
and see doc/optimization.txt in ffmpeg svn
>
> Is there a convenient way to debug inline asm using gdb or similar?
> Is it possible to step through asm instructions, examine registers,
> and so on?
yes gdb can do this IIRC
> --- mplayer/libavcodec/h263.c.orig 2007-04-10 11:06:58.000000000 +0400
> +++ mplayer/libavcodec/h263.c 2007-04-18 23:11:41.000000000 +0400
> @@ -4231,6 +4231,10 @@
> static int8_t quant_tab[4] = { -1, -2, 1, 2 };
> const int xy= s->mb_x + s->mb_y * s->mb_stride;
>
> + int volatile MX, MY;
> + MX = get_amv(s, 0);
> + MY = get_amv(s, 1);
why volatile?
[...]
> static inline int get_amv(MpegEncContext *s, int n){
> +#ifndef HAVE_MMX
> int x, y, mb_v, sum, dx, dy, shift;
> +#else /* HAVE_MMX */
> + int mb_v, sum, dx, dy, shift;
> +#endif /* HAVE_MMX */
MMX-specific code should be in libavcodec/i386/...
[...]
> + asm volatile(
> + "pxor %%mm5, %%mm5 \n" //sum=0
> + "movd %[st], %%mm3 \n" //shift
^^^^
not gcc 2.95 compatible
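For context on the gcc 2.95 remark: symbolic operand names such as %[st] were added to GCC's extended asm in version 3.1, so older compilers only accept positional references (%0, %1, ...). A minimal sketch of the same instruction written both ways; the functions are hypothetical, and the asm is compiled only on x86 with a GNU-compatible compiler, otherwise plain C is used.

```c
#include <assert.h>

/* gcc 2.95-compatible form: operands referenced by position.
 * %0 = r (output, initialised from a via the "0" constraint), %2 = b. */
static int add_positional(int a, int b)
{
    int r;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %2, %0" : "=r"(r) : "0"(a), "r"(b) : "cc");
#else
    r = a + b; /* portable fallback for non-x86 builds */
#endif
    return r;
}

/* Named-operand form (GCC 3.1 and later): same code, symbolic names. */
static int add_named(int a, int b)
{
    int r;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %[b], %[r]" : [r] "=r"(r) : "0"(a), [b] "r"(b) : "cc");
#else
    r = a + b;
#endif
    return r;
}
```

Rewriting the patch's %[st] in the positional style keeps it buildable with gcc 2.95, at the cost of having to renumber operands whenever the operand list changes.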
[...]
> + "packssdw %%mm1, %%mm1 \n" //0 0 0 dx
not needed?
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire