[Ffmpeg-devel] [PATCH] MMX optimization for get_amv() in libavcodec/h263.c
Andrew Savchenko
Bircoph
Thu Apr 19 15:26:45 CEST 2007
Hello,
I optimized one FIXME in get_amv() in h263.c.
Unfortunately, I failed to find or create video material where this
function is used during decoding, so synthetic tests were used. If
someone can provide a link to such a video or tell me how to
create one, that would be great.
The changes made for the synthetic test benchmarks are in
h263_syntetic.diff.
The first patch (h263_mmx_16bit.diff) uses 16 bits for the sum
"variables", so operations such as shifts and additions can be
performed on 4 values by a single instruction. But I'm afraid that in
real decoding the sum may overflow. So I made the second patch
(h263_mmx_32bit.diff) to eliminate this problem. Obviously it is
slower, because MMX instructions can process only 2 32-bit values at
a time.
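To illustrate the overflow concern, here is a minimal C sketch that emulates both accumulation widths in scalar code. It is not the code from h263.c or from either patch: the 16x16 loop shape, the function names, and the parameters (v0, dx, dy, shift) are assumptions made for illustration only.

```c
#include <stdint.h>

/* Reference: 16x16 shifted values accumulated in one 32-bit sum.
 * Note: >> on a negative int is implementation-defined in C
 * (an arithmetic shift on common compilers). */
static int32_t sum_32bit(int32_t v0, int32_t dx, int32_t dy, int shift)
{
    int32_t sum = 0;
    for (int y = 0; y < 16; y++) {
        int32_t v = v0 + dy * y;
        for (int x = 0; x < 16; x++) {
            sum += v >> shift;
            v += dx;
        }
    }
    return sum;
}

/* Emulates four 16-bit lanes with wraparound addition (as paddw does),
 * followed by a final sum of the sign-extended lanes. */
static int32_t sum_16bit_lanes(int32_t v0, int32_t dx, int32_t dy, int shift)
{
    uint16_t lane[4] = {0, 0, 0, 0};
    for (int y = 0; y < 16; y++) {
        int32_t v = v0 + dy * y;
        for (int x = 0; x < 16; x++) {
            /* modulo 2^16 addition, like one lane of a packed add */
            lane[x & 3] = (uint16_t)(lane[x & 3] + (uint16_t)(v >> shift));
            v += dx;
        }
    }
    int32_t total = 0;
    for (int i = 0; i < 4; i++)
        total += (int32_t)(lane[i] ^ 0x8000) - 0x8000; /* sign-extend lane */
    return total;
}
```

With small inputs both functions agree; once a lane's partial sum leaves the signed 16-bit range, the lane version silently returns a wrong total.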
Testing was done on an Athlon XP. The inner loop in the 1st patch is
fully unrolled, because this gives the best performance in
comparison to the untouched and partially unrolled loops (probably due
to better pipeline utilization).
In the 2nd patch the inner loop is unrolled only partially; further
unrolling brings no additional performance beyond measurement
error. Also, %%eax was used for multiplication, because MMX can
multiply only 16-bit values and can't unpack a *signed* value from
word to double word.
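For context, MMX has no single instruction that sign-extends words to double words. A commonly used two-instruction idiom is to interleave a register with itself (punpcklwd) and then shift each double word right arithmetically by 16 (psrad). The scalar C sketch below emulates that idiom for one word; the function name is hypothetical, and whether the idiom would be faster than the %%eax approach in this patch is not something the benchmarks here address.

```c
#include <stdint.h>

/* Scalar emulation of "punpcklwd reg,reg; psrad reg,16" for one word. */
static int32_t sign_extend_word(uint16_t w)
{
    /* punpcklwd with itself: the word is duplicated into both halves */
    uint32_t dup = ((uint32_t)w << 16) | w;

    /* psrad 16: the upper copy moves down and its top bit fills the
     * upper half. Emulated portably without shifting a negative int. */
    uint32_t low = dup >> 16;
    return (int32_t)(low ^ 0x8000u) - 0x8000;
}
```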
Here is a summary of the benchmark results; oprofile was used as the profiler:
========= mean value =========== standard deviation ===========
C:             38591                     322
mmx_16:         5790                      38
mmx_32:        10836                      66
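Assuming these figures are costs where lower is better, the relative gain can be computed directly from the means: roughly 6.7 times for mmx_16 and 3.6 times for mmx_32 over the C version. A small C helper for that arithmetic (the function names are illustrative):

```c
/* Ratio of the baseline cost to the optimized cost. */
static double speedup(double baseline, double optimized)
{
    return baseline / optimized;
}

/* Standard deviation as a percentage of the mean. */
static double relative_deviation_percent(double mean, double stddev)
{
    return 100.0 * stddev / mean;
}
```

For example, speedup(38591, 5790) gives the mmx_16 gain and speedup(38591, 10836) the mmx_32 gain; the relative deviation stays below 1% for all three rows.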
So, if the sum is known to fit in 16 bits (in fact it can be slightly
larger, up to 17 bits, but it is hard to set an exact threshold), the 1st
patch is highly preferred.
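One way to approach the threshold is a conservative range check done before choosing the 16-bit path. The sketch below assumes a 16x16 block whose values change linearly (v = v0 + dx*x + dy*y), four 16-bit lanes that each accumulate 64 shifted values, and an arithmetic right shift. None of this is taken from the patches, and the function name and parameters are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if every 16-bit lane sum is guaranteed to stay in range,
 * 0 if overflow cannot be ruled out (the check is conservative). */
static int lanes_fit_in_16bit(int32_t v0, int32_t dx, int32_t dy, int shift)
{
    /* v is linear in x and y, so its extremes lie at the four corners */
    int64_t corners[4] = {
        v0,
        v0 + (int64_t)15 * dx,
        v0 + (int64_t)15 * dy,
        v0 + (int64_t)15 * dx + (int64_t)15 * dy
    };
    int64_t max_abs = 0;
    for (int i = 0; i < 4; i++) {
        int64_t a = corners[i] < 0 ? -corners[i] : corners[i];
        if (a > max_abs)
            max_abs = a;
    }
    /* |v >> shift| is at most (|v| >> shift) + 1 for an arithmetic shift */
    int64_t per_value = (max_abs >> shift) + 1;

    /* each lane accumulates 16 rows * 4 columns = 64 values */
    return per_value * 64 <= INT16_MAX;
}
```

Because the bound uses the worst corner for all 64 values, it rejects some inputs that would in fact be safe; a caller would fall back to the 32-bit path in those cases.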
P.S. While not related to the patch, I'd like to ask some
development-related questions.
Can someone point me to an SSE instruction set guide from AMD? Does
one even exist? I'm not sure that Intel's descriptions and
performance recommendations for SSE are applicable to AMD
processors. So far I have only the guides for the mmx, 3dnow!, and
mmext/3dnowext instruction sets from AMD and the optimization guide
for the Athlon (pub. 20726, 21928, 22466 and 22007 respectively).
Is there any convenient way to debug inline asm using gdb or a
similar tool? Is it possible to step through asm instructions,
examine registers, and so on?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h263_syntetic.diff
Type: text/x-diff
Size: 433 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070419/9ebf71b2/attachment.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h263_mmx_32bit.diff
Type: text/x-diff
Size: 2960 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070419/9ebf71b2/attachment-0001.diff>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: h263_mmx_16bit.diff
Type: text/x-diff
Size: 3889 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070419/9ebf71b2/attachment-0002.diff>