[FFmpeg-devel] [PATCH] Add x86-optimized versions of exponent_min().
Justin Ruggles
justin.ruggles
Thu Feb 3 02:08:44 CET 2011
---
On 01/31/2011 03:19 PM, Ronald S. Bultje wrote:
> On Mon, Jan 31, 2011 at 2:53 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
>> On Mon, 31 Jan 2011, Justin Ruggles wrote:
>>
>>> I get some very weird mmx2 results when I remove the first sub and
>>> change jae to ja.
>>>
>>> Athlon64 X2 6000+
>>> sse2: 3006 -> 2753
>>> mmx2: 5228 -> 5453
>>> mmx: 5490 -> 5430
>>>
>>> Atom 330
>>> sse2: 6834 -> 3779
>>> mmx2: 9951 -> 10525
>>> mmx: 11390 -> 11325
>>>
>>> Both CPUs are consistent in the change, except that on Athlon64 the mmx2
>>> version is slower than the mmx version. What do you suggest?
>>
>> I usually blame such weird results on code alignment, but I have no
>> systematic way to fix them.
>
> Same here, try adding an ALIGN <num> (8 or 16) directly before a loop
> statement, or disassemble before/after and see where alignment could
> cause issues.
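
For context, an ALIGN directive pads the code so that the next instruction starts at a multiple of the requested size. The padding arithmetic can be sketched in C as below. The helper name `align_padding` is hypothetical and is not part of the patch or of any assembler; it only illustrates the calculation, assuming a power-of-two alignment.

```c
#include <stdint.h>

/* Hypothetical helper: number of filler bytes an ALIGN-style directive
 * would have to emit so that `offset` becomes a multiple of `align`.
 * `align` is assumed to be a power of two (8 or 16 in this thread). */
static unsigned align_padding(uintptr_t offset, unsigned align)
{
    /* Bytes already used past the previous boundary. */
    uintptr_t used = offset & (align - 1);

    /* Distance to the next boundary; 0 if already aligned. */
    return (unsigned)((align - used) & (align - 1));
}
```

With this sketch, a loop head at offset 0x43 would need 13 bytes of padding for ALIGN 16 but only 5 for ALIGN 8, which is one reason the two settings can shift the surrounding code differently.
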
Thanks for the suggestion. Below is a chart of the results for
adding ALIGN 8 and ALIGN 16 before each of the 2 loops.

LOOP1/LOOP2     MMX     MMX2    SSE2
-------------------------------------
NONE/NONE  :    5270    5283    2757
NONE/8     :    5200    5077    2644
NONE/16    :    5723    3961    2161
8/NONE     :    5214    5339    2787
8/8        :    5198*   5083    2722
8/16       :    5936    3902    2128
16/NONE    :    6613    4788    2580
16/8       :    5490    3702    2020
16/16      :    5474    3680*   2000*

(* marks the fastest result in each column.)

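
The starred entries can be checked mechanically. The following is a minimal C sketch that copies the timings from the chart and picks the fastest row for each column; the names `timings`, `config_names`, `fastest_config` and `fastest_config_name` are made up for this illustration.

```c
#include <stddef.h>

enum { NUM_CONFIGS = 9, NUM_VERSIONS = 3 };  /* columns: MMX, MMX2, SSE2 */

/* Row labels: LOOP1/LOOP2 alignment, in the same order as the chart. */
static const char *const config_names[NUM_CONFIGS] = {
    "NONE/NONE", "NONE/8", "NONE/16",
    "8/NONE",    "8/8",    "8/16",
    "16/NONE",   "16/8",   "16/16"
};

/* Timings copied from the chart above. */
static const int timings[NUM_CONFIGS][NUM_VERSIONS] = {
    {5270, 5283, 2757},
    {5200, 5077, 2644},
    {5723, 3961, 2161},
    {5214, 5339, 2787},
    {5198, 5083, 2722},
    {5936, 3902, 2128},
    {6613, 4788, 2580},
    {5490, 3702, 2020},
    {5474, 3680, 2000}
};

/* Return the row index with the lowest timing for the given column. */
static int fastest_config(int version)
{
    int best = 0;
    for (int i = 1; i < NUM_CONFIGS; i++)
        if (timings[i][version] < timings[best][version])
            best = i;
    return best;
}

/* Convenience wrapper returning the row label instead of the index. */
static const char *fastest_config_name(int version)
{
    return config_names[fastest_config(version)];
}
```

For the three columns this selects 8/8 for MMX and 16/16 for both MMX2 and SSE2.
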
The attached patch uses ALIGN 8 for both loops in the MMX version and
ALIGN 16 for both loops in the MMX2 (mmxext) and SSE2 versions, which is
the fastest combination measured for each of them.
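
Since the patch is only attached and not shown inline, here is a rough scalar C sketch of the kind of operation an exponent_min() routine performs: for each coefficient, take the minimum exponent across a run of consecutive blocks and store it in the first block. The function name `exponent_min_sketch`, the explicit `stride` parameter and the exact argument order are assumptions made for illustration and may not match the patch. A two-level loop nest like this one may be what "the 2 loops" refers to, but that mapping is also an assumption.

```c
#include <stdint.h>

/* Sketch only: for each coefficient, replace the exponent in the first
 * block with the minimum over that block and the following
 * `num_reuse_blocks` blocks. `stride` (the distance in bytes between
 * consecutive blocks) is an assumed parameter for illustration. */
static void exponent_min_sketch(uint8_t *exp, int num_reuse_blocks,
                                int nb_coefs, int stride)
{
    if (num_reuse_blocks <= 0)
        return;                       /* nothing to merge */

    for (int i = 0; i < nb_coefs; i++) {          /* outer loop: coefficients */
        uint8_t min_exp = exp[i];

        for (int blk = 1; blk <= num_reuse_blocks; blk++) {  /* inner loop: blocks */
            uint8_t next_exp = exp[i + blk * stride];
            if (next_exp < min_exp)
                min_exp = next_exp;
        }

        exp[i] = min_exp;             /* only the first block is updated */
    }
}
```

A SIMD version would process several adjacent coefficients per iteration with a packed unsigned-byte minimum, which is where loop alignment starts to matter for the timings above.
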

 libavcodec/Makefile         |    6 ++-
 libavcodec/ac3dsp.c         |   51 ++++++++++++++++++++++++++++++++
 libavcodec/ac3dsp.h         |   44 ++++++++++++++++++++++++++++
 libavcodec/ac3enc.c         |   35 ++++------------------
 libavcodec/x86/Makefile     |    4 ++
 libavcodec/x86/ac3dsp.asm   |   67 +++++++++++++++++++++++++++++++++++++++++++
 libavcodec/x86/ac3dsp_mmx.c |   45 +++++++++++++++++++++++++++++
 libavcodec/x86/x86util.asm  |   10 ++++++
 8 files changed, 232 insertions(+), 30 deletions(-)
 create mode 100644 libavcodec/ac3dsp.c
 create mode 100644 libavcodec/ac3dsp.h
 create mode 100644 libavcodec/x86/ac3dsp.asm
 create mode 100644 libavcodec/x86/ac3dsp_mmx.c

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-x86-optimized-versions-of-exponent_min.patch
Type: text/x-patch
Size: 12978 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20110202/572e03bb/attachment.bin>