[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

Tue Oct 13 09:25:03 CEST 2015

On 10/13/15, Clement Boesch <u at pkh.me> wrote:
> On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote:
>> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde <gajjanag at mit.edu>
>> wrote:
>> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos <cehoyos at ag.or.at>
>> > wrote:
>> >> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>> >>
>> >>> Bench from libavfilter/astats on a 15 min clip.
>> >>
>> >> I believe that your test would indicate that the
>> >> old variant is faster or that no result can be
>> >> given which is what my tests show.
>>
>> Also, how you can possibly believe that the old variant is faster is
>> beyond me given the astonishing amount of work by Intel, Red Hat, and
>> others to create the absolutely best performing libc.
>>
>> Just have a look at
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281,
>> it gives an idea of the extreme lengths they go to.
>>
>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD
>
> [/tmp]* cat a.c
> #include <math.h>
> #include <stdlib.h>
>
> #define FFABS(a) ((a) >= 0 ? (a) : (-(a)))
>
> double f1d(double x) { return fabs(x); }
> double f2d(double x) { return FFABS(x); }
>
> int f1i(int x) { return abs(x); }
> int f2i(int x) { return FFABS(x); }
> [/tmp]* gcc -O2 -c a.c && objdump -d -Mintel a.o
>
> a.o:     file format elf64-x86-64
>
>
> Disassembly of section .text:
>
> 0000000000000000 <f1d>:
>    0:   f2 0f 10 0d 00 00 00    movsd  xmm1,QWORD PTR [rip+0x0]        # 8
> <f1d+0x8>
>    7:   00
>    8:   66 0f 54 c1             andpd  xmm0,xmm1
>    c:   c3                      ret
>    d:   0f 1f 00                nop    DWORD PTR [rax]
>
> 0000000000000010 <f2d>:
>   10:   66 0f 2e 05 00 00 00    ucomisd xmm0,QWORD PTR [rip+0x0]        # 18
> <f2d+0x8>
>   17:   00
>   18:   72 06                   jb     20 <f2d+0x10>
>   1a:   f3 c3                   repz ret
>   1c:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]
>   20:   f2 0f 10 0d 00 00 00    movsd  xmm1,QWORD PTR [rip+0x0]        # 28
> <f2d+0x18>
>   27:   00
>   28:   66 0f 57 c1             xorpd  xmm0,xmm1
>   2c:   c3                      ret
>   2d:   0f 1f 00                nop    DWORD PTR [rax]
>
> 0000000000000030 <f1i>:
>   30:   89 fa                   mov    edx,edi
>   32:   89 f8                   mov    eax,edi
>   34:   c1 fa 1f                sar    edx,0x1f
>   37:   31 d0                   xor    eax,edx
>   39:   29 d0                   sub    eax,edx
>   3b:   c3                      ret
>   3c:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]
>
> 0000000000000040 <f2i>:
>   40:   89 fa                   mov    edx,edi
>   42:   89 f8                   mov    eax,edi
>   44:   c1 fa 1f                sar    edx,0x1f
>   47:   31 d0                   xor    eax,edx
>   49:   29 d0                   sub    eax,edx
>   4b:   c3                      ret
> [/tmp]*
>
> So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is
> essentially identical to FFABS().
>
> I have similar results with clang (3.7.0).
>
> Conclusion: using fabs() looks better with at least recent versions of
> clang
> and GCC on x86-64 (but may introduce slight behaviour changes?)
>
> To be more rigorous, it would be interesting to compare on different arch &
> compilers, but changing FFABS() with fabs() sounds OK to me.

What about fmax/FFMAX?
> --
> Clement B.
>