[FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

Ganesh Ajjanagadde gajjanag at mit.edu
Mon Oct 12 22:57:27 CEST 2015


On Mon, Oct 12, 2015 at 7:59 AM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
> On Mon, Oct 12, 2015 at 7:46 AM, Carl Eugen Hoyos <cehoyos at ag.or.at> wrote:
>> Ganesh Ajjanagadde <gajjanag <at> mit.edu> writes:
>>
>>> It is well known that fabs and fabsf are at least as fast and usually
>>> faster than the FFABS macro, at least on the gcc+glibc combination.
>>
>> I wasn't aware of this.
>> And I believe we support other compilers and other
>> libc implementations.
>
> Indeed, which is why performance comparisons are welcome. I argue
> below why any sane configuration should not regress performance wise.
> This is also "relevant information" in my view.
>
>>
>>> For instance, see the reference:
>>> http://patchwork.sourceware.org/patch/6735/.
>>> This was a patch to glibc in order to remove their usages. Given their
>>> general performance obsession (more than FFmpeg in many cases), they
>>> have ensured that fabs and fabsf never perform worse than FFABS.
>>
>> Ok but is this really related?
>
> The reference is; the comment may not be. I was slightly annoyed at
> FFABS usage when libc provides fabs/fabsf on all our platforms, and
> wanted a justification for moving away from it that would appeal to
> the FFmpeg crowd, namely performance.
>
>>
>>> I have tested on x86-64 Haswell with GCC 5.2 - even with no strict IEEE
>>> mode enabled, and just the standard -O3 optimizations, there is a
>>> performance benefit.
>>
>> This is the only relevant information imo.
>> Please provide (very, very short) information
>> on what you tested.
>
> Random integers, same style as before. I have not posted numbers,
> since my numbers are anyway meaningless: I lack non
> x86-64+(gcc/clang)+glibc configurations.
> As for that being the only relevant message, I do intend to shorten
> the message. The long stuff was simply my own personal motivation to
> make people understand why I did this stuff. Otherwise, I would have
> sent a separate message anyway in the patch thread, let me know what
> style you prefer.
>
>>
>> Since you mention libc so often: Does the patch
>> work on win*, aix and other strange platforms?
>
> Why not? Any standard, conformant fabs/fabsf should work. Again, I lack
> the configurations and am just a university student with a single laptop.
> fabs and fabsf are already being used elsewhere. If anything, they
> are far better specified under IEEE 754 than FFABS: behavior with NaN,
> Inf, etc.
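
To make the point about special values concrete, here is a minimal sketch
comparing the two on signed zero. FFABS is reproduced as the usual ternary
macro from libavutil/common.h; the helper names sign_after_ffabs and
sign_after_fabs are hypothetical and exist only for this illustration. It
assumes default IEEE semantics (no -ffast-math).

```c
#include <math.h>

/* FFABS as the plain ternary macro (reproduced from libavutil/common.h). */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Hypothetical helper: 1 if the sign bit is still set after FFABS. */
static int sign_after_ffabs(double x)
{
    return signbit(FFABS(x)) != 0;
}

/* Hypothetical helper: 1 if the sign bit is still set after fabs. */
static int sign_after_fabs(double x)
{
    return signbit(fabs(x)) != 0;
}
```

Since -0.0 >= 0 compares true, FFABS hands -0.0 back unchanged, while fabs
clears the sign bit. For ordinary negative values and for -Inf both give the
same result.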

Benchmark from libavfilter/astats on a 15 min clip. Of course the
difference is slight, but nonetheless it exists. The best case is the
same, but look at the difference in the worst cases (as was mentioned
in the glibc link I gave, I suspect some trickery for subnormal
floats/Inf/0.0). By the way, I can show results skewed even more
heavily in favor of fabs by using "random" floating point numbers,
random in the sense of being a random 64-bit pattern (same style as my
old crude bench: fill a large array, and test). There, believe it or
not, I was getting a nearly 1.5-2x improvement.
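
A crude bench of the "fill a large array, and test" kind could look roughly
like the sketch below. Everything in it is an assumption made for
illustration and not taken from the patch: the xorshift64 generator, the
helper names (fill_random_bits, checksum_ffabs, checksum_fabs, time_variant)
and the use of clock() for timing. It does not reproduce the decicycle
numbers in this mail.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* FFABS as the plain ternary macro (reproduced from libavutil/common.h). */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* xorshift64: tiny PRNG yielding 64-bit patterns; state must be nonzero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Fill buf with doubles whose bit patterns are random, so NaN, Inf and
 * subnormals all show up alongside ordinary values. */
static void fill_random_bits(double *buf, size_t n, uint64_t seed)
{
    assert(seed != 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t bits = xorshift64(&seed);
        memcpy(&buf[i], &bits, sizeof(bits));
    }
}

/* XOR the result bit patterns together so the loop cannot be dropped. */
static uint64_t checksum_ffabs(const double *buf, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        double r = FFABS(buf[i]);
        uint64_t bits;
        memcpy(&bits, &r, sizeof(bits));
        acc ^= bits;
    }
    return acc;
}

static uint64_t checksum_fabs(const double *buf, size_t n)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        double r = fabs(buf[i]);
        uint64_t bits;
        memcpy(&bits, &r, sizeof(bits));
        acc ^= bits;
    }
    return acc;
}

/* Run one variant over the array; returns elapsed CPU time in seconds. */
static double time_variant(uint64_t (*fn)(const double *, size_t),
                           const double *buf, size_t n, uint64_t *out)
{
    clock_t t0 = clock();
    *out = fn(buf, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

To use it, fill one large array once, then call time_variant with
checksum_ffabs and with checksum_fabs on the same data and compare the two
times. The checksums need not match, since the two variants can treat NaN
and -0.0 differently.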

Anyway, here it is:
old:
   4230 decicycles in abs,       1 runs,      0 skips
   2520 decicycles in abs,       2 runs,      0 skips
   1635 decicycles in abs,       4 runs,      0 skips
    967 decicycles in abs,       8 runs,      0 skips
    635 decicycles in abs,      16 runs,      0 skips
    473 decicycles in abs,      32 runs,      0 skips
    389 decicycles in abs,      64 runs,      0 skips
    350 decicycles in abs,     128 runs,      0 skips
    331 decicycles in abs,     256 runs,      0 skips
    321 decicycles in abs,     512 runs,      0 skips
    319 decicycles in abs,    1024 runs,      0 skips
    318 decicycles in abs,    2048 runs,      0 skips
    315 decicycles in abs,    4096 runs,      0 skips
    317 decicycles in abs,    8192 runs,      0 skips
    335 decicycles in abs,   16384 runs,      0 skips
    335 decicycles in abs,   32768 runs,      0 skips
    333 decicycles in abs,   65536 runs,      0 skips
    342 decicycles in abs,  131072 runs,      0 skips
    340 decicycles in abs,  262144 runs,      0 skips
    345 decicycles in abs,  524285 runs,      3 skips
    348 decicycles in abs, 1048565 runs,     11 skips
    351 decicycles in abs, 2097129 runs,     23 skips
    352 decicycles in abs, 4194252 runs,     52 skips
    350 decicycles in abs, 8388498 runs,    110 skips
    351 decicycles in abs,16776993 runs,    223 skips
    352 decicycles in abs,33553999 runs,    433 skips
    351 decicycles in abs,67108036 runs,    828 skips
new:
   3540 decicycles in abs,       1 runs,      0 skips
   2160 decicycles in abs,       2 runs,      0 skips
   1447 decicycles in abs,       4 runs,      0 skips
    881 decicycles in abs,       8 runs,      0 skips
    594 decicycles in abs,      16 runs,      0 skips
    455 decicycles in abs,      32 runs,      0 skips
    382 decicycles in abs,      64 runs,      0 skips
    361 decicycles in abs,     128 runs,      0 skips
    356 decicycles in abs,     256 runs,      0 skips
    334 decicycles in abs,     512 runs,      0 skips
    322 decicycles in abs,    1024 runs,      0 skips
    317 decicycles in abs,    2048 runs,      0 skips
    315 decicycles in abs,    4096 runs,      0 skips
    341 decicycles in abs,    8192 runs,      0 skips
    363 decicycles in abs,   16383 runs,      1 skips
    342 decicycles in abs,   32767 runs,      1 skips
    354 decicycles in abs,   65532 runs,      4 skips
    348 decicycles in abs,  131068 runs,      4 skips
    354 decicycles in abs,  262138 runs,      6 skips
    356 decicycles in abs,  524277 runs,     11 skips
    356 decicycles in abs, 1048560 runs,     16 skips
    354 decicycles in abs, 2097120 runs,     32 skips
    354 decicycles in abs, 4194251 runs,     53 skips
    353 decicycles in abs, 8388504 runs,    104 skips
    353 decicycles in abs,16777006 runs,    210 skips
    353 decicycles in abs,33553993 runs,    439 skips
    352 decicycles in abs,67107951 runs,    913 skips

>
>>
>> Carl Eugen

