[FFmpeg-devel] [PATCH] avcodec/jpeg2000: replace naive pow call with smarter exp2fi
Ganesh Ajjanagadde
gajjanag at mit.edu
Wed Dec 9 04:09:35 CET 2015
On Tue, Dec 8, 2015 at 1:49 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Dec 07, 2015 at 09:50:49PM -0500, Ganesh Ajjanagadde wrote:
>> pow is a very wasteful function for this purpose. A low hanging fruit
>> would be simply to replace with exp2f, and that does yield some speedup.
>> However, there are 2 drawbacks of this:
>> 1. It does not exploit the integer nature of the argument.
>> 2. (minor) Some platforms lack a proper exp2f routine, making benefits available
>> only to non broken libm.
>> 3. exp2f does not solve the same issue that plagues pow, namely terrible
>> worst case performance. This is a fundamental issue known as the
>> "table-maker's dilemma" recognized by Prof. Kahan himself and
>> subsequently elaborated and researched by many others. All this is clear from benchmarks below.
>>
>> This exploits the IEEE-754 format to get very good performance even in
>> the worst case for integer powers of 2. This solves all the issues noted
>> above. Function tested with clang usan over [-1000, 1000] (beyond range of
>> relevance for this, which is [-255, 255]), patch itself with FATE.
>>
>> Benchmarks obtained on x86-64, Haswell, GNU-Linux via 10^5 iterations of
>> the pow call, START/STOP, and command ffplay ~/samples/jpeg2000/chiens_dcinema2K.mxf.
>> Low number of runs also given to prove the point about worst case:
>>
>> pow:
>> 216270 decicycles in pow, 1 runs, 0 skips
>> 110175 decicycles in pow, 2 runs, 0 skips
>> 56085 decicycles in pow, 4 runs, 0 skips
>> 29013 decicycles in pow, 8 runs, 0 skips
>> 15472 decicycles in pow, 16 runs, 0 skips
>> 8689 decicycles in pow, 32 runs, 0 skips
>> 5295 decicycles in pow, 64 runs, 0 skips
>> 3599 decicycles in pow, 128 runs, 0 skips
>> 2748 decicycles in pow, 256 runs, 0 skips
>> 2304 decicycles in pow, 511 runs, 1 skips
>> 2072 decicycles in pow, 1022 runs, 2 skips
>> 1963 decicycles in pow, 2044 runs, 4 skips
>> 1894 decicycles in pow, 4091 runs, 5 skips
>> 1860 decicycles in pow, 8184 runs, 8 skips
>>
>> exp2f:
>> 134140 decicycles in pow, 1 runs, 0 skips
>> 68110 decicycles in pow, 2 runs, 0 skips
>> 34530 decicycles in pow, 4 runs, 0 skips
>> 17677 decicycles in pow, 8 runs, 0 skips
>> 9175 decicycles in pow, 16 runs, 0 skips
>> 4931 decicycles in pow, 32 runs, 0 skips
>> 2808 decicycles in pow, 64 runs, 0 skips
>> 1747 decicycles in pow, 128 runs, 0 skips
>> 1208 decicycles in pow, 256 runs, 0 skips
>> 952 decicycles in pow, 512 runs, 0 skips
>> 822 decicycles in pow, 1024 runs, 0 skips
>> 765 decicycles in pow, 2047 runs, 1 skips
>> 722 decicycles in pow, 4094 runs, 2 skips
>> 693 decicycles in pow, 8190 runs, 2 skips
>>
>> exp2fi:
>> 2740 decicycles in pow, 1 runs, 0 skips
>> 1530 decicycles in pow, 2 runs, 0 skips
>> 955 decicycles in pow, 4 runs, 0 skips
>> 622 decicycles in pow, 8 runs, 0 skips
>> 477 decicycles in pow, 16 runs, 0 skips
>> 368 decicycles in pow, 32 runs, 0 skips
>> 317 decicycles in pow, 64 runs, 0 skips
>> 291 decicycles in pow, 128 runs, 0 skips
>> 277 decicycles in pow, 256 runs, 0 skips
>> 268 decicycles in pow, 512 runs, 0 skips
>> 265 decicycles in pow, 1024 runs, 0 skips
>> 263 decicycles in pow, 2048 runs, 0 skips
>> 263 decicycles in pow, 4095 runs, 1 skips
>> 260 decicycles in pow, 8191 runs, 1 skips
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>
> probably ok
>
> thx
pushed, thanks
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Asymptotically faster algorithms should always be preferred if you have
> asymptotical amounts of data
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
More information about the ffmpeg-devel
mailing list