[FFmpeg-devel] [PATCH] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm
Ganesh Ajjanagadde
gajjanag at mit.edu
Sun Oct 11 01:45:17 CEST 2015
On Sat, Oct 10, 2015 at 7:14 PM, Michael Niedermayer
<michael at niedermayer.cc> wrote:
> On Sat, Oct 10, 2015 at 11:32:06PM +0200, Henrik Gramner wrote:
>> On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde
>> <gajjanagadde at gmail.com> wrote:
>> > This uses Stein's binary GCD algorithm:
>> > https://en.wikipedia.org/wiki/Binary_GCD_algorithm
>> > to get a reported 1.7-2.1x speedup over Euclidean GCD on standard architectures.
>> > Have not benchmarked, so I can't comment
>>
>> Before you submit a patch that's supposed to make something faster,
>> you should benchmark it to verify that it is in fact faster. Do this
>> with inputs of various sizes on both 32- and 64-bit architectures and
>> both with and without compilers that support __builtin_ctzll(v).
>
> without __builtin_ctzll() the old code seems faster in a simple test
> like:
> make -j12 && ./ffmpeg -i matrixbench_mpeg2.mpg test.mov
>
> also it seems a simple
> while (u != v) {
> if (u > v)
> FFSWAP(int64_t, v, u);
> v -= u;
> do {
> v >>= 1;
> } while(!(v & 1));
> }
> is faster than the non builtin ctzll in its place but still not as
> fast as the current code
So I checked out the various implementation of (non built in) ctzll at
https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightBinSearch.
The funny thing is, among the looping ones, the fastest is the simplest :):
static inline int ff_ctzll_c(long long v)
{
int c = 0;
while (!(v & 1)) {
c++;
v >>= 1;
}
return c;
}
The De Bruijn based approach is intriguing:
https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup,
since there is no loop there, and assuming multiplies are fast, it
should be good.
I guess one could in principle simply write the asm instruction, but I
can't do that with my current knowledge.
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> What does censorship reveal? It reveals fear. -- Julian Assange
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
More information about the ffmpeg-devel
mailing list