[FFmpeg-devel] [PATCH] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm
Michael Niedermayer
michael at niedermayer.cc
Sun Oct 11 01:51:01 CEST 2015
On Sun, Oct 11, 2015 at 01:14:27AM +0200, Michael Niedermayer wrote:
> On Sat, Oct 10, 2015 at 11:32:06PM +0200, Henrik Gramner wrote:
> > On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde
> > <gajjanagadde at gmail.com> wrote:
> > > This uses Stein's binary GCD algorithm:
> > > https://en.wikipedia.org/wiki/Binary_GCD_algorithm
> > > to get a reported 1.7-2.1x speedup over Euclidean GCD on standard architectures.
> > > Have not benchmarked, so I can't comment
> >
> > Before you submit a patch that's supposed to make something faster,
> > you should benchmark it to verify that it is in fact faster. Do this
> > with inputs of various sizes on both 32- and 64-bit architectures and
> > both with and without compilers that support __builtin_ctzll(v).
>
> without __builtin_ctzll() the old code seems faster in a simple test
> like:
> make -j12 && ./ffmpeg -i matrixbench_mpeg2.mpg test.mov
>
> also it seems a simple
> while (u != v) {
> if (u > v)
> FFSWAP(int64_t, v, u);
> v -= u;
> do {
> v >>= 1;
> } while(!(v & 1));
> }
> is faster than the non builtin ctzll in its place but still not as
> fast as the current code
heres a ff_ctzll_c with LUTs, if someone wants to play more with
it (its almost as fast as the old code when theres no __builtin_ctzll())
for the matrixbench_mpeg2 -> mov testcase
static av_always_inline int ff_ctzll_c(long long v)
+{
+ int c;
+ static const uint8_t lut[256] =
+ {
+ 8,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 7,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 6,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 5,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0,
+ };
+ c = lut[v&255];
+
+ if (c<8)
+ return c;
+
+ c = 0;
+ if (!(v & 0xffffffff)) {
+ v >>= 32;
+ c += 32;
+ }
+ if (!(v & 0xffff)) {
+ v >>= 16;
+ c += 16;
+ }
+ if (!(v & 0xff)) {
+ v >>= 8;
+ c += 8;
+ }
+ return c + lut[v&255];
+}
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
He who knows, does not speak. He who speaks, does not know. -- Lao Tsu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151011/583dbfb5/attachment.sig>
More information about the ffmpeg-devel
mailing list