[FFmpeg-devel] [PATCHv2] lavc/cbrt_tablegen: speed up tablegen
Daniel Serpell
dserpell at gmail.com
Tue Jan 5 16:44:06 CET 2016
Hi!,
On Mon, Jan 04, 2016 at 06:33:59PM -0800, Ganesh Ajjanagadde wrote:
> This exploits an approach based on the sieve of Eratosthenes, a popular
> method for generating prime numbers.
>
> Tables are identical to previous ones.
>
> Tested with FATE with/without --enable-hardcoded-tables.
>
> Sample benchmark (Haswell, GNU/Linux+gcc):
> prev:
> 7860100 decicycles in cbrt_tableinit, 1 runs, 0 skips
> 7777490 decicycles in cbrt_tableinit, 2 runs, 0 skips
> [...]
> 7582339 decicycles in cbrt_tableinit, 256 runs, 0 skips
> 7563556 decicycles in cbrt_tableinit, 512 runs, 0 skips
>
> new:
> 2099480 decicycles in cbrt_tableinit, 1 runs, 0 skips
> 2044470 decicycles in cbrt_tableinit, 2 runs, 0 skips
> [...]
> 1796544 decicycles in cbrt_tableinit, 256 runs, 0 skips
> 1791631 decicycles in cbrt_tableinit, 512 runs, 0 skips
>
See the attached code, function "test1", which is based on the approximation:
(i+1)^(1/3) ~= i^(1/3) * ( 1 + 1/(3i) - 1/(9i^2) + 5/(81i^3) - .... )
The generated values are the same as the original floats (the max error in
double is < 4*10^-10), and it is faster (and, I think, simpler) than your
version. Altering the constants might make it faster still, but it is
currently dominated by the division in the main loop.
Daniel.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cubert.c
Type: text/x-csrc
Size: 2320 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20160105/d4215b82/attachment.c>