[FFmpeg-devel] [PATCH] avcodec/aac_tablegen: speed up table initialization
Rostislav Pehlivanov
atomnuker at gmail.com
Fri Nov 27 11:35:13 CET 2015
LGTM, but could you leave (just comment it out) the old code in there
so it's a little easier to follow?
> //ff_aac_pow2sf_tab[i] = pow(2, (i - POW_SF2_ZERO) / 4.0);
> //ff_aac_pow34sf_tab[i] = pow(ff_aac_pow2sf_tab[i], 3.0/4.0);
The accuracy increase is always nice.
On Thu, 2015-11-26 at 16:31 -0500, Ganesh Ajjanagadde wrote:
> This speeds up aac_tablegen to a ludicrous degree (~97%), i.e., to the
> point where it can be argued that runtime initialization can always be
> done instead of hard-coded tables. The only cost is essentially a
> trivial increase in the stack size.
>
> Even if one does not care about this, the patch also improves accuracy
> as detailed below.
>
> Performance:
> Benchmark obtained by looping 10^4 times over ff_aac_tableinit.
>
> Sample benchmark (x86-64, Haswell, GNU/Linux):
> old:
> 1295292 decicycles in ff_aac_tableinit, 512 runs, 0 skips
> 1275981 decicycles in ff_aac_tableinit, 1024 runs, 0 skips
> 1272932 decicycles in ff_aac_tableinit, 2048 runs, 0 skips
> 1262164 decicycles in ff_aac_tableinit, 4096 runs, 0 skips
> 1256720 decicycles in ff_aac_tableinit, 8192 runs, 0 skips
>
> new:
> 25691 decicycles in ff_aac_tableinit, 505 runs, 7 skips
> 25130 decicycles in ff_aac_tableinit, 1016 runs, 8 skips
> 25973 decicycles in ff_aac_tableinit, 2036 runs, 12 skips
> 25911 decicycles in ff_aac_tableinit, 4078 runs, 18 skips
> 25816 decicycles in ff_aac_tableinit, 8154 runs, 38 skips
>
> Accuracy:
> The previous code needlessly lost accuracy because pow was called twice
> in succession, with the second call operating on the already rounded
> result of the first. As an illustration of this:
> ff_aac_pow34sf_tab[3]
> old : 0.000000000007598092294225
> new : 0.000000000007598091426864
> real: 0.000000000007598091778545
>
> truncated to float
> old : 0.000000000007598092294225
> new : 0.000000000007598091426864
> real: 0.000000000007598091426864
>
> showing that the old value was not correctly rounded. This affects a
> large number of elements of the array.
>
> Patch tested with FATE.
>
> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
> ---
> libavcodec/aac_tablegen.h | 38 ++++++++++++++++++++++++++++++++++++--
> 1 file changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/aac_tablegen.h b/libavcodec/aac_tablegen.h
> index 8b223f9..255723b 100644
> --- a/libavcodec/aac_tablegen.h
> +++ b/libavcodec/aac_tablegen.h
> @@ -35,9 +35,43 @@ float ff_aac_pow34sf_tab[428];
>  av_cold void ff_aac_tableinit(void)
>  {
>      int i;
> +
> +    /* 2^(i/16) for 0 <= i <= 15 */
> +    const double exp2_lut[] = {
> +        1.00000000000000000000,
> +        1.04427378242741384032,
> +        1.09050773266525765921,
> +        1.13878863475669165370,
> +        1.18920711500272106672,
> +        1.24185781207348404859,
> +        1.29683955465100966593,
> +        1.35425554693689272830,
> +        1.41421356237309504880,
> +        1.47682614593949931139,
> +        1.54221082540794082361,
> +        1.61049033194925430818,
> +        1.68179283050742908606,
> +        1.75625216037329948311,
> +        1.83400808640934246349,
> +        1.91520656139714729387,
> +    };
> +    double t1 = 8.8817841970012523233890533447265625e-16; // 2^(-50)
> +    double t2 = 3.63797880709171295166015625e-12; // 2^(-38)
> +    int t1_inc_cur, t2_inc_cur;
> +    int t1_inc_prev = 0;
> +    int t2_inc_prev = 8;
> +
>      for (i = 0; i < 428; i++) {
> -        ff_aac_pow2sf_tab[i] = pow(2, (i - POW_SF2_ZERO) / 4.0);
> -        ff_aac_pow34sf_tab[i] = pow(ff_aac_pow2sf_tab[i], 3.0/4.0);
> +        t1_inc_cur = 4 * (i % 4);
> +        t2_inc_cur = (8 + 3*i) % 16;
> +        if (t1_inc_cur < t1_inc_prev)
> +            t1 *= 2;
> +        if (t2_inc_cur < t2_inc_prev)
> +            t2 *= 2;
> +        ff_aac_pow2sf_tab[i] = t1 * exp2_lut[t1_inc_cur];
> +        ff_aac_pow34sf_tab[i] = t2 * exp2_lut[t2_inc_cur];
> +        t1_inc_prev = t1_inc_cur;
> +        t2_inc_prev = t2_inc_cur;
>      }
>  }
> #endif /* CONFIG_HARDCODED_TABLES */