[FFmpeg-devel] [PATCH] avcodec/nellymoserenc: avoid wasteful pow
Ganesh Ajjanagadde
gajjanag at mit.edu
Tue Dec 15 23:25:52 CET 2015
On Tue, Dec 15, 2015 at 2:23 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Wed, Dec 09, 2015 at 06:55:25PM -0500, Ganesh Ajjanagadde wrote:
>> exp2 suffices here. Some trivial (~ 4x) speedup is done in addition here by
>> reusing results.
>>
>
>> This is bit-identical to the old values.
>
> That may be true with a specific compiler and version but
> C makes no such guarantee unless iam missing something
Indeed, fpu behavior is largely uncontrolled. What I meant was that
under reasonable settings, there is no accuracy loss (tested with
clang and gcc). Basically, a priori it is not obvious that this change
is ok, but I tested and the values don't change.
For that matter, this is a rather minor point - many pow
implementations are anyway broken wrt precise < 1 ulp or better yet <
0.5 ulp rounding.
>
>
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> ---
>> libavcodec/nellymoserenc.c | 11 +++++++++--
>> 1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/libavcodec/nellymoserenc.c b/libavcodec/nellymoserenc.c
>> index d998dba..e6023e3 100644
>> --- a/libavcodec/nellymoserenc.c
>> +++ b/libavcodec/nellymoserenc.c
>> @@ -179,8 +179,15 @@ static av_cold int encode_init(AVCodecContext *avctx)
>>
>> /* Generate overlap window */
>> ff_init_ff_sine_windows(7);
>> - for (i = 0; i < POW_TABLE_SIZE; i++)
>> - pow_table[i] = pow(2, -i / 2048.0 - 3.0 + POW_TABLE_OFFSET);
>> + pow_table[0] = 1;
>> + pow_table[1024] = M_SQRT1_2;
>> + for (i = 1; i < 513; i++) {
>> + double tmp = exp2(-i / 2048.0);
>> + pow_table[i] = tmp;
>> + pow_table[1024-i] = M_SQRT1_2 / tmp;
>> + pow_table[1024+i] = tmp * M_SQRT1_2;
>> + pow_table[2048-i] = 0.5 / tmp;
>
> how much overall init time is gained by this ?
> that is time in ffmpeg main() from start to finish when just opening
> the file with no decoding aka ./ffmpeg -i somefile
Don't know, all I know is cycles are unnecessarily wasted. Will put in
cycle numbers.
> ?
>
> iam asking as this makes the intend of the loop harder to
> see/understand
> if this change is done then please add a comment explaining what the
> code does or maybe leave the unoptimized code in a comment
Comment will be added, sorry.
Note that the trick may be employed to a finer resolution, ie by
storing exp2(3/4) etc and utilizing in simple multiplications and
divisions. There will come a point where the overhead of the binary
size increase will dominate. I chose this as this is reasonably
readable and improves speed at minimal binary size impact.
>
> either way replacing pow -> exp2 LGTM
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
More information about the ffmpeg-devel
mailing list