[FFmpeg-devel] One pass volume normalization (ebur128)
Jan Ehrhardt
phpdev at ehrhardt.nl
Sat Jul 13 22:15:43 CEST 2013
Nicolas George in gmane.comp.video.ffmpeg.devel (Sat, 13 Jul 2013
21:41:52 +0200):
>On quintidi 25 Messidor, year CCXXI, Jan Ehrhardt wrote:
>> Subject: [FFmpeg-devel] One pass volume normalization (ebur128)
>
>Single-pass volume normalization is not possible, please do not call the
>feature that way.
Call it what you like. I am using it in a single pass transcode. Just
like the -af volnorm filter in MEncoder.
>r128.I is not a good choice, but there is nothing better yet.
You can use any of the r128 variables that are inserted into the metadata.
>Missing documentation update.
I know.
>> @@ -51,18 +51,24 @@ static const AVOption volume_options[] = {
>> { "fixed", "select 8-bit fixed-point", 0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_FIXED }, INT_MIN, INT_MAX, A|F, "precision" },
>> { "float", "select 32-bit floating-point", 0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_FLOAT }, INT_MIN, INT_MAX, A|F, "precision" },
>> { "double", "select 64-bit floating-point", 0, AV_OPT_TYPE_CONST, { .i64 = PRECISION_DOUBLE }, INT_MIN, INT_MAX, A|F, "precision" },
>
>> + { "metadata", "set the metadata key for loudness normalization", OFFSET(metadata), AV_OPT_TYPE_STRING, { .str = NULL }, .flags = A|F },
>
>Inconsistent indentation.
Not really. If you look at the original you will see that fixed, float
and double are values for the precision option.
>> + if (vol->metadata) {
>> + double loudness, new_volume, timestamp, mx;
>> + AVDictionaryEntry *e;
>> + mx = 20;
>> + timestamp = (float)(1.0 * buf->pts / outlink->sample_rate);
>> + mx = fmin(mx, timestamp);
>> + e = av_dict_get(buf->metadata, vol->metadata, NULL, 0);
>> + if (e) {
>> + loudness = av_strtod(e->value, NULL);
>> + if (loudness > -69) {
>> + new_volume = fmax(-mx,fmin(mx,(-23 - loudness)));
>> + av_log(NULL, AV_LOG_VERBOSE, "loudness=%f => %f => volume=%f\n",
>> + loudness, new_volume, pow(10, new_volume / 20));
>> + set_fixed_volume(vol, pow(10, new_volume / 20));
>> + }
>
>This paragraph has several problems. First, it is missing spaces around
>words, that is easy to fix.
ACK.
>Second, it has a duplicated mathematical formula, which is pretty much a
>recipe for inconsistency. That is easy to fix too.
ACK.
>Third, it has several hardcoded values, and that is not good design.
Two of the three hardcoded values should be hardcoded. The -23 is part
of the EBU R128 specs: http://tech.ebu.ch/loudness
The -69 was suggested by Clement. If there is no sound at all, the volume
level seems to be reported as -71 or something like that. A value above
-69 means there is sound (with a very low volume).
The 20 is indeed an arbitrary choice: it caps the volume adjustment at
20 dB, and the cap ramps up from 0 to 20 dB during the first 20 seconds
of a video.
>It seems to me that using an expression, evaluated each time the metadata
>value changes and with that value available as a variable would be a much
>nicer design.
I agree, but this is a little above my head.
>AFAIK, this is unneeded since the "evil plan".
I do not even know what the "evil plan" is...
>> diff --git a/libavfilter/f_ebur128.c b/libavfilter/f_ebur128.c
>> index 88d37e8..f4ce6d9 100644
>> --- a/libavfilter/f_ebur128.c
>> +++ b/libavfilter/f_ebur128.c
>
>Unrelated.
Not quite either. f_ebur128.c hardcodes the log level to verbose if the
metadata are set. You do not want to see the intermediate metadata if
you do a 'one pass' transcode. If needed you can always set the
loglevel to view them.
Jan