[FFmpeg-devel] [PATCH 2/3] lavfi/ebur128: add metadata injection. - volnorm.patch (1/1)
Jan Ehrhardt
phpdev at ehrhardt.nl
Wed May 1 22:16:45 CEST 2013
My message text has not shown up yet, so here, separately, is what I
wrote about the patch:
Hi,
I am picking up this 'old' discussion because I saw that none of the
one-pass normalization proposals have made their way into FFmpeg yet.
Unless I am very much mistaken, all current methods are two-pass: first
analyze the volume level, then normalize the sound level. Correct me if
I am wrong.
Nicolas George in gmane.comp.video.ffmpeg.devel (Sat, 16 Mar 2013
12:36:40 +0100):
>On quintidi 25 Ventôse, year CCXXI, Jan Ehrhardt wrote:
>> Actually it is useful for volume normalization, in those cases where the
>> overall sound level is either too low or too high. I have used Clément's
>> first patch set for over 2 weeks now on about 200 video files with an
>> average duration of 1 hour. It worked exactly as we expected:
>> lowering the sound level of a few recordings and increasing the volume
>> of most recordings.
>
>It will work if the volume of the whole movie is approximately constant, but
>not at all if you have, for example, a very loud opening and then a quieter
>program.
http://permalink.gmane.org/gmane.comp.video.ffmpeg.devel/159978 was
Clément's first attempt. It works quite well for us and, with only a few
flaws, has replaced the one-pass -af volnorm filter of MEncoder.
>Of course, no normalization system can deal with quick changes of volume,
>but with this example, it will take the integrated loudness more than one
>minute to digest the 25 seconds of loud beginning. That is too much.
We have run into this flaw only with a test video and (as far as I know)
with none of the 750 one-hour videos our users have transcoded. But I did
not check all of them...
In the test video, the first three normalization frames had an
integrated loudness (I) of -70 LUFS:
t: 0.0999792 M:-120.7 S:-120.7 I: -70.0 LUFS LRA: 0.0 LU
t: 0.199979 M:-120.7 S:-120.7 I: -70.0 LUFS LRA: 0.0 LU
t: 0.299979 M:-120.7 S:-120.7 I: -70.0 LUFS LRA: 0.0 LU
t: 0.399979 M: -20.7 S:-120.7 I: -20.7 LUFS LRA: 0.0 LU
Our FFmpeg tried to adjust the volume three times by +47 dB
(-23 - (-70) = 47), apparently enough to cause all kinds of buffer
overflows. The result: a one-hour video with a measured I of 10.0 LUFS
(the maximum).
So I went looking for ways to cap the adjustment. My first working
version looked like this:
loudness = av_strtod(e->value, NULL);
/* correct towards -23 LUFS, but by at most 20 dB either way */
new_volume = fmax(-20, fmin(20, -23 - loudness));
set_fixed_volume(vol, pow(10, new_volume / 20));
The idea: cap the adjustment to the range -20 dB to +20 dB (relative to
the -23 LUFS target). This solved the buffer overflow problems, but ran
into the issue Nicolas George predicted. After the three initial frames
the loudness jumped to -2 LUFS and it took about 30 seconds to come back
down to -17 LUFS: an unwanted sound spike at the beginning of the video.
The question arose: how to limit the adjustments at the beginning of
a video? I went back to f_ebur128.c and added another variable to the
metadata: the pts. In af_volume.c I could then use the pts to cap the
change in loudness during the initial seconds. My arbitrary choice:
allow -1/+1 dB after the first second, -2/+2 dB after two seconds, and
so on up to -20/+20 dB after 20 seconds or more. Of course, it is
possible to stretch the initial period to, say, a minute and lower the
maximum adjustment to -10/+10 dB. But the idea is clear. The essential
part of the patch:
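if (vol->metadata) {
    double loudness, new_volume, pts, timestamp, mx;
    AVDictionaryEntry *t, *e;

    /* pts injected into the metadata by the patched ebur128 filter */
    t = av_dict_get(buf->metadata, "lavfi.r128.pts", NULL, 0);
    mx = 20;                      /* absolute cap in dB */
    if (t) {
        pts = av_strtod(t->value, NULL);
        timestamp = pts / 48000;  /* assume 48kHz: pts -> seconds */
        mx = fmin(mx, timestamp); /* allow 1 dB more per second */
    }

    /* loudness value stored under the key named by vol->metadata */
    e = av_dict_get(buf->metadata, vol->metadata, NULL, 0);
    if (e) {
        loudness = av_strtod(e->value, NULL);
        /* correct towards -23 LUFS, limited to [-mx, +mx] dB */
        new_volume = fmax(-mx, fmin(mx, -23 - loudness));
        set_fixed_volume(vol, pow(10, new_volume / 20));
    }
}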
The mx variable defines the maximum adjustment in either direction, in
dB. With an absolute cap of 20 and the pts divided by 48000 (turning it
into seconds, assuming 48 kHz), I got the setup described above: the
allowed range widens by -1/+1 dB per second.
The complete patch is attached (if my NNTP client handles it correctly).
It applies to yesterday's FFmpeg.
Jan
PS. I also changed the av_log messages: they are hidden by default, but
shown with -loglevel verbose.