[FFmpeg-devel] [PATCH] add silenceremove filter

Mon Sep 1 12:13:40 CEST 2014

On 8/31/14, Clement Boesch <u at pkh.me> wrote:
> On Fri, Aug 29, 2014 at 06:22:36PM +0000, Paul B Mahol wrote:
>> Signed-off-by: Paul B Mahol <onemda at gmail.com>
>> ---
>>  doc/filters.texi               |  62 ++++++
>>  libavfilter/Makefile           |   1 +
>>  libavfilter/af_silenceremove.c | 478
>> +++++++++++++++++++++++++++++++++++++++++
>>  libavfilter/allfilters.c       |   1 +
>>  4 files changed, 542 insertions(+)
>>  create mode 100644 libavfilter/af_silenceremove.c
>>
>> diff --git a/doc/filters.texi b/doc/filters.texi
>> index cca15fc..ea7da88 100644
>> --- a/doc/filters.texi
>> +++ b/doc/filters.texi
>> @@ -1881,6 +1881,68 @@ ffmpeg -i silence.mp3 -af
>> silencedetect=noise=0.0001 -f null -
>>  @end example
>>  @end itemize
>>
>> + at section silenceremove
>> +
>> +Removes silence from the beginning, middle or end of the audio.
>
> We tend to use the "Remove silence ..." form (no 's').

Changed.
>
> Also, what is exactly the meaning of "from" here? Shouldn't it be "at the
> beginning, in the middle or at the end of..."? (Note: as you know, I'm not
> a native English speaker).

Neither I am.

>
>> +
>> +The filter accepts the following options:
>> +
>> + at table @option
>> + at item start_periods
>> +This value is used to indicate if audio should be trimmed at beginning
>> of
>> +the audio. A value of zero indicates no silence should be trimmed from
>> the
>> +beginning. When specifying an non-zero value, it trims audio up until it
>
> "*a* non-zero value"?

Changed.

>
>> +finds non-silence. Normally, when trimming silence from beginning of
>> audio
>> +the @var{start_periods} will be @code{1} but it can be increased to
>> higher
>> +values to trim all audio up to specific count of non-silence periods.
>> +
>
> Please specify the default value, ditto below

Done.
>
>> + at item start_duration
>> +Specify amount of time that non-silence must be detected before it stops
>
> Specify *the* amount of time
Done.

>
>> +trimming audio. By increasing the duration, burst of noise can be
>> treated
>> +as silence and trimmed off.
>> +
>> + at item start_threshold
>> +This indicate what sample value should be treated as silence. For
>> digital
>> +audio, a value of @code{0} may be fine but for audio recorded from
>> analog,
>> +you may wish to increase the value to account for background noise.
>> +
>> + at item stop_periods
>> +Set the count for trimming silence from the end of audio.
>> +To remove silence from the middle of a file, specify a
>> @var{stop_periods}
>> +that is negative. This value is the threated as a positive value and is
>> also
>
> is *then* *treated*?
Done.

>
> drop the "also" maybe
Done.

>
>> +used to indicate the effect should restart processing as specified by
>> + at var{start_periods}, making it suitable for removing periods of silence
>> +in the middle of the audio.
>> +
>> + at item stop_duration
>> +Specify a duration of silence that must exist before audio is not copied
>> any
>> +more. By specifying a higher duration, silence that is wanted can be
>> left
>> +in the audio.
>> +
>> + at item stop_threshold
>> +This indicate what sample value should be treated as silence but for
>> +trimming silence from the end of audio.
>> +
>> + at item leave_silence
>> +This indicate that @var{stop_duration} length of audio should be left
>> intact
>> +at the beginning of each period of silence.
>> +For example, if you want to remove long pauses between words but do not
>> want
>> +to remove the pauses completely.
>> +
>> + at end table
>> +
>
> I must say I'm extremely confused at all of this; more examples would be
> welcome if you don't mind.
>
>> + at subsection Examples
>> +
>> + at itemize
>> + at item
>> +The following example shows how this filter can be used to start
>> recording
>
> *a* recording?
Done.

>
>> +that does not contain the delay at the start which usually occurs
>> between
>
>> +pressing the record button and the star of the performance:
>
> start?
Fixed.

>
> [...]
>> +typedef struct SilenceRemoveContext {
>> +    const AVClass *class;
>> +
>
>> +    int mode;
>
> Could be made an enum
Done.

>
> [...]
>> +static const AVOption silenceremove_options[] = {
>> +    { "start_periods",   NULL, OFFSET(start_periods),   AV_OPT_TYPE_INT,
>>     {.i64=0},     0, 9000, FLAGS },
>> +    { "start_duration",  NULL, OFFSET(start_duration),
>> AV_OPT_TYPE_DURATION, {.i64=0},     0, 9000, FLAGS },
>> +    { "start_threshold", NULL, OFFSET(start_threshold),
>> AV_OPT_TYPE_DOUBLE,   {.dbl=0},     0,    1, FLAGS },
>> +    { "stop_periods",    NULL, OFFSET(stop_periods),    AV_OPT_TYPE_INT,
>>     {.i64=0}, -9000, 9000, FLAGS },
>> +    { "stop_duration",   NULL, OFFSET(stop_duration),
>> AV_OPT_TYPE_DURATION, {.i64=0},     0, 9000, FLAGS },
>> +    { "stop_threshold",  NULL, OFFSET(stop_threshold),
>> AV_OPT_TYPE_DOUBLE,   {.dbl=0},     0,    1, FLAGS },
>> +    { "leave_silence",   NULL, OFFSET(leave_silence),   AV_OPT_TYPE_INT,
>>     {.i64=0},     0,    1, FLAGS },
>                             ^^^^
>                         this column makes me sad :(
>
>> +    { NULL }
>> +};
>> +
>> +AVFILTER_DEFINE_CLASS(silenceremove);
>> +
>> +#define SILENCE_TRIM        0
>> +#define SILENCE_TRIM_FLUSH  1
>> +#define SILENCE_COPY        2
>> +#define SILENCE_COPY_FLUSH  3
>> +#define SILENCE_STOP        4
>> +
> [...]
>> +static int config_input(AVFilterLink *inlink)
>> +{
>> +    AVFilterContext *ctx = inlink->dst;
>> +    SilenceRemoveContext *s = ctx->priv;
>> +
>
>> +    s->window_size = (inlink->sample_rate / 50) * inlink->channels;
>
> what is 50?

Number used for window size calculation.

>
>> +    s->window = av_malloc_array(s->window_size, sizeof(*s->window));
>> +
>> +    clear_rms(s);
>> +
>> +    s->start_duration = av_rescale(s->start_duration,
>> inlink->sample_rate,
>> +                                   AV_TIME_BASE);
>> +    s->stop_duration  = av_rescale(s->stop_duration,
>> inlink->sample_rate,
>> +                                   AV_TIME_BASE);
>> +
>> +    if (s->start_duration) {
>> +        s->start_holdoff = av_malloc_array(s->start_duration,
>> +                                           sizeof(*s->start_holdoff) *
>> +                                           inlink->channels);
>> +        if (!s->start_holdoff)
>> +            return AVERROR(ENOMEM);
>> +    }
>> +
>> +    s->start_holdoff_offset = 0;
>> +    s->start_holdoff_end    = 0;
>> +    s->start_found_periods  = 0;
>> +
>> +    if (s->stop_duration) {
>> +        s->stop_holdoff = av_malloc_array(s->stop_duration,
>> +                                          sizeof(*s->stop_holdoff) *
>> +                                          inlink->channels);
>> +        if (!s->stop_holdoff)
>> +            return AVERROR(ENOMEM);
>> +    }
>> +
>> +    s->stop_holdoff_offset = 0;
>> +    s->stop_holdoff_end    = 0;
>> +    s->stop_found_periods  = 0;
>> +
>> +    if (s->start_periods)
>> +        s->mode = SILENCE_TRIM;
>> +    else
>> +        s->mode = SILENCE_COPY;
>> +
>> +    return 0;
>> +}
>> +
>> +static int config_output(AVFilterLink *outlink)
>> +{
>> +    outlink->flags |= FF_LINK_FLAG_REQUEST_LOOP;
>> +
>> +    return 0;
>> +}
>> +
>
> nit: you probably can merge the 2 config callbacks
>
>> +static int above_threshold(double value, double threshold)
>> +{
>> +    return value > threshold;
>> +}
>> +
>
> Is this really necessary?

Removed.

>
>> +static double compute_rms(SilenceRemoveContext *s, double sample)
>> +{
>> +    double new_sum;
>> +
>> +    new_sum  = s->rms_sum;
>> +    new_sum -= *s->window_current;
>> +    new_sum += sample * sample;
>> +
>
>> +    return sqrt(new_sum / s->window_size);
>
> Can't you remove the sqrt() and the div by updating the threshold
> reference instead once (th=square(th*s->window_size)) ? I believe it will
> kind of make the code faster.

That can be done later.

>
>> +}
>> +
>> +static void update_rms(SilenceRemoveContext *s, double sample)
>> +{
>> +    s->rms_sum -= *s->window_current;
>> +    *s->window_current = sample * sample;
>> +    s->rms_sum += *s->window_current;
>> +
>> +    s->window_current++;
>> +    if (s->window_current >= s->window_end)
>> +        s->window_current = s->window;
>> +}
>> +
>
>> +static void flush(AVFrame *out, AVFilterLink *outlink,
>> +                  int *nb_samples_written, int *ret)
>> +{
>> +    if (*nb_samples_written) {
>> +        out->nb_samples = *nb_samples_written / outlink->channels;
>> +        *ret = ff_filter_frame(outlink, out);
>> +        *nb_samples_written = 0;
>> +    } else {
>> +        av_frame_free(&out);
>> +    }
>> +}
>
> I suppose you don't want to return the ret value because it avoids
> overwriting its already set value?
>
>> +
>> +static int filter_frame(AVFilterLink *inlink, AVFrame *in)
>> +{
>> +    AVFilterContext *ctx = inlink->dst;
>> +    AVFilterLink *outlink = ctx->outputs[0];
>> +    SilenceRemoveContext *s = ctx->priv;
>> +    int i, j, threshold, ret = 0;
>> +    int nbs, nb_samples_read, nb_samples_written;
>> +    double *obuf, *ibuf = (double *)in->data[0];
>> +    AVFrame *out;
>> +
>> +    nb_samples_read = nb_samples_written = 0;
>> +
>> +    switch (s->mode) {
>> +    case SILENCE_TRIM:
>
>> +silence_trim:
>
> Can't you split all these cases/labels into function and do return func()
> instead of all these weird gotos indirections?

I would prefer to do that later.

>
> [...]
>> +static av_cold void uninit(AVFilterContext *ctx)
>> +{
>> +    SilenceRemoveContext *s = ctx->priv;
>> +
>> +    av_freep(&s->start_holdoff);
>> +    av_freep(&s->stop_holdoff);
>
> I misses the free of window I believe

Fixed.

>
> [...]
>
> No other comment from me, assuming updates for Changelog & RELEASE_NOTES
> and avfilter bump.
>
> --
> Clement B.
>