[FFmpeg-devel] [PATCH] add silenceremove filter
Paul B Mahol
onemda at gmail.com
Mon Sep 1 12:13:40 CEST 2014
On 8/31/14, Clement Boesch <u at pkh.me> wrote:
> On Fri, Aug 29, 2014 at 06:22:36PM +0000, Paul B Mahol wrote:
>> Signed-off-by: Paul B Mahol <onemda at gmail.com>
>> ---
>> doc/filters.texi | 62 ++++++
>> libavfilter/Makefile | 1 +
>> libavfilter/af_silenceremove.c | 478
>> +++++++++++++++++++++++++++++++++++++++++
>> libavfilter/allfilters.c | 1 +
>> 4 files changed, 542 insertions(+)
>> create mode 100644 libavfilter/af_silenceremove.c
>>
>> diff --git a/doc/filters.texi b/doc/filters.texi
>> index cca15fc..ea7da88 100644
>> --- a/doc/filters.texi
>> +++ b/doc/filters.texi
>> @@ -1881,6 +1881,68 @@ ffmpeg -i silence.mp3 -af
>> silencedetect=noise=0.0001 -f null -
>> @end example
>> @end itemize
>>
>> + at section silenceremove
>> +
>> +Removes silence from the beginning, middle or end of the audio.
>
> We tend to use the "Remove silence ..." form (no 's').
Changed.
>
> Also, what is exactly the meaning of "from" here? Shouldn't it be "at the
> beginning, in the middle or at the end of..."? (Note: as you know, I'm not
> a native English speaker).
Neither I am.
>
>> +
>> +The filter accepts the following options:
>> +
>> + at table @option
>> + at item start_periods
>> +This value is used to indicate if audio should be trimmed at beginning
>> of
>> +the audio. A value of zero indicates no silence should be trimmed from
>> the
>> +beginning. When specifying an non-zero value, it trims audio up until it
>
> "*a* non-zero value"?
Changed.
>
>> +finds non-silence. Normally, when trimming silence from beginning of
>> audio
>> +the @var{start_periods} will be @code{1} but it can be increased to
>> higher
>> +values to trim all audio up to specific count of non-silence periods.
>> +
>
> Please specify the default value, ditto below
Done.
>
>> + at item start_duration
>> +Specify amount of time that non-silence must be detected before it stops
>
> Specify *the* amount of time
Done.
>
>> +trimming audio. By increasing the duration, burst of noise can be
>> treated
>> +as silence and trimmed off.
>> +
>> + at item start_threshold
>> +This indicate what sample value should be treated as silence. For
>> digital
>> +audio, a value of @code{0} may be fine but for audio recorded from
>> analog,
>> +you may wish to increase the value to account for background noise.
>> +
>> + at item stop_periods
>> +Set the count for trimming silence from the end of audio.
>> +To remove silence from the middle of a file, specify a
>> @var{stop_periods}
>> +that is negative. This value is the threated as a positive value and is
>> also
>
> is *then* *treated*?
Done.
>
> drop the "also" maybe
Done.
>
>> +used to indicate the effect should restart processing as specified by
>> + at var{start_periods}, making it suitable for removing periods of silence
>> +in the middle of the audio.
>> +
>> + at item stop_duration
>> +Specify a duration of silence that must exist before audio is not copied
>> any
>> +more. By specifying a higher duration, silence that is wanted can be
>> left
>> +in the audio.
>> +
>> + at item stop_threshold
>> +This indicate what sample value should be treated as silence but for
>> +trimming silence from the end of audio.
>> +
>> + at item leave_silence
>> +This indicate that @var{stop_duration} length of audio should be left
>> intact
>> +at the beginning of each period of silence.
>> +For example, if you want to remove long pauses between words but do not
>> want
>> +to remove the pauses completely.
>> +
>> + at end table
>> +
>
> I must say I'm extremely confused at all of this; more examples would be
> welcome if you don't mind.
>
>> + at subsection Examples
>> +
>> + at itemize
>> + at item
>> +The following example shows how this filter can be used to start
>> recording
>
> *a* recording?
Done.
>
>> +that does not contain the delay at the start which usually occurs
>> between
>
>> +pressing the record button and the star of the performance:
>
> start?
Fixed.
>
> [...]
>> +typedef struct SilenceRemoveContext {
>> + const AVClass *class;
>> +
>
>> + int mode;
>
> Could be made an enum
Done.
>
> [...]
>> +static const AVOption silenceremove_options[] = {
>> + { "start_periods", NULL, OFFSET(start_periods), AV_OPT_TYPE_INT,
>> {.i64=0}, 0, 9000, FLAGS },
>> + { "start_duration", NULL, OFFSET(start_duration),
>> AV_OPT_TYPE_DURATION, {.i64=0}, 0, 9000, FLAGS },
>> + { "start_threshold", NULL, OFFSET(start_threshold),
>> AV_OPT_TYPE_DOUBLE, {.dbl=0}, 0, 1, FLAGS },
>> + { "stop_periods", NULL, OFFSET(stop_periods), AV_OPT_TYPE_INT,
>> {.i64=0}, -9000, 9000, FLAGS },
>> + { "stop_duration", NULL, OFFSET(stop_duration),
>> AV_OPT_TYPE_DURATION, {.i64=0}, 0, 9000, FLAGS },
>> + { "stop_threshold", NULL, OFFSET(stop_threshold),
>> AV_OPT_TYPE_DOUBLE, {.dbl=0}, 0, 1, FLAGS },
>> + { "leave_silence", NULL, OFFSET(leave_silence), AV_OPT_TYPE_INT,
>> {.i64=0}, 0, 1, FLAGS },
> ^^^^
> this column makes me sad :(
>
>> + { NULL }
>> +};
>> +
>> +AVFILTER_DEFINE_CLASS(silenceremove);
>> +
>> +#define SILENCE_TRIM 0
>> +#define SILENCE_TRIM_FLUSH 1
>> +#define SILENCE_COPY 2
>> +#define SILENCE_COPY_FLUSH 3
>> +#define SILENCE_STOP 4
>> +
> [...]
>> +static int config_input(AVFilterLink *inlink)
>> +{
>> + AVFilterContext *ctx = inlink->dst;
>> + SilenceRemoveContext *s = ctx->priv;
>> +
>
>> + s->window_size = (inlink->sample_rate / 50) * inlink->channels;
>
> what is 50?
Number used for window size calculation.
>
>> + s->window = av_malloc_array(s->window_size, sizeof(*s->window));
>> +
>> + clear_rms(s);
>> +
>> + s->start_duration = av_rescale(s->start_duration,
>> inlink->sample_rate,
>> + AV_TIME_BASE);
>> + s->stop_duration = av_rescale(s->stop_duration,
>> inlink->sample_rate,
>> + AV_TIME_BASE);
>> +
>> + if (s->start_duration) {
>> + s->start_holdoff = av_malloc_array(s->start_duration,
>> + sizeof(*s->start_holdoff) *
>> + inlink->channels);
>> + if (!s->start_holdoff)
>> + return AVERROR(ENOMEM);
>> + }
>> +
>> + s->start_holdoff_offset = 0;
>> + s->start_holdoff_end = 0;
>> + s->start_found_periods = 0;
>> +
>> + if (s->stop_duration) {
>> + s->stop_holdoff = av_malloc_array(s->stop_duration,
>> + sizeof(*s->stop_holdoff) *
>> + inlink->channels);
>> + if (!s->stop_holdoff)
>> + return AVERROR(ENOMEM);
>> + }
>> +
>> + s->stop_holdoff_offset = 0;
>> + s->stop_holdoff_end = 0;
>> + s->stop_found_periods = 0;
>> +
>> + if (s->start_periods)
>> + s->mode = SILENCE_TRIM;
>> + else
>> + s->mode = SILENCE_COPY;
>> +
>> + return 0;
>> +}
>> +
>> +static int config_output(AVFilterLink *outlink)
>> +{
>> + outlink->flags |= FF_LINK_FLAG_REQUEST_LOOP;
>> +
>> + return 0;
>> +}
>> +
>
> nit: you probably can merge the 2 config callbacks
>
>> +static int above_threshold(double value, double threshold)
>> +{
>> + return value > threshold;
>> +}
>> +
>
> Is this really necessary?
Removed.
>
>> +static double compute_rms(SilenceRemoveContext *s, double sample)
>> +{
>> + double new_sum;
>> +
>> + new_sum = s->rms_sum;
>> + new_sum -= *s->window_current;
>> + new_sum += sample * sample;
>> +
>
>> + return sqrt(new_sum / s->window_size);
>
> Can't you remove the sqrt() and the div by updating the threshold
> reference instead once (th=square(th*s->window_size)) ? I believe it will
> kind of make the code faster.
That can be done later.
>
>> +}
>> +
>> +static void update_rms(SilenceRemoveContext *s, double sample)
>> +{
>> + s->rms_sum -= *s->window_current;
>> + *s->window_current = sample * sample;
>> + s->rms_sum += *s->window_current;
>> +
>> + s->window_current++;
>> + if (s->window_current >= s->window_end)
>> + s->window_current = s->window;
>> +}
>> +
>
>> +static void flush(AVFrame *out, AVFilterLink *outlink,
>> + int *nb_samples_written, int *ret)
>> +{
>> + if (*nb_samples_written) {
>> + out->nb_samples = *nb_samples_written / outlink->channels;
>> + *ret = ff_filter_frame(outlink, out);
>> + *nb_samples_written = 0;
>> + } else {
>> + av_frame_free(&out);
>> + }
>> +}
>
> I suppose you don't want to return the ret value because it avoids
> overwriting its already set value?
>
>> +
>> +static int filter_frame(AVFilterLink *inlink, AVFrame *in)
>> +{
>> + AVFilterContext *ctx = inlink->dst;
>> + AVFilterLink *outlink = ctx->outputs[0];
>> + SilenceRemoveContext *s = ctx->priv;
>> + int i, j, threshold, ret = 0;
>> + int nbs, nb_samples_read, nb_samples_written;
>> + double *obuf, *ibuf = (double *)in->data[0];
>> + AVFrame *out;
>> +
>> + nb_samples_read = nb_samples_written = 0;
>> +
>> + switch (s->mode) {
>> + case SILENCE_TRIM:
>
>> +silence_trim:
>
> Can't you split all these cases/labels into function and do return func()
> instead of all these weird gotos indirections?
I would prefer to do that later.
>
> [...]
>> +static av_cold void uninit(AVFilterContext *ctx)
>> +{
>> + SilenceRemoveContext *s = ctx->priv;
>> +
>> + av_freep(&s->start_holdoff);
>> + av_freep(&s->stop_holdoff);
>
> I misses the free of window I believe
Fixed.
>
> [...]
>
> No other comment from me, assuming updates for Changelog & RELEASE_NOTES
> and avfilter bump.
>
> --
> Clement B.
>
More information about the ffmpeg-devel
mailing list