[FFmpeg-devel] [PATCH] area changed: scdet filter
radu.taraibuta at gmail.com
radu.taraibuta at gmail.com
Sun May 19 19:05:26 EEST 2024
> -----Original Message-----
> From: radu.taraibuta at gmail.com <radu.taraibuta at gmail.com>
> Sent: luni, 13 mai 2024 18:52
> To: ffmpeg-devel at ffmpeg.org
> Subject: [PATCH] area changed: scdet filter
>
> Previous observations:
>
> - Inconsistent code style with other filters. (Mostly using AVFilterLink*
> link instead of AVFilterLink *link).
> I hope it's fine now.
>
> - Unrelated changes, please split trivial unrelated changes into separate
> patches.
> Removed trivial changes from this patch.
>
> - Can't tables be generated at .init/.config_props time? No point in
> storing them into binary.
> Done.
>
> - Adding extra delay is not backward compatible change, it should be
> implemented properly by adding option for users to select mode: next &
prev
> frame or just next or prev frame.
> Added legacy option to the mode parameter.
>
> - Could split frame clone change into earlier separate patch.
> Cannot be done. It's either frame clone or 1 frame delay.
>
> - Where are results of improvements with accuracy so it can be confirmed?
> Here are my test results with manual labeling of scene changes:
> 2379 Full length movie
>
> Method Threshold TP FP FN Precision
> Recall F
> Cubic 7 2357 423 22 0.847841727 0.990752417
> 0.913742973
> Cubic 10 2297 200 82 0.919903885 0.965531736
> 0.94216571
> Cubic 12 2217 146 162 0.938214135 0.931904161
> 0.935048503
> Cubic 15 2049 101 330 0.953023256 0.861286255
> 0.904835505
> Linear 2.8 2357 1060 22 0.689786362
0.990752417
> 0.813319531
> Linear 8 2099 236 280 0.898929336
0.882303489
> 0.890538821
> Linear 10 1886 173 493 0.91597863
0.792770071
> 0.849932402
> Legacy 5 2235 1260 144 0.639484979
0.939470366
> 0.760980592
> Legacy 8 1998 414 381 0.828358209
0.839848676
> 0.83406387
> Legacy 10 1743 193 636 0.900309917
0.732660782
> 0.80787949
>
> 15 HDR10Plus_PB_EAC3JOC
> https://mega.nz/file/nehDka6Z#C5_OPbSZkONdOp1jRmc09C9-
> viDc3zMj8ZHruHcWKyA
>
> Method Threshold TP FP FN Precision
> Recall F
> Cubic 10 15 0 0 1 1 1
> Linear 5 13 1 2 0.928571429
0.866666667
> 0.896551724
> Legacy 5 12 2 3 0.857142857 0.8
> 0.827586207
>
> 21 (HDR HEVC 10-bit BT.2020 24fps) Exodus Sample
> https://mega.nz/file/Sfw1hDpK#ErxCOpQDVjcI1gq6ZbX3vIfdtXZompkFe0jq47E
> hR2o
>
> Method Threshold TP FP FN Precision
> Recall F
> Cubic 10 21 0 0 1 1 1
> Linear 4 20 0 1 1 0.952380952
> 0.975609756
> Legacy 4 19 0 2 1 0.904761905
0.95
>
> 94 Bieber Grammys
> https://mega.nz/#!c9dhAaKA!MG5Yi-
> MJNATE2_KqcnNJZCRKtTWvdjJP1NwG8Ggdw3E
>
> Method Threshold TP FP FN Precision
> Recall F
> Cubic 15 91 23 3 0.798245614 0.968085106
> 0.875
> Cubic 18 85 9 9 0.904255319 0.904255319
> 0.904255319
> Linear 7 79 49 15 0.6171875
0.840425532
> 0.711711712
> Linear 8 74 28 20 0.725490196
0.787234043
> 0.755102041
> Legacy 7 74 40 20 0.649122807
0.787234043
> 0.711538462
> Legacy 8 71 26 23 0.731958763
0.755319149
> 0.743455497
>
>
> Improve scene detection accuracy by comparing frame with both previous and
> next frame (creates one frame delay).
> Add new mode parameter and new method to compute the frame difference
> using
> cubic square to increase the weight of small changes and new mean formula.
> This improves accuracy significantly. Slightly improve performance by not
> using frame clone.
> Add legacy mode for backward compatibility.
>
> Signed-off-by: raduct <radu.taraibuta at gmail.com>
> ---
> doc/filters.texi | 16 ++++
> libavfilter/scene_sad.c | 151 ++++++++++++++++++++++++++++++++++
> libavfilter/scene_sad.h | 6 ++
> libavfilter/vf_scdet.c | 156 +++++++++++++++++++++++++-----------
> tests/fate/filter-video.mak | 3 +
> 5 files changed, 284 insertions(+), 48 deletions(-)
>
> diff --git a/doc/filters.texi b/doc/filters.texi
> index bfa8ccec8b..53814e003b 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -21797,6 +21797,22 @@ Default value is @code{10.}.
> @item sc_pass, s
> Set the flag to pass scene change frames to the next filter. Default
value
> is @code{0}
> You can enable it if you want to get snapshot of scene change frames
only.
> +
> + at item mode
> +Set the scene change detection method. Default value is @code{-1}
> +Available values are:
> +
> + at table @samp
> + at item -1
> +Legacy mode for sum of absolute linear differences. Compare frame with
> previous only and no delay.
> +
> + at item 0
> +Sum of absolute linear differences. Compare frame with both previous and
> next which introduces a 1 frame delay.
> +
> + at item 1
> +Sum of mean of cubic root differences. Compare frame with both previous
> and
> next which introduces a 1 frame delay.
> +
> + at end table
> @end table
>
> @anchor{selectivecolor}
> diff --git a/libavfilter/scene_sad.c b/libavfilter/scene_sad.c
> index caf911eb5d..9b80d426bc 100644
> --- a/libavfilter/scene_sad.c
> +++ b/libavfilter/scene_sad.c
> @@ -21,6 +21,7 @@
> * Scene SAD functions
> */
>
> +#include "libavutil/thread.h"
> #include "scene_sad.h"
>
> void ff_scene_sad16_c(SCENE_SAD_PARAMS)
> @@ -71,3 +72,153 @@ ff_scene_sad_fn ff_scene_sad_get_fn(int depth)
> return sad;
> }
>
> +static AVMutex cbrt_mutex = AV_MUTEX_INITIALIZER;
> +static uint8_t *cbrt_table[16] = { NULL };
> +static int cbrt_table_ref[16] = { 0 };
> +
> +int ff_init_cbrt(int bitdepth)
> +{
> + if (bitdepth < 4 || bitdepth > 16)
> + return AVERROR(EINVAL);
> +
> + ff_mutex_lock(&cbrt_mutex);
> +
> + uint8_t *table = cbrt_table[bitdepth];
> + if (table) {
> + cbrt_table_ref[bitdepth]++;
> + goto end;
> + }
> +
> + table = av_malloc((1 << bitdepth) * (bitdepth > 8 ? 2 : 1));
> + if (!table)
> + goto end;
> + cbrt_table[bitdepth] = table;
> + cbrt_table_ref[bitdepth] = 1;
> +
> + int size = 1 << bitdepth;
> + double factor = pow(size - 1, 2. / 3.);
> + if (bitdepth <= 8) {
> + for (int i = 0; i < size; i++)
> + table[i] = round(factor * pow(i, 1. / 3.));
> + } else {
> + uint16_t *tablew = (uint16_t*)table;
> + for (int i = 0; i < size; i++)
> + tablew[i] = round(factor * pow(i, 1. / 3.));
> + }
> +
> +end:
> + ff_mutex_unlock(&cbrt_mutex);
> + return table != NULL;
> +}
> +
> +void ff_uninit_cbrt(int bitdepth)
> +{
> + if (bitdepth < 4 || bitdepth > 16)
> + return;
> + ff_mutex_lock(&cbrt_mutex);
> + if (!--cbrt_table_ref[bitdepth]) {
> + av_free(cbrt_table[bitdepth]);
> + cbrt_table[bitdepth] = NULL;
> + }
> + ff_mutex_unlock(&cbrt_mutex);
> +}
> +
> +void ff_scene_scrd_c(SCENE_SAD_PARAMS)
> +{
> + uint64_t scrdPlus = 0;
> + uint64_t scrdMinus = 0;
> + int x, y;
> +
> + uint8_t *table = cbrt_table[8];
> + if (!table) {
> + *sum = 0;
> + return;
> + }
> +
> + for (y = 0; y < height; y++) {
> + for (x = 0; x < width; x++)
> + if (src1[x] > src2[x])
> + scrdMinus += table[src1[x] - src2[x]];
> + else
> + scrdPlus += table[src2[x] - src1[x]];
> + src1 += stride1;
> + src2 += stride2;
> + }
> +
> + double mean = (sqrt(scrdPlus) + sqrt(scrdMinus)) / 2.0;
> + *sum = 2.0 * mean * mean;
> +}
> +
> +void ff_scene_scrd2B_c(SCENE_SAD_PARAMS, int bitdepth)
> +{
> + uint64_t scrdPlus = 0;
> + uint64_t scrdMinus = 0;
> + const uint16_t *src1w = (const uint16_t*)src1;
> + const uint16_t *src2w = (const uint16_t*)src2;
> + int x, y;
> +
> + uint16_t *table = (uint16_t*)cbrt_table[bitdepth];
> + if (!table) {
> + *sum = 0;
> + return;
> + }
> +
> + stride1 /= 2;
> + stride2 /= 2;
> +
> + for (y = 0; y < height; y++) {
> + for (x = 0; x < width; x++)
> + if (src1w[x] > src2w[x])
> + scrdMinus += table[src1w[x] - src2w[x]];
> + else
> + scrdPlus += table[src2w[x] - src1w[x]];
> + src1w += stride1;
> + src2w += stride2;
> + }
> +
> + double mean = (sqrt(scrdPlus) + sqrt(scrdMinus)) / 2.0;
> + *sum = 2.0 * mean * mean;
> +}
> +
> +void ff_scene_scrd9_c(SCENE_SAD_PARAMS)
> +{
> + ff_scene_scrd2B_c(src1, stride1, src2, stride2, width, height, sum,
9);
> +}
> +
> +void ff_scene_scrd10_c(SCENE_SAD_PARAMS)
> +{
> + ff_scene_scrd2B_c(src1, stride1, src2, stride2, width, height, sum,
> 10);
> +}
> +
> +void ff_scene_scrd12_c(SCENE_SAD_PARAMS)
> +{
> + ff_scene_scrd2B_c(src1, stride1, src2, stride2, width, height, sum,
> 12);
> +}
> +
> +void ff_scene_scrd14_c(SCENE_SAD_PARAMS)
> +{
> + ff_scene_scrd2B_c(src1, stride1, src2, stride2, width, height, sum,
> 14);
> +}
> +
> +void ff_scene_scrd16_c(SCENE_SAD_PARAMS)
> +{
> + ff_scene_scrd2B_c(src1, stride1, src2, stride2, width, height, sum,
> 16);
> +}
> +
> +ff_scene_sad_fn ff_scene_scrd_get_fn(int depth)
> +{
> + ff_scene_sad_fn scrd = NULL;
> + if (depth == 8)
> + scrd = ff_scene_scrd_c;
> + else if (depth == 9)
> + scrd = ff_scene_scrd9_c;
> + else if (depth == 10)
> + scrd = ff_scene_scrd10_c;
> + else if (depth == 12)
> + scrd = ff_scene_scrd12_c;
> + else if (depth == 14)
> + scrd = ff_scene_scrd14_c;
> + else if (depth == 16)
> + scrd = ff_scene_scrd16_c;
> + return scrd;
> +}
> diff --git a/libavfilter/scene_sad.h b/libavfilter/scene_sad.h
> index 173a051f2b..c294bd90f9 100644
> --- a/libavfilter/scene_sad.h
> +++ b/libavfilter/scene_sad.h
> @@ -41,4 +41,10 @@ ff_scene_sad_fn ff_scene_sad_get_fn_x86(int depth);
>
> ff_scene_sad_fn ff_scene_sad_get_fn(int depth);
>
> +ff_scene_sad_fn ff_scene_scrd_get_fn(int depth);
> +
> +int ff_init_cbrt(int bitdepth);
> +
> +void ff_uninit_cbrt(int bitdepth);
> +
> #endif /* AVFILTER_SCENE_SAD_H */
> diff --git a/libavfilter/vf_scdet.c b/libavfilter/vf_scdet.c
> index 15399cfebf..93da5837b3 100644
> --- a/libavfilter/vf_scdet.c
> +++ b/libavfilter/vf_scdet.c
> @@ -31,6 +31,18 @@
> #include "scene_sad.h"
> #include "video.h"
>
> +enum SCDETMode {
> + MODE_LEGACY = -1,
> + MODE_LINEAR = 0,
> + MODE_MEAN_CBRT = 1
> +};
> +
> +typedef struct SCDETFrameInfo {
> + AVFrame *picref;
> + double mafd;
> + double diff;
> +} SCDETFrameInfo;
> +
> typedef struct SCDetContext {
> const AVClass *class;
>
> @@ -39,11 +51,12 @@ typedef struct SCDetContext {
> int nb_planes;
> int bitdepth;
> ff_scene_sad_fn sad;
> - double prev_mafd;
> - double scene_score;
> - AVFrame *prev_picref;
> + SCDETFrameInfo curr_frame;
> + SCDETFrameInfo prev_frame;
> +
> double threshold;
> int sc_pass;
> + enum SCDETMode mode;
> } SCDetContext;
>
> #define OFFSET(x) offsetof(SCDetContext, x)
> @@ -55,6 +68,7 @@ static const AVOption scdet_options[] = {
> { "t", "set scene change detect threshold",
> OFFSET(threshold), AV_OPT_TYPE_DOUBLE, {.dbl = 10.}, 0, 100., V|F
},
> { "sc_pass", "Set the flag to pass scene change frames",
> OFFSET(sc_pass), AV_OPT_TYPE_BOOL, {.dbl = 0 }, 0, 1, V|F
},
> { "s", "Set the flag to pass scene change frames",
> OFFSET(sc_pass), AV_OPT_TYPE_BOOL, {.dbl = 0 }, 0, 1, V|F
},
> + { "mode", "scene change detection method",
> OFFSET(mode), AV_OPT_TYPE_INT, {.i64 = MODE_LEGACY},
> MODE_LEGACY,
> MODE_MEAN_CBRT, V|F },
> {NULL}
> };
>
> @@ -91,7 +105,14 @@ static int config_input(AVFilterLink *inlink)
> s->height[plane] = inlink->h >> ((plane == 1 || plane == 2) ?
> desc->log2_chroma_h : 0);
> }
>
> - s->sad = ff_scene_sad_get_fn(s->bitdepth == 8 ? 8 : 16);
> + if (s->mode == MODE_LINEAR || s->mode == MODE_LEGACY)
> + s->sad = ff_scene_sad_get_fn(s->bitdepth == 8 ? 8 : 16);
> + else if (s->mode == MODE_MEAN_CBRT) {
> + int ret = ff_init_cbrt(s->bitdepth);
> + if (ret < 0)
> + return ret;
> + s->sad = ff_scene_scrd_get_fn(s->bitdepth);
> + }
> if (!s->sad)
> return AVERROR(EINVAL);
>
> @@ -101,46 +122,97 @@ static int config_input(AVFilterLink *inlink)
> static av_cold void uninit(AVFilterContext *ctx)
> {
> SCDetContext *s = ctx->priv;
> -
> - av_frame_free(&s->prev_picref);
> + if (s->mode == MODE_LEGACY)
> + av_frame_free(&s->prev_frame.picref);
> + if (s->mode == MODE_MEAN_CBRT)
> + ff_uninit_cbrt(s->bitdepth);
> }
>
> -static double get_scene_score(AVFilterContext *ctx, AVFrame *frame)
> +static void compute_diff(AVFilterContext *ctx)
> {
> - double ret = 0;
> SCDetContext *s = ctx->priv;
> - AVFrame *prev_picref = s->prev_picref;
> + AVFrame *prev_picref = s->prev_frame.picref;
> + AVFrame *curr_picref = s->curr_frame.picref;
>
> - if (prev_picref && frame->height == prev_picref->height
> - && frame->width == prev_picref->width) {
> - uint64_t sad = 0;
> - double mafd, diff;
> - uint64_t count = 0;
> + if (prev_picref && curr_picref
> + && curr_picref->height == prev_picref->height
> + && curr_picref->width == prev_picref->width) {
>
> + uint64_t sum = 0;
> + uint64_t count = 0;
> for (int plane = 0; plane < s->nb_planes; plane++) {
> - uint64_t plane_sad;
> + uint64_t plane_sum;
> s->sad(prev_picref->data[plane],
prev_picref->linesize[plane],
> - frame->data[plane], frame->linesize[plane],
> - s->width[plane], s->height[plane], &plane_sad);
> - sad += plane_sad;
> + curr_picref->data[plane],
curr_picref->linesize[plane],
> + s->width[plane], s->height[plane], &plane_sum);
> + sum += plane_sum;
> count += s->width[plane] * s->height[plane];
> }
>
> - mafd = (double)sad * 100. / count / (1ULL << s->bitdepth);
> - diff = fabs(mafd - s->prev_mafd);
> - ret = av_clipf(FFMIN(mafd, diff), 0, 100.);
> - s->prev_mafd = mafd;
> - av_frame_free(&prev_picref);
> + s->curr_frame.mafd = (double)sum * 100. / count / (1ULL <<
> s->bitdepth);
> + if (s->mode == MODE_LEGACY)
> + s->curr_frame.diff = fabs(s->curr_frame.mafd -
> s->prev_frame.mafd);
> + else
> + s->curr_frame.diff = s->curr_frame.mafd - s->prev_frame.mafd;
> + } else {
> + s->curr_frame.mafd = 0;
> + s->curr_frame.diff = 0;
> }
> - s->prev_picref = av_frame_clone(frame);
> - return ret;
> }
>
> -static int set_meta(SCDetContext *s, AVFrame *frame, const char *key,
const
> char *value)
> +static int set_meta(AVFrame *frame, const char *key, const char *value)
> {
> return av_dict_set(&frame->metadata, key, value, 0);
> }
>
> +static int filter_frame(AVFilterContext *ctx, AVFrame *frame)
> +{
> + AVFilterLink *inlink = ctx->inputs[0];
> + AVFilterLink *outlink = ctx->outputs[0];
> + SCDetContext *s = ctx->priv;
> +
> + s->prev_frame = s->curr_frame;
> + s->curr_frame.picref = frame;
> +
> + if ((s->mode != MODE_LEGACY && s->prev_frame.picref) || (s->mode ==
> MODE_LEGACY && frame != NULL)) {
> + compute_diff(ctx);
> +
> + if (s->mode == MODE_LEGACY) {
> + av_frame_free(&s->prev_frame.picref);
> + s->prev_frame = s->curr_frame;
> + s->curr_frame.picref = av_frame_clone(s->curr_frame.picref);
> + } else if (s->prev_frame.diff < -s->curr_frame.diff) {
> + s->prev_frame.diff = -s->curr_frame.diff;
> + s->prev_frame.mafd = s->curr_frame.mafd;
> + }
> + double scene_score = av_clipf(s->mode == MODE_LEGACY ?
> FFMIN(s->prev_frame.mafd, s->prev_frame.diff) : FFMAX(s->prev_frame.diff,
> 0), 0, 100.);
> +
> + char buf[64];
> + snprintf(buf, sizeof(buf), "%0.3f", s->prev_frame.mafd);
> + set_meta(s->prev_frame.picref, "lavfi.scd.mafd", buf);
> + snprintf(buf, sizeof(buf), "%0.3f", scene_score);
> + set_meta(s->prev_frame.picref, "lavfi.scd.score", buf);
> +
> + if (scene_score >= s->threshold) {
> + av_log(s, AV_LOG_INFO, "lavfi.scd.score: %.3f,
lavfi.scd.time:
> %s\n",
> + scene_score, av_ts2timestr(s->prev_frame.picref->pts,
> &inlink->time_base));
> + set_meta(s->prev_frame.picref, "lavfi.scd.time",
> + av_ts2timestr(s->prev_frame.picref->pts,
> &inlink->time_base));
> + }
> +
> + if (s->sc_pass) {
> + if (scene_score >= s->threshold)
> + return ff_filter_frame(outlink, s->prev_frame.picref);
> + else
> + av_frame_free(&s->prev_frame.picref);
> + }
> + else
> + return ff_filter_frame(outlink, s->prev_frame.picref);
> + }
> +
> + return 0;
> +}
> +
> static int activate(AVFilterContext *ctx)
> {
> int ret;
> @@ -148,6 +220,8 @@ static int activate(AVFilterContext *ctx)
> AVFilterLink *outlink = ctx->outputs[0];
> SCDetContext *s = ctx->priv;
> AVFrame *frame;
> + int64_t pts;
> + int status;
>
> FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink);
>
> @@ -155,31 +229,17 @@ static int activate(AVFilterContext *ctx)
> if (ret < 0)
> return ret;
>
> - if (frame) {
> - char buf[64];
> - s->scene_score = get_scene_score(ctx, frame);
> - snprintf(buf, sizeof(buf), "%0.3f", s->prev_mafd);
> - set_meta(s, frame, "lavfi.scd.mafd", buf);
> - snprintf(buf, sizeof(buf), "%0.3f", s->scene_score);
> - set_meta(s, frame, "lavfi.scd.score", buf);
> + if (ret > 0)
> + return filter_frame(ctx, frame);
>
> - if (s->scene_score >= s->threshold) {
> - av_log(s, AV_LOG_INFO, "lavfi.scd.score: %.3f,
lavfi.scd.time:
> %s\n",
> - s->scene_score, av_ts2timestr(frame->pts,
> &inlink->time_base));
> - set_meta(s, frame, "lavfi.scd.time",
> - av_ts2timestr(frame->pts, &inlink->time_base));
> - }
> - if (s->sc_pass) {
> - if (s->scene_score >= s->threshold)
> - return ff_filter_frame(outlink, frame);
> - else {
> - av_frame_free(&frame);
> - }
> - } else
> - return ff_filter_frame(outlink, frame);
> + if (ff_inlink_acknowledge_status(inlink, &status, &pts)) {
> + if (status == AVERROR_EOF)
> + ret = filter_frame(ctx, NULL);
> +
> + ff_outlink_set_status(outlink, status, pts);
> + return ret;
> }
>
> - FF_FILTER_FORWARD_STATUS(inlink, outlink);
> FF_FILTER_FORWARD_WANTED(outlink, inlink);
>
> return FFERROR_NOT_READY;
> diff --git a/tests/fate/filter-video.mak b/tests/fate/filter-video.mak
> index ee9f0f5e40..cff48e33d9 100644
> --- a/tests/fate/filter-video.mak
> +++ b/tests/fate/filter-video.mak
> @@ -672,6 +672,9 @@ SCDET_DEPS = LAVFI_INDEV FILE_PROTOCOL
> MOVIE_FILTER
> SCDET_FILTER SCALE_FILTER \
> FATE_METADATA_FILTER-$(call ALLYES, $(SCDET_DEPS)) +=
> fate-filter-metadata-scdet
> fate-filter-metadata-scdet: SRC =
> $(TARGET_SAMPLES)/svq3/Vertical400kbit.sorenson3.mov
> fate-filter-metadata-scdet: CMD = run $(FILTER_METADATA_COMMAND)
> "sws_flags=+accurate_rnd+bitexact;movie='$(SRC)',scdet=s=1"
> +FATE_METADATA_FILTER-$(call ALLYES, $(SCDET_DEPS)) +=
> fate-filter-metadata-scdet1
> +fate-filter-metadata-scdet1: SRC =
> $(TARGET_SAMPLES)/svq3/Vertical400kbit.sorenson3.mov
> +fate-filter-metadata-scdet1: CMD = run $(FILTER_METADATA_COMMAND)
> "sws_flags=+accurate_rnd+bitexact;movie='$(SRC)',scdet=s=1:t=6.5:mode=1"
>
> CROPDETECT_DEPS = LAVFI_INDEV FILE_PROTOCOL MOVIE_FILTER
> MOVIE_FILTER
> MESTIMATE_FILTER CROPDETECT_FILTER \
> SCALE_FILTER MOV_DEMUXER H264_DECODER
> --
> 2.43.0.windows.1
>
So what's next? Is there anything else I should do?
More information about the ffmpeg-devel
mailing list