[FFmpeg-devel] [PATCH] checkasm: add support for vvc alf
Nuo Mi
nuomi2021 at gmail.com
Tue Feb 28 17:06:36 EET 2023
On Tue, Feb 28, 2023 at 7:51 PM Martin Storsjö <martin at martin.st> wrote:
> On Sun, 26 Feb 2023, Nuo Mi wrote:
>
> > +#include <string.h>
> > +
> > +#include "libavutil/intreadwrite.h"
> > +#include "libavutil/mem_internal.h"
> > +
> > +#include "libavcodec/avcodec.h"
> > +
> > +#include "libavcodec/vvcdsp.h"
> > +#include "libavcodec/vvcdec.h"
> > +
> > +#include "checkasm.h"
> > +
> > +static const uint32_t pixel_mask[3] = { 0xffffffff, 0x03ff03ff,
> 0x0fff0fff };
> > +
> > +#define SIZEOF_PIXEL ((bit_depth + 7) / 8)
> > +#define PIXEL_STRIDE (ALF_SUBBLOCK_SIZE + 2 * ALF_PADDING_SIZE)
> > +#define BUF_SIZE (PIXEL_STRIDE * (MAX_CTU_SIZE + 3 * 2) * 2) //+3 * 2
> for top and bottom row, *2 for high bit depth
> > +#define LUMA_PARAMS_SIZE (ALF_SUBBLOCK_SIZE / 4 * ALF_SUBBLOCK_SIZE / 4
> * ALF_NUM_COEFF_LUMA)
> > +
> > +#define randomize_buffers(buf0, buf1, size) \
> > + do { \
> > + uint32_t mask = pixel_mask[(bit_depth - 8) >> 1]; \
> > + int k; \
> > + for (k = 0; k < size; k += 4) { \
> > + uint32_t r = rnd() & mask; \
> > + AV_WN32A(buf0 + k, r); \
> > + AV_WN32A(buf1 + k, r); \
> > + } \
> > + } while (0)
> > +
> > +#define randomize_buffers2(buf, size, filter) \
> > + do { \
> > + int k; \
> > + if (filter) { \
> > + for (k = 0; k < size; k++) { \
> > + uint8_t r = rnd(); \
> > + buf[k] = r; \
> > + } \
> > + } else { \
> > + for (k = 0; k < size; k++) { \
> > + int16_t r = rnd(); \
> > + buf[k] = r; \
> > + } \
> > + } \
> > + } while (0)
>
> I don't quite see the point of the extra uint8_t/int16_t variable r here -
> you could just as well assign it directly to buf[k], no? Unless you
> specifically want the effect where you're narrowing the random value to a
> smaller range beforehand.
>
yes. clips can only get special values.
https://github.com/ffvvc/FFmpeg/blob/main/libavcodec/vvc_filter_template.c#L653
fixed by
https://github.com/ffvvc/FFmpeg/pull/42/commits/3034ea8050f7f2db580ef4a36bd806e16be943d4#diff-99295cc07508b29c356b6432f8dbd72c5dceb37584fb8b4be0d34da19455864dR61
>
> > +
> > +static void check_alf_luma_filter(VVCDSPContext *c, const int bit_depth)
> > +{
> > + LOCAL_ALIGNED_32(uint8_t, dst0, [BUF_SIZE]);
> > + LOCAL_ALIGNED_32(uint8_t, dst1, [BUF_SIZE]);
> > + LOCAL_ALIGNED_32(uint8_t, src0, [BUF_SIZE]);
> > + LOCAL_ALIGNED_32(uint8_t, src1, [BUF_SIZE]);
> > + int8_t filter[LUMA_PARAMS_SIZE];
> > + int16_t clip[LUMA_PARAMS_SIZE];
> > + ptrdiff_t stride = PIXEL_STRIDE * SIZEOF_PIXEL;
> > + int offset = (3 * PIXEL_STRIDE + 3) * SIZEOF_PIXEL;
> > +
> > + declare_func_emms(AV_CPU_FLAG_AVX2, void, uint8_t *dst, ptrdiff_t
> dst_stride, const uint8_t *src, ptrdiff_t src_stride,
> > + int width, int height, const int8_t *filter, const int16_t
> *clip);
> > +
> > + randomize_buffers(src0, src1, BUF_SIZE);
> > + randomize_buffers2(filter, LUMA_PARAMS_SIZE, 1);
> > + randomize_buffers2(clip, LUMA_PARAMS_SIZE, 2);
>
> Here, both invocations of randomize_buffers2 are called with filter=1 or
> 2, both which will pick the "if (filter) {" case, so the else in
> randomize_buffers2 is unused?
>
fixed
>
> > +
> > + for (int h = 4; h < ALF_SUBBLOCK_SIZE; h += 4) {
> > + for (int w = 4; w < ALF_SUBBLOCK_SIZE; w += 4) {
>
> Wouldn't you want to use <= instead of < for the comparisons here? I don't
> know vvc so I can't say for sure, but that would seem logical to me.
>
you are right, fixed.
>
> Are all aspect ratio combinations allowed here? E.g. the log mentions that
> you're testing 28x4 blocks. In dav1d, similar cases use logic of testing w
> in the range of [h/4, h*4] (plus limited to e.g. [4,64] at the same time).
> That would reduce the number of combinations to test, if they aren't
> really valid in practice.
>
yes. almost all 4x width and height are valid. last col or row may have a
small width and height.
>
> > + if (check_func(c->alf.filter[LUMA],
> "vvc_alf_filter_luma_%dx%d_%d", w, h, bit_depth)) {
> > + memset(dst0, 0, BUF_SIZE);
> > + memset(dst1, 0, BUF_SIZE);
> > + call_ref(dst0, stride, src0 + offset, stride, w, h,
> filter, clip);
> > + call_new(dst1, stride, src1 + offset, stride, w, h,
> filter, clip);
> > + for (int i = 0; i < h; i++) {
> > + if (memcmp(dst0 + i * stride, dst1 + i * stride, w
> * SIZEOF_PIXEL))
> > + fail();
> > + }
> > + bench_new(dst1, stride, src1 + offset, stride, w, h,
> filter, clip);
> > + }
> > + if (check_func(c->alf.filter[CHROMA],
> "vvc_alf_filter_chroma_%dx%d_%d", w, h, bit_depth)) {
> > + memset(dst0, 0, BUF_SIZE);
> > + memset(dst1, 0, BUF_SIZE);
> > + call_ref(dst0, stride, src0 + offset, stride, w, h,
> filter, clip);
> > + call_new(dst1, stride, src1 + offset, stride, w, h,
> filter, clip);
> > + for (int i = 0; i < h; i++) {
> > + if (memcmp(dst0 + i * stride, dst1 + i * stride, w
> * SIZEOF_PIXEL))
> > + fail();
> > + }
> > + bench_new(dst1, stride, src1 + offset, stride, w, h,
> filter, clip);
> > + }
> > + }
> > + }
> > +}
> > +
> > +void checkasm_check_vvc_alf(void)
> > +{
> > + int bit_depth = 10;
> > + VVCDSPContext h;
> > + ff_vvc_dsp_init(&h, bit_depth);
> > + check_alf_luma_filter(&h, bit_depth);
> > + report("alf_filter");
>
> Is 10 bits the only currently valid bitdepth here? Otherwise I think we
> should include all of them, even if there's only assembly for one case at
> the moment. (The pixel_mask seems to assume 8/10/12 bits.)
>
I use the 10 bits version to collect performance suggestions. After it
done, I will add 8 and 12 bits version
But you are right. I can add 8/12 bits test first.
>
> // Martin
>
> Hi Martin,
Thank you for the review. comments inline.
More information about the ffmpeg-devel
mailing list