[FFmpeg-devel] [PATCH 3/6] lavc/vvc: Store MIP information over entire CU area
Nuo Mi
nuomi2021 at gmail.com
Sat Dec 7 15:07:49 EET 2024
On Sat, Dec 7, 2024 at 5:54 PM Frank Plowman <post at frankplowman.com> wrote:
> On 07/12/2024 02:22, Nuo Mi wrote:
> > On Wed, Dec 4, 2024 at 2:09 AM Frank Plowman <post at frankplowman.com>
> wrote:
> >
> >> Hi,
> >>
> >> Thanks for your review.
> >>
> >> On 03/12/2024 10:04, Nuo Mi wrote:
> >>> Hi Frank,
> >>> Thank you for the patch
> >>>
> >>> On Sat, Nov 30, 2024 at 8:13 PM Frank Plowman <post at frankplowman.com>
> >> wrote:
> >>>
> >>>> On 30/11/2024 06:46, Nuo Mi wrote:
> >>>>> On Fri, Nov 29, 2024 at 6:19 AM Frank Plowman <post at frankplowman.com
> >
> >>>> wrote:
> >>>>>
> >>>>>> Previously, the code only stored the MIP mode and transpose flag in
> >> the
> >>>>>> relevant tables at the top-left corner of the CU. This information
> >> ends
> >>>>>> up being retrieved in ff_vvc_intra_pred_* not based on the CU
> position
> >>>>>> but instead the transform unit position (specifically, using the x0
> >> and
> >>>>>> y0 from get_luma_predict_unit). There might be multiple transform
> >> units
> >>>>>> in a CU, hence the top-left corner of the transform unit might not
> >>>>>> coincide with the top-left corner of the CU. Consequently, we need
> to
> >>>>>> store the MIP information at all positions in the CU, not only its
> >>>>>> top-left corner, as we already do for the MIP flag.
> >>>>>>
> >>>>>> Signed-off-by: Frank Plowman <post at frankplowman.com>
> >>>>>> ---
> >>>>>> libavcodec/vvc/ctu.c | 4 ++--
> >>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/libavcodec/vvc/ctu.c b/libavcodec/vvc/ctu.c
> >>>>>> index 1e06119cfd..0030938cf5 100644
> >>>>>> --- a/libavcodec/vvc/ctu.c
> >>>>>> +++ b/libavcodec/vvc/ctu.c
> >>>>>> @@ -975,8 +975,8 @@ static void
> intra_luma_pred_modes(VVCLocalContext
> >>>> *lc)
> >>>>>> for (int y = 0; y < (cb_height>>log2_min_cb_size);
> y++) {
> >>>>>> int width = cb_width>>log2_min_cb_size;
> >>>>>> memset(&fc->tab.imf[x], cu->intra_mip_flag,
> width);
> >>>>>> - fc->tab.imtf[x] = intra_mip_transposed_flag;
> >>>>>> - fc->tab.imm[x] = intra_mip_mode;
> >>>>>> + memset(&fc->tab.imtf[x], intra_mip_transposed_flag,
> >>>>>> width);
> >>>>>> + memset(&fc->tab.imm[x], intra_mip_mode, width);
> >>>>>
> >>>>> intra_mip_mode is 4 bits, 2 flags are 2 bits. maybe we can use a
> >> uint8_t
> >>>>> for 3 fields,
> >>>>> We only need 1 memset and save 2/3 memory.
> >>>>
> >>>> I've implemented this (patch attached, to be applied atop the set),
> but
> >>>> it's not as straightforward as it may seem. In particular, because
> the
> >>>> tables are read directly from when determining which CABAC context to
> >>>> use for these flags, we have to add quite a lot of extra code in
> cabac.c
> >>>> to support this special case where the MIP information is a bit field.
> >>>> In my implementation, this was done by adding this coerce_to_bool
> >>>> parameter to get_inc and get_top. This does actually save a moderate
> >>>> amount of memory though, ~1MB for 4K and ~256kB for 1080p.
> >>>>
> >>> A coding block can be as small as 4x4, so for a 1080p frame, the memory
> >>> required is approximately 2*1920*1080/(4*4) ~= 256 kB.
> >>> However, with a maximum delay of 16 frames, the total memory usage will
> >> be
> >>> 256kB * 16=4 MB.
> >>>
> >>> In your patch, coerce_to_bool is set to 1 only when we are in
> >>> ff_vvc_intra_mip_flag.
> >>> How about defining get_mip_inc like this? This way, we can avoid
> changing
> >>> get_inc
> >>>
> >>> static av_always_inline
> >>> uint8_t get_mip_inc (VVCLocalContext *lc, const uint8_t *ctx)
> >>> {
> >>> uint8_t left = 0, top = 0;
> >>> get_left_top(lc, &left, &top, lc->cu->x0, lc->cu->y0, ctx, ctx);
> >>> return !!left + !!top;
> >>> }
> >>
> >> What about instead taking a function pointer as an argument to
> >> get_inc/get_left_top which can be used to modify the value read from the
> >> table before using it, or NULL to apply no such modification?
> >
> > Function pointers have a cost
> >
>
> Provided the caller and callee are defined in the same translation unit
> and the caller is inlined, as would have been the case here, the
> indirect call can be optimised away and the call inlined. GCC calls
> this optimisation indirect-inlining.
>
👍
>
> In any case, I ended up not taking this approach for slightly different
> reasons and implemented your first suggestion instead. It should be on
> the ML as v2 now.
> Cheers,
> Frank
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>
We are using too much memory. Please do some profiling using Massif or
another tool if you're interested.
More information about the ffmpeg-devel
mailing list