[FFmpeg-devel] [PATCH 3/6] lavc/vvc: Store MIP information over entire CU area

Sat Dec 7 11:54:14 EET 2024

On 07/12/2024 02:22, Nuo Mi wrote:
> On Wed, Dec 4, 2024 at 2:09 AM Frank Plowman <post at frankplowman.com> wrote:
> 
>> Hi,
>>
>> Thanks for your review.
>>
>> On 03/12/2024 10:04, Nuo Mi wrote:
>>> Hi Frank,
>>> Thank you for the patch
>>>
>>> On Sat, Nov 30, 2024 at 8:13 PM Frank Plowman <post at frankplowman.com>
>> wrote:
>>>
>>>> On 30/11/2024 06:46, Nuo Mi wrote:
>>>>> On Fri, Nov 29, 2024 at 6:19 AM Frank Plowman <post at frankplowman.com>
>>>> wrote:
>>>>>
>>>>>> Previously, the code only stored the MIP mode and transpose flag in
>> the
>>>>>> relevant tables at the top-left corner of the CU.  This information
>> ends
>>>>>> up being retrieved in ff_vvc_intra_pred_* not based on the CU position
>>>>>> but instead the transform unit position (specifically, using the x0
>> and
>>>>>> y0 from get_luma_predict_unit).  There might be multiple transform
>> units
>>>>>> in a CU, hence the top-left corner of the transform unit might not
>>>>>> coincide with the top-left corner of the CU.  Consequently, we need to
>>>>>> store the MIP information at all positions in the CU, not only its
>>>>>> top-left corner, as we already do for the MIP flag.
>>>>>>
>>>>>> Signed-off-by: Frank Plowman <post at frankplowman.com>
>>>>>> ---
>>>>>>  libavcodec/vvc/ctu.c | 4 ++--
>>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/libavcodec/vvc/ctu.c b/libavcodec/vvc/ctu.c
>>>>>> index 1e06119cfd..0030938cf5 100644
>>>>>> --- a/libavcodec/vvc/ctu.c
>>>>>> +++ b/libavcodec/vvc/ctu.c
>>>>>> @@ -975,8 +975,8 @@ static void intra_luma_pred_modes(VVCLocalContext
>>>> *lc)
>>>>>>              for (int y = 0; y < (cb_height>>log2_min_cb_size); y++) {
>>>>>>                  int width = cb_width>>log2_min_cb_size;
>>>>>>                  memset(&fc->tab.imf[x],  cu->intra_mip_flag, width);
>>>>>> -                fc->tab.imtf[x] = intra_mip_transposed_flag;
>>>>>> -                fc->tab.imm[x]  = intra_mip_mode;
>>>>>> +                memset(&fc->tab.imtf[x], intra_mip_transposed_flag,
>>>>>> width);
>>>>>> +                memset(&fc->tab.imm[x], intra_mip_mode, width);
>>>>>
>>>>>  intra_mip_mode is 4 bits, 2 flags are 2 bits. maybe we can use a
>> uint8_t
>>>>> for 3 fields,
>>>>> We only need 1 memset and save 2/3 memory.
>>>>
>>>> I've implemented this (patch attached, to be applied atop the set), but
>>>> it's not as straightforward as it may seem.  In particular, because the
>>>> tables are read directly from when determining which CABAC context to
>>>> use for these flags, we have to add quite a lot of extra code in cabac.c
>>>> to support this special case where the MIP information is a bit field.
>>>> In my implementation, this was done by adding this coerce_to_bool
>>>> parameter to get_inc and get_top.  This does actually save a moderate
>>>> amount of memory though, ~1MB for 4K and ~256kB for 1080p.
>>>>
>>> A coding block can be as small as 4x4, so for a 1080p frame, the memory
>>> required is approximately  2*1920*1080/(4*4) ~= 256 kB.
>>> However, with a maximum delay of 16 frames, the total memory usage will
>> be
>>> 256kB * 16=4 MB.
>>>
>>> In your patch, coerce_to_bool is set to 1 only when we are in
>>> ff_vvc_intra_mip_flag.
>>> How about defining get_mip_inc like this? This way, we can avoid changing
>>> get_inc
>>>
>>> static av_always_inline
>>> uint8_t get_mip_inc (VVCLocalContext *lc, const uint8_t *ctx)
>>> {
>>>     uint8_t left = 0, top = 0;
>>>     get_left_top(lc, &left, &top, lc->cu->x0, lc->cu->y0, ctx, ctx);
>>>     return !!left + !!top;
>>> }
>>
>> What about instead taking a function pointer as an argument to
>> get_inc/get_left_top which can be used to modify the value read from the
>> table before using it, or NULL to apply no such modification?
> 
> Function pointers have a cost
> 

Provided the caller and callee are defined in the same translation unit
and the caller is inlined, as would have been the case here, the
indirect call can be optimised away and the call inlined.  GCC calls
this optimisation indirect-inlining.

In any case, I ended up not taking this approach for slightly different
reasons and implemented your first suggestion instead.  It should be on
the ML as v2 now.

Cheers,
Frank