[FFmpeg-devel] [PATCH] avcodec/videotoolbox: add AV1 hardware acceleration

Tue Sep 24 02:21:20 EEST 2024

On Mon, Sep 23, 2024 at 12:43 PM Zhao Zhili <quinkblack at foxmail.com> wrote:
>
>
>
> > On Sep 24, 2024, at 01:24, Cameron Gutman <aicommander at gmail.com> wrote:
> >
> > On Mon, Sep 23, 2024 at 6:07 AM Zhao Zhili <quinkblack at foxmail.com> wrote:
> >>
> >>
> >>
> >>> On Sep 21, 2024, at 05:39, Martin Storsjö <martin at martin.st> wrote:
> >>>
> >>> From: Jan Ekström <jeebjp at gmail.com>
> >>>
> >>> Co-authored-by: Ruslan Chernenko <ractyfree at gmail.com>
> >>> Co-authored-by: Martin Storsjö <martin at martin.st>
> >>> ---
> >>> This is a touched up version of Jan and Ruslan's patches for
> >>> AV1 hwaccel via videotoolbox; I tried to polish the code a little
> >>> by not overwriting avctx->extradata in
> >>> ff_videotoolbox_av1c_extradata_create, and by factorizing out a
> >>> new function ff_videotoolbox_buffer_append.
> >>
> >> LGTM, although I don’t have a device with AV1 support.
> >
> > I've asked for some testing from users with M3 MacBooks and it
> > appears to have problems with certain resolutions (notably 4K).
> >
> > https://github.com/moonlight-stream/moonlight-qt/issues/1125
> >
> > It's possible this is a Moonlight bug, but that seems unlikely
> > because VideoToolbox HEVC decoding works fine at 4K and
> > VideoToolbox AV1 works at 1080p and other resolutions.
>
> I can’t tell what’s going wrong from that bug report. Please test
> with ffmpeg and/or ffplay cmdline and share the results.
>

I'm debugging this blind since I don't have hardware either, but I think
we're mishandling Tile Group OBUs in this patch.

Comparing working vs non-working logs, it looks like the encoder is using
2x1 tiling when encoding 4K and 1x1 for smaller unaffected resolutions.

Working:
[av1 @ 0x14f7b14c0] Frame 0:  size 1280x720  upscaled 1280  render 1280x720  subsample 2x2  bitdepth 10  tiles 1x1.
[av1 @ 0x14f7b14c0] Total OBUs on this packet: 4.
[av1 @ 0x14f7b14c0] OBU idx:0, type:2, content available:1.
[av1 @ 0x14f7b14c0] OBU idx:1, type:1, content available:1.
[av1 @ 0x14f7b14c0] OBU idx:2, type:6, content available:1.
[av1 @ 0x14f7b14c0] Format videotoolbox_vld chosen by get_format().
[av1 @ 0x14f7b14c0] Format videotoolbox_vld requires hwaccel av1_videotoolbox initialisation.
[av1 @ 0x14f7b14c0] AV1 decode get format: videotoolbox_vld.

Broken:
[av1 @ 0x15128b530] Frame 0:  size 3840x2160  upscaled 3840  render 3840x2160  subsample 2x2  bitdepth 10  tiles 2x1.
[av1 @ 0x15128b530] Total OBUs on this packet: 4.
[av1 @ 0x15128b530] OBU idx:0, type:2, content available:1.
[av1 @ 0x15128b530] OBU idx:1, type:1, content available:1.
[av1 @ 0x15128b530] OBU idx:2, type:3, content available:1.
[av1 @ 0x15128b530] Format videotoolbox_vld chosen by get_format().
[av1 @ 0x15128b530] Format videotoolbox_vld requires hwaccel av1_videotoolbox initialisation.
[av1 @ 0x15128b530] AV1 decode get format: videotoolbox_vld.
[av1 @ 0x15128b530] OBU idx:3, type:4, content available:1.
[av1 @ 0x15128b530] vt decoder cb: output image buffer is null: -17694
[av1 @ 0x15128b530] HW accel end frame fail.

In the broken case, instead of a single Frame OBU (type 6), we get a Frame
Header OBU (type 3) followed by a Tile Group OBU (type 4). To handle Tile
Group OBUs, av1dec.c calls the decode_slice() function, but
videotoolbox_av1_decode_slice() in this patch simply returns without
appending the OBU data to the bitstream buffer.
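
For reference, the numeric "type:N" values in the logs can be mapped to OBU
names with a small standalone helper. This is only a sketch: the type values
come from the AV1 bitstream specification, but obu_type_name() and
packet_has_separate_tile_groups() are hypothetical names for illustration,
not FFmpeg functions.

```c
#include <assert.h>
#include <stddef.h>

/* OBU type values as defined in the AV1 bitstream specification. */
enum {
    OBU_SEQUENCE_HEADER    = 1,
    OBU_TEMPORAL_DELIMITER = 2,
    OBU_FRAME_HEADER       = 3,
    OBU_TILE_GROUP         = 4,
    OBU_FRAME              = 6,
};

/* Hypothetical helper: translate a logged "type:N" value into a name. */
static const char *obu_type_name(int type)
{
    switch (type) {
    case OBU_SEQUENCE_HEADER:    return "OBU_SEQUENCE_HEADER";
    case OBU_TEMPORAL_DELIMITER: return "OBU_TEMPORAL_DELIMITER";
    case OBU_FRAME_HEADER:       return "OBU_FRAME_HEADER";
    case OBU_TILE_GROUP:         return "OBU_TILE_GROUP";
    case OBU_FRAME:              return "OBU_FRAME";
    default:                     return "other";
    }
}

/*
 * Hypothetical helper: returns 1 if the packet carries tile data in
 * standalone Tile Group OBUs (the layout seen in the broken log),
 * 0 if no Tile Group OBU is present.
 */
static int packet_has_separate_tile_groups(const int *types, size_t n)
{
    assert(types != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (types[i] == OBU_TILE_GROUP)
            return 1;
    }
    return 0;
}
```

Fed with the types listed in the logs above, {2, 1, 6} for the working
packet and {2, 1, 3, 4} for the broken one, only the second is flagged as
carrying a separate Tile Group OBU.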

It looks like other AV1 hwaccels ignore the data buffer provided in the
start_frame() callback and append to their bitstream buffers in
decode_slice() instead. Maybe that's what we should do here too?
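
A rough standalone sketch of that idea, untested against real hardware: reset
the buffer in start_frame() and append each OBU passed to decode_slice().
BitstreamBuf, bitstream_append(), sketch_start_frame() and
sketch_decode_slice() are hypothetical stand-ins, not the patch's actual
code, and the real callbacks have different signatures. Which bytes
VideoToolbox expects in the sample buffer (for example, whether the Frame
Header OBU must be included) is an open question here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for the hwaccel's bitstream. */
typedef struct BitstreamBuf {
    uint8_t *data;
    size_t   size;      /* bytes currently used */
    size_t   capacity;  /* bytes allocated */
} BitstreamBuf;

/* Append len bytes to the buffer, growing it as needed. 0 on success. */
static int bitstream_append(BitstreamBuf *buf, const uint8_t *src, size_t len)
{
    assert(buf != NULL);
    if (len == 0)
        return 0;
    if (len > SIZE_MAX - buf->size)
        return -1;                      /* size would overflow */

    size_t needed = buf->size + len;
    if (needed > buf->capacity) {
        size_t new_cap = buf->capacity ? buf->capacity : 4096;
        while (new_cap < needed) {
            if (new_cap > SIZE_MAX / 2) {
                new_cap = needed;
                break;
            }
            new_cap *= 2;
        }
        uint8_t *tmp = realloc(buf->data, new_cap);
        if (!tmp)
            return -1;                  /* old buffer stays valid */
        buf->data     = tmp;
        buf->capacity = new_cap;
    }
    memcpy(buf->data + buf->size, src, len);
    buf->size += len;
    return 0;
}

/* Sketch of start_frame(): drop the previous frame's data, append nothing. */
static int sketch_start_frame(BitstreamBuf *buf)
{
    buf->size = 0;
    return 0;
}

/* Sketch of decode_slice(): append the OBU data instead of returning early. */
static int sketch_decode_slice(BitstreamBuf *buf, const uint8_t *obu, size_t len)
{
    return bitstream_append(buf, obu, len);
}
```

With this shape, a packet split into several Tile Group OBUs would accumulate
all of its tile data before end_frame() hands the buffer to the decoder.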