[FFmpeg-devel] [PATCH v3 6/6] lavc/vaapi_encode: Add VAAPI AV1 encoder

Mon Aug 14 00:43:56 EEST 2023

On 10/08/2023 03:54, Wang, Fei W wrote:
> On Mon, 2023-08-07 at 22:21 +0100, Mark Thompson wrote:
>> On 03/08/2023 07:01, fei.w.wang-at-intel.com at ffmpeg.org wrote:
>>> From: Fei Wang <fei.w.wang at intel.com>
>>>
>>> Signed-off-by: Fei Wang <fei.w.wang at intel.com>
>>> ---
>>>    Changelog                     |    1 +
>>>    configure                     |    3 +
>>>    doc/encoders.texi             |   13 +
>>>    libavcodec/Makefile           |    1 +
>>>    libavcodec/allcodecs.c        |    1 +
>>>    libavcodec/vaapi_encode.c     |  125 +++-
>>>    libavcodec/vaapi_encode.h     |   12 +
>>>    libavcodec/vaapi_encode_av1.c | 1229
>>> +++++++++++++++++++++++++++++++++
>>>    libavcodec/version.h          |    2 +-
>>>    9 files changed, 1368 insertions(+), 19 deletions(-)
>>>    create mode 100644 libavcodec/vaapi_encode_av1.c
>>
>> I assume this is tested on Intel hardware.  Is it tested on AMD as
>> well?  (Apparently working in recent Mesa.)
> 
> AMD tested the patchset months ago:
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22585
> 
> This patch changed a little compare with the version in Cartwheel,
> @Ruijing, Could you help to review this version in ML? To avoid the
> diffs break you. Thanks.
> 
>>> ...
>>> @@ -669,6 +669,15 @@ static int
>>> vaapi_encode_set_output_timestamp(AVCodecContext *avctx,
>>>    {
>>>        VAAPIEncodeContext *ctx = avctx->priv_data;
>>>    
>>> +    // AV1 packs P frame and next B frame into one pkt, and uses
>>> the other
>>> +    // repeat frame header pkt at the display order position of
>>> the P frame
>>> +    // to indicate its frame index. Each frame has a corresponding
>>> pkt in its
>>> +    // display order position. So don't need to consider delay for
>>> AV1 timestamp.
>>> +    if (avctx->codec_id == AV_CODEC_ID_AV1) {
>>> +        pkt->dts = pkt->pts - ctx->dts_pts_diff;
>>> +        return 0;
>>> +    }
>>
>> This doesn't get you the right result, though?  The PTS and DTS on
>> every AV1 packet want to be the same.
> 
> The result tested good which can be played normally. Just aligned with
> other vaapi encoders that the 1st frame start with 0/-1 of PTS/DTS if
> have B frame. Set PTS/DTS to same also looks good.

DTS != PTS should only be true when the stream has externally-visible frame reordering in, which AV1 doesn't.

The other two AV1 encoders both output DTS == PTS; I think you should match that.

>> In any case, please don't put tests for codec ID in the common
>> code.  I suggest that the timestamp behaviour wants to be a new
>> FLAG_*.
>>
>>> +
>>> ...
>>> @@ -1128,9 +1182,19 @@ static int
>>> vaapi_encode_pick_next(AVCodecContext *avctx,
>>>    
>>>        vaapi_encode_add_ref(avctx, pic, pic, 0, 1, 0);
>>>        if (pic->type != PICTURE_TYPE_IDR) {
>>> -        vaapi_encode_add_ref(avctx, pic, start,
>>> -                             pic->type == PICTURE_TYPE_P,
>>> -                             b_counter > 0, 0);
>>> +        // TODO: apply both previous and forward multi reference
>>> for all vaapi encoders.
>>> +        // And L0/L1 reference frame number can be set dynamically
>>> through query
>>> +        // VAConfigAttribEncMaxRefFrames attribute.
>>> +        if (avctx->codec_id == AV_CODEC_ID_AV1) {
>>> +            for (i = 0; i < ctx->nb_next_prev; i++)
>>> +                vaapi_encode_add_ref(avctx, pic, ctx-
>>>> next_prev[i],
>>> +                                     pic->type == PICTURE_TYPE_P,
>>> +                                     b_counter > 0, 0);
>>
>> So an undisclosed aim of the previous patch was to make this extra
>> list of the most recent N frames for this to use with P frames only?
> 
> Currently, yes. As the TODO comment says, I will apply it to all other
> codecs and B frame next, it will need lots of test on different
> hardware. Add this for AV1 here is to align with VPL which use 2
> previous reference for P frame. Base on our test, 2 reference frame for
> P will improve ~5% BD-rate for quality, so that can make it possible to
> let user get same quality with VPL after adjusting and aligning encoder
> options.

Hmm.  My Intel test machine says it can take 8 L0 and 2 L1 references in H.264.  I might try this, seems like it should be easy to test.

>>
>> Why this partcular structure for P frames only, though?  Seems like
>> if you support extra references then other codecs could also make use
>> of them in either direction, so we would be better off hinting that
>> the codec would like up to N references and then using the contents
>> of the existing DPB, plus keep extras if there is space.
>>
>>> ...
>>> +
>>> +static av_cold int vaapi_encode_av1_configure(AVCodecContext
>>> *avctx)
>>> +{
>>> +    VAAPIEncodeContext     *ctx = avctx->priv_data;
>>> +    VAAPIEncodeAV1Context *priv = avctx->priv_data;
>>> +    int ret;
>>> +
>>> +    ret = ff_cbs_init(&priv->cbc, AV_CODEC_ID_AV1, avctx);
>>> +    if (ret < 0)
>>> +        return ret;
>>> +
>>> +    if (ctx->rc_mode->quality) {
>>> +        priv->q_idx_p = av_clip(ctx->rc_quality, 0,
>>> AV1_MAX_QUANT);
>>> +        if (fabs(avctx->i_quant_factor) > 0.0)
>>> +            priv->q_idx_idr =
>>> +                av_clip((fabs(avctx->i_quant_factor) * priv-
>>>> q_idx_p  +
>>> +                         avctx->i_quant_offset) + 0.5,
>>> +                        0, AV1_MAX_QUANT);
>>> +        else
>>> +            priv->q_idx_idr = priv->q_idx_p;
>>> +
>>> +        if (fabs(avctx->b_quant_factor) > 0.0)
>>> +            priv->q_idx_b =
>>> +                av_clip((fabs(avctx->b_quant_factor) * priv-
>>>> q_idx_p  +
>>> +                         avctx->b_quant_offset) + 0.5,
>>> +                        0, AV1_MAX_QUANT);
>>> +        else
>>> +            priv->q_idx_b = priv->q_idx_p;
>>> +    } else {
>>> +        /** Arbitrary value */
>>> +        priv->q_idx_idr = priv->q_idx_p = priv->q_idx_b = 128;
>>> +    }
>>
>> Are there any alignment requirements here?  (If I gave it an input
>> which is, say, 1361x1, would that work?)
> 
> I didn't get you. Which parameter here need to be align? Max/Min
> resolution HW supported will be checked somewhere else. If use 1361x1
> there will be the error like:
> 
> "Hardware does not support encoding at size 1376x16 (constraints: width
> 32-8192 height 32-8192)."

I mean of padding for the reference surfaces.  I was expecting this to be rounded up to a multiple of 8 (to match MiCols/MiRows) - see this function in other codecs.

>>> ...
>>> +
>>> +    /** update obu size in bitstream */
>>> +    if (fh_obu->header.obu_has_size_field) {
>>> +        obu_size_len = priv->attr_ext2.bits.obu_size_bytes_minus1
>>> + 1;
>>> +        for (i = 0; i < obu_size_len; i++) {
>>> +            byte = obu_size >> (7 * i) & 0x7f;
>>> +            if (i < obu_size_len - 1)
>>> +                byte |= 0x80;
>>> +            put_bits(&pbc_tmp, 8, byte);
>>> +        }
>>> +        flush_put_bits(&pbc_tmp);
>>> +        memmove(pbc_tmp.buf_ptr, pbc_tmp.buf_ptr + (8 -
>>> obu_size_len), obu_size);
>>> +        *data_len -= (8 - obu_size_len) * 8;
>>> +    }
>>
>> Why is there an incomplete duplicate of the cbs_av1 header writing
>> code here?
> 
> To record some position/size in bitstream that needed for VAAPI. Like
> qp_index/loopfilter/cdef offset and cdef parameters size in bit. It's
> not reasonable to add the specific parameters into CBS.

How about with <https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2023-August/313228.html>?

>>> ...
>>> +
>>> +static int vaapi_encode_av1_set_tile(AVCodecContext *avctx)
>>> +{
>>> +    VAAPIEncodeAV1Context *priv = avctx->priv_data;
>>> +    int mi_cols, mi_rows, sb_shift, sb_size;
>>> +    int max_tile_area_sb, max_tile_area_sb_varied;
>>> +    int tile_width_sb, tile_height_sb, widest_tile_sb;
>>> +    int min_log2_tiles;
>>> +    int tile_rows_tmp, i;
>>> +
>>> +    if (priv->tile_cols > AV1_MAX_TILE_COLS ||
>>> +        priv->tile_rows > AV1_MAX_TILE_ROWS) {
>>> +        av_log(avctx, AV_LOG_ERROR, "Invalid tile number %dx%d,
>>> should less than %dx%d.\n",
>>> +               priv->tile_cols, priv->tile_rows,
>>> AV1_MAX_TILE_COLS, AV1_MAX_TILE_ROWS);
>>
>> Can't this be a constraint on the option rather than a special check
>> here?
> 
> Tiles set with IMAGE_SIZE option type which aligns with HEVC. It
> doesn't support Min/MAX check.

Hmm, yeah - fair enough.

>>> ...
>>> +
>>> +    sh->color_config = (AV1RawColorConfig) {
>>> +        .high_bitdepth                  = desc->comp[0].depth == 8
>>> ? 0 : 1,
>>> +        .color_primaries                = avctx->color_primaries,
>>> +        .transfer_characteristics       = avctx->color_trc,
>>> +        .matrix_coefficients            = avctx->colorspace,
>>> +        .color_description_present_flag = (avctx->color_primaries
>>> != AVCOL_PRI_UNSPECIFIED ||
>>> +                                           avctx-
>>>> color_trc       != AVCOL_TRC_UNSPECIFIED ||
>>> +                                           avctx-
>>>> colorspace      != AVCOL_SPC_UNSPECIFIED),
>>> +        .color_range                    = avctx->color_range ==
>>> AVCOL_RANGE_JPEG,
>>> +        .subsampling_x                  = desc->log2_chroma_w,
>>> +        .subsampling_y                  = desc->log2_chroma_h,
>>
>> .chroma_sample_position = some function of chroma_sample_location.
> 
> There is no chroma_sample_position supported in VAAPI yet.

VAAPI won't care.  It should be marked correctly in the headers so that the output stream has the right value.

(CFL not having any ability to deal with the sample location is a known deficiency of AV1.)

>>> ...
>>> +    fh->tile_cols                 = priv->tile_cols;
>>> +    fh->tile_rows                 = priv->tile_rows;
>>> +    fh->tile_cols_log2            = priv->tile_cols_log2;
>>> +    fh->tile_rows_log2            = priv->tile_rows_log2;
>>> +    fh->uniform_tile_spacing_flag = priv->uniform_tile;
>>> +    fh->tile_size_bytes_minus1    = priv-
>>>> attr_ext2.bits.tile_size_bytes_minus1;
>>> +    fh->reduced_tx_set            = 1;
>>
>> This not being present in the capabilities feels like an omission in
>> the API.  That's a large loss for an implementation which does
>> support the full transform set.
> 
> Personally understanding, "not being present" means it must be
> supported. If any exception driver, then they can ask to add new
> capability to libva.

reduced_tx_set is a negative property - you want to set it to zero to allow all transforms, unless the implementation can't cope with that.

Does the Intel implementation require it to be set?  (That is, does not support the full transform set.)

>>> ...
>>> +
>>> +    for (i = 0; i < fh->tile_cols; i++)
>>> +        fh->width_in_sbs_minus_1[i] = vpic-
>>>> width_in_sbs_minus_1[i] = priv->width_in_sbs_minus_1[i];
>>> +
>>> +    for (i = 0; i < fh->tile_rows; i++)
>>> +        fh->height_in_sbs_minus_1[i] = vpic-
>>>> height_in_sbs_minus_1[i] = priv->height_in_sbs_minus_1[i];
>>
>> Er, what?  Doesn't this fail the inference on the last tile
>> width/height if you don't split exactly?
> 
> What's the meaning of "split exactly"? All the tile w/h will be split
> in vaapi_encode_av1_set_tile include last tile. Just make sure the info
> of tiles w/h in frame header are same with the info passed to VAAPI
> should be fine.

Sorry, I misread this.  The values are set elsewhere, it's not setting all tiles to have the same SB width/height.

>>> ...
>>> +
>>> +    vpic->base_qindex          = fh->base_q_idx;
>>> +    vpic->frame_width_minus_1  = fh->frame_width_minus_1;
>>> +    vpic->frame_height_minus_1 = fh->frame_height_minus_1;
>>> +    vpic->primary_ref_frame    = fh->primary_ref_frame;
>>> +    vpic->reconstructed_frame  = pic->recon_surface;
>>> +    vpic->coded_buf            = pic->output_buffer;
>>> +    vpic->tile_cols            = fh->tile_cols;
>>> +    vpic->tile_rows            = fh->tile_rows;
>>> +    vpic->order_hint           = fh->order_hint;
>>> +#if VA_CHECK_VERSION(1, 15, 0)
>>> +    vpic->refresh_frame_flags  = fh->refresh_frame_flags;
>>> +#endif
>>
>> What will the difference be between running in the two versions?  (Is
>> it worth supporting the older one?  Cf. bit_depth on VP9.)
> 
> I'd prefer to support older one. In case of someone using old version
> and works well with ffmpeg-vpl(as vpl already support av1 encode long
> time ago), otherwise he must be confuse why ffmpeg-vaapi need higher
> VAAPI version but VPL doesn't.

Ok.  So the driver doesn't use the value of refresh_frame_flags if it doesn't affect the result?  (Seems like it shouldn't, not sure why it would be added to the API.)

>>> ...

Thanks,

- Mark