[FFmpeg-devel] [PATCH v3 6/6] lavc/vaapi_encode: Add VAAPI AV1 encoder

Wed Aug 16 10:54:51 EEST 2023

On Sun, 2023-08-13 at 22:43 +0100, Mark Thompson wrote:
> On 10/08/2023 03:54, Wang, Fei W wrote:
> > On Mon, 2023-08-07 at 22:21 +0100, Mark Thompson wrote:
> > > On 03/08/2023 07:01, fei.w.wang-at-intel.com at ffmpeg.org wrote:
> > > > From: Fei Wang <fei.w.wang at intel.com>
> > > > 
> > > > Signed-off-by: Fei Wang <fei.w.wang at intel.com>
> > > > ---
> > > >    Changelog                     |    1 +
> > > >    configure                     |    3 +
> > > >    doc/encoders.texi             |   13 +
> > > >    libavcodec/Makefile           |    1 +
> > > >    libavcodec/allcodecs.c        |    1 +
> > > >    libavcodec/vaapi_encode.c     |  125 +++-
> > > >    libavcodec/vaapi_encode.h     |   12 +
> > > >    libavcodec/vaapi_encode_av1.c | 1229
> > > > +++++++++++++++++++++++++++++++++
> > > >    libavcodec/version.h          |    2 +-
> > > >    9 files changed, 1368 insertions(+), 19 deletions(-)
> > > >    create mode 100644 libavcodec/vaapi_encode_av1.c
> > > 
> > > I assume this is tested on Intel hardware.  Is it tested on AMD
> > > as
> > > well?  (Apparently working in recent Mesa.)
> > 
> > AMD tested the patchset months ago:
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22585
> > 
> > This patch changed a little compare with the version in Cartwheel,
> > @Ruijing, Could you help to review this version in ML? To avoid the
> > diffs break you. Thanks.
> > 
> > > > ...
> > > > @@ -669,6 +669,15 @@ static int
> > > > vaapi_encode_set_output_timestamp(AVCodecContext *avctx,
> > > >    {
> > > >        VAAPIEncodeContext *ctx = avctx->priv_data;
> > > >    
> > > > +    // AV1 packs P frame and next B frame into one pkt, and
> > > > uses
> > > > the other
> > > > +    // repeat frame header pkt at the display order position
> > > > of
> > > > the P frame
> > > > +    // to indicate its frame index. Each frame has a
> > > > corresponding
> > > > pkt in its
> > > > +    // display order position. So don't need to consider delay
> > > > for
> > > > AV1 timestamp.
> > > > +    if (avctx->codec_id == AV_CODEC_ID_AV1) {
> > > > +        pkt->dts = pkt->pts - ctx->dts_pts_diff;
> > > > +        return 0;
> > > > +    }
> > > 
> > > This doesn't get you the right result, though?  The PTS and DTS
> > > on
> > > every AV1 packet want to be the same.
> > 
> > The result tested good which can be played normally. Just aligned
> > with
> > other vaapi encoders that the 1st frame start with 0/-1 of PTS/DTS
> > if
> > have B frame. Set PTS/DTS to same also looks good.
> 
> DTS != PTS should only be true when the stream has externally-visible 
> frame reordering in, which AV1 doesn't.
> 
> The other two AV1 encoders both output DTS == PTS; I think you should
> match that.
> 
> > > In any case, please don't put tests for codec ID in the common
> > > code.  I suggest that the timestamp behaviour wants to be a new
> > > FLAG_*.
> > > 
> > > > +
> > > > ...
> > > > @@ -1128,9 +1182,19 @@ static int
> > > > vaapi_encode_pick_next(AVCodecContext *avctx,
> > > >    
> > > >        vaapi_encode_add_ref(avctx, pic, pic, 0, 1, 0);
> > > >        if (pic->type != PICTURE_TYPE_IDR) {
> > > > -        vaapi_encode_add_ref(avctx, pic, start,
> > > > -                             pic->type == PICTURE_TYPE_P,
> > > > -                             b_counter > 0, 0);
> > > > +        // TODO: apply both previous and forward multi
> > > > reference
> > > > for all vaapi encoders.
> > > > +        // And L0/L1 reference frame number can be set
> > > > dynamically
> > > > through query
> > > > +        // VAConfigAttribEncMaxRefFrames attribute.
> > > > +        if (avctx->codec_id == AV_CODEC_ID_AV1) {
> > > > +            for (i = 0; i < ctx->nb_next_prev; i++)
> > > > +                vaapi_encode_add_ref(avctx, pic, ctx-
> > > > > next_prev[i],
> > > > +                                     pic->type ==
> > > > PICTURE_TYPE_P,
> > > > +                                     b_counter > 0, 0);
> > > 
> > > So an undisclosed aim of the previous patch was to make this
> > > extra
> > > list of the most recent N frames for this to use with P frames
> > > only?
> > 
> > Currently, yes. As the TODO comment says, I will apply it to all
> > other
> > codecs and B frame next, it will need lots of test on different
> > hardware. Add this for AV1 here is to align with VPL which use 2
> > previous reference for P frame. Base on our test, 2 reference frame
> > for
> > P will improve ~5% BD-rate for quality, so that can make it
> > possible to
> > let user get same quality with VPL after adjusting and aligning
> > encoder
> > options.
> 
> Hmm.  My Intel test machine says it can take 8 L0 and 2 L1 references
> in H.264.  I might try this, seems like it should be easy to test.

Yes, if L0/L1 number queried from driver are fully supported, need to
increase MAX_PICTURE_REFERENCES and init size of surface pool.

> 
> > > Why this partcular structure for P frames only, though?  Seems
> > > like
> > > if you support extra references then other codecs could also make
> > > use
> > > of them in either direction, so we would be better off hinting
> > > that
> > > the codec would like up to N references and then using the
> > > contents
> > > of the existing DPB, plus keep extras if there is space.
> > > 
> > > > ...
> > > > +
> > > > +static av_cold int vaapi_encode_av1_configure(AVCodecContext
> > > > *avctx)
> > > > +{
> > > > +    VAAPIEncodeContext     *ctx = avctx->priv_data;
> > > > +    VAAPIEncodeAV1Context *priv = avctx->priv_data;
> > > > +    int ret;
> > > > +
> > > > +    ret = ff_cbs_init(&priv->cbc, AV_CODEC_ID_AV1, avctx);
> > > > +    if (ret < 0)
> > > > +        return ret;
> > > > +
> > > > +    if (ctx->rc_mode->quality) {
> > > > +        priv->q_idx_p = av_clip(ctx->rc_quality, 0,
> > > > AV1_MAX_QUANT);
> > > > +        if (fabs(avctx->i_quant_factor) > 0.0)
> > > > +            priv->q_idx_idr =
> > > > +                av_clip((fabs(avctx->i_quant_factor) * priv-
> > > > > q_idx_p  +
> > > > +                         avctx->i_quant_offset) + 0.5,
> > > > +                        0, AV1_MAX_QUANT);
> > > > +        else
> > > > +            priv->q_idx_idr = priv->q_idx_p;
> > > > +
> > > > +        if (fabs(avctx->b_quant_factor) > 0.0)
> > > > +            priv->q_idx_b =
> > > > +                av_clip((fabs(avctx->b_quant_factor) * priv-
> > > > > q_idx_p  +
> > > > +                         avctx->b_quant_offset) + 0.5,
> > > > +                        0, AV1_MAX_QUANT);
> > > > +        else
> > > > +            priv->q_idx_b = priv->q_idx_p;
> > > > +    } else {
> > > > +        /** Arbitrary value */
> > > > +        priv->q_idx_idr = priv->q_idx_p = priv->q_idx_b = 128;
> > > > +    }
> > > 
> > > Are there any alignment requirements here?  (If I gave it an
> > > input
> > > which is, say, 1361x1, would that work?)
> > 
> > I didn't get you. Which parameter here need to be align? Max/Min
> > resolution HW supported will be checked somewhere else. If use
> > 1361x1
> > there will be the error like:
> > 
> > "Hardware does not support encoding at size 1376x16 (constraints:
> > width
> > 32-8192 height 32-8192)."
> 
> I mean of padding for the reference surfaces.  I was expecting this
> to be rounded up to a multiple of 8 (to match MiCols/MiRows) - see
> this function in other codecs.
> 
> > > > ...
> > > > +
> > > > +    /** update obu size in bitstream */
> > > > +    if (fh_obu->header.obu_has_size_field) {
> > > > +        obu_size_len = priv-
> > > > >attr_ext2.bits.obu_size_bytes_minus1
> > > > + 1;
> > > > +        for (i = 0; i < obu_size_len; i++) {
> > > > +            byte = obu_size >> (7 * i) & 0x7f;
> > > > +            if (i < obu_size_len - 1)
> > > > +                byte |= 0x80;
> > > > +            put_bits(&pbc_tmp, 8, byte);
> > > > +        }
> > > > +        flush_put_bits(&pbc_tmp);
> > > > +        memmove(pbc_tmp.buf_ptr, pbc_tmp.buf_ptr + (8 -
> > > > obu_size_len), obu_size);
> > > > +        *data_len -= (8 - obu_size_len) * 8;
> > > > +    }
> > > 
> > > Why is there an incomplete duplicate of the cbs_av1 header
> > > writing
> > > code here?
> > 
> > To record some position/size in bitstream that needed for VAAPI.
> > Like
> > qp_index/loopfilter/cdef offset and cdef parameters size in bit.
> > It's
> > not reasonable to add the specific parameters into CBS.
> 
> How about with <
> https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2023-August/313228.html>
> ;?

How to pass position info out of .trace_write_callback? If define own
write_callback function in vaapi_encode_av1.c, and it can easily get
the positions of each syntax element, but can't pass them back to VAAPI
AV1 encoder. A possible way is according
to CodedBitstreamContext.priv_data, but that will need to add lots of
xxx_offset into CodedBitstreamAV1Context.

> 
> > > > ...
> > > > +
> > > > +static int vaapi_encode_av1_set_tile(AVCodecContext *avctx)
> > > > +{
> > > > +    VAAPIEncodeAV1Context *priv = avctx->priv_data;
> > > > +    int mi_cols, mi_rows, sb_shift, sb_size;
> > > > +    int max_tile_area_sb, max_tile_area_sb_varied;
> > > > +    int tile_width_sb, tile_height_sb, widest_tile_sb;
> > > > +    int min_log2_tiles;
> > > > +    int tile_rows_tmp, i;
> > > > +
> > > > +    if (priv->tile_cols > AV1_MAX_TILE_COLS ||
> > > > +        priv->tile_rows > AV1_MAX_TILE_ROWS) {
> > > > +        av_log(avctx, AV_LOG_ERROR, "Invalid tile number
> > > > %dx%d,
> > > > should less than %dx%d.\n",
> > > > +               priv->tile_cols, priv->tile_rows,
> > > > AV1_MAX_TILE_COLS, AV1_MAX_TILE_ROWS);
> > > 
> > > Can't this be a constraint on the option rather than a special
> > > check
> > > here?
> > 
> > Tiles set with IMAGE_SIZE option type which aligns with HEVC. It
> > doesn't support Min/MAX check.
> 
> Hmm, yeah - fair enough.
> 
> > > > ...
> > > > +
> > > > +    sh->color_config = (AV1RawColorConfig) {
> > > > +        .high_bitdepth                  = desc->comp[0].depth
> > > > == 8
> > > > ? 0 : 1,
> > > > +        .color_primaries                = avctx-
> > > > >color_primaries,
> > > > +        .transfer_characteristics       = avctx->color_trc,
> > > > +        .matrix_coefficients            = avctx->colorspace,
> > > > +        .color_description_present_flag = (avctx-
> > > > >color_primaries
> > > > != AVCOL_PRI_UNSPECIFIED ||
> > > > +                                           avctx-
> > > > > color_trc       != AVCOL_TRC_UNSPECIFIED ||
> > > > +                                           avctx-
> > > > > colorspace      != AVCOL_SPC_UNSPECIFIED),
> > > > +        .color_range                    = avctx->color_range
> > > > ==
> > > > AVCOL_RANGE_JPEG,
> > > > +        .subsampling_x                  = desc->log2_chroma_w,
> > > > +        .subsampling_y                  = desc->log2_chroma_h,
> > > 
> > > .chroma_sample_position = some function of
> > > chroma_sample_location.
> > 
> > There is no chroma_sample_position supported in VAAPI yet.
> 
> VAAPI won't care.  It should be marked correctly in the headers so
> that the output stream has the right value.
> 
> (CFL not having any ability to deal with the sample location is a
> known deficiency of AV1.)
> 
> > > > ...
> > > > +    fh->tile_cols                 = priv->tile_cols;
> > > > +    fh->tile_rows                 = priv->tile_rows;
> > > > +    fh->tile_cols_log2            = priv->tile_cols_log2;
> > > > +    fh->tile_rows_log2            = priv->tile_rows_log2;
> > > > +    fh->uniform_tile_spacing_flag = priv->uniform_tile;
> > > > +    fh->tile_size_bytes_minus1    = priv-
> > > > > attr_ext2.bits.tile_size_bytes_minus1;
> > > > +    fh->reduced_tx_set            = 1;
> > > 
> > > This not being present in the capabilities feels like an omission
> > > in
> > > the API.  That's a large loss for an implementation which does
> > > support the full transform set.
> > 
> > Personally understanding, "not being present" means it must be
> > supported. If any exception driver, then they can ask to add new
> > capability to libva.
> 
> reduced_tx_set is a negative property - you want to set it to zero to
> allow all transforms, unless the implementation can't cope with that.
> 
> Does the Intel implementation require it to be set?  (That is, does
> not support the full transform set.)
> 

That make sense. Intel HW doesn't require it to be set as 1. Will
change it to 0.

> > > > ...
> > > > +
> > > > +    for (i = 0; i < fh->tile_cols; i++)
> > > > +        fh->width_in_sbs_minus_1[i] = vpic-
> > > > > width_in_sbs_minus_1[i] = priv->width_in_sbs_minus_1[i];
> > > > +
> > > > +    for (i = 0; i < fh->tile_rows; i++)
> > > > +        fh->height_in_sbs_minus_1[i] = vpic-
> > > > > height_in_sbs_minus_1[i] = priv->height_in_sbs_minus_1[i];
> > > 
> > > Er, what?  Doesn't this fail the inference on the last tile
> > > width/height if you don't split exactly?
> > 
> > What's the meaning of "split exactly"? All the tile w/h will be
> > split
> > in vaapi_encode_av1_set_tile include last tile. Just make sure the
> > info
> > of tiles w/h in frame header are same with the info passed to VAAPI
> > should be fine.
> 
> Sorry, I misread this.  The values are set elsewhere, it's not
> setting all tiles to have the same SB width/height.
> 
> > > > ...
> > > > +
> > > > +    vpic->base_qindex          = fh->base_q_idx;
> > > > +    vpic->frame_width_minus_1  = fh->frame_width_minus_1;
> > > > +    vpic->frame_height_minus_1 = fh->frame_height_minus_1;
> > > > +    vpic->primary_ref_frame    = fh->primary_ref_frame;
> > > > +    vpic->reconstructed_frame  = pic->recon_surface;
> > > > +    vpic->coded_buf            = pic->output_buffer;
> > > > +    vpic->tile_cols            = fh->tile_cols;
> > > > +    vpic->tile_rows            = fh->tile_rows;
> > > > +    vpic->order_hint           = fh->order_hint;
> > > > +#if VA_CHECK_VERSION(1, 15, 0)
> > > > +    vpic->refresh_frame_flags  = fh->refresh_frame_flags;
> > > > +#endif
> > > 
> > > What will the difference be between running in the two
> > > versions?  (Is
> > > it worth supporting the older one?  Cf. bit_depth on VP9.)
> > 
> > I'd prefer to support older one. In case of someone using old
> > version
> > and works well with ffmpeg-vpl(as vpl already support av1 encode
> > long
> > time ago), otherwise he must be confuse why ffmpeg-vaapi need
> > higher
> > VAAPI version but VPL doesn't.
> 
> Ok.  So the driver doesn't use the value of refresh_frame_flags if it
> doesn't affect the result?  (Seems like it shouldn't, not sure why it
> would be added to the API.)

Media-driver doesn't use it, but MS Windows driver does, it may need
maintain DPB and pack frame header itself:
https://github.com/intel/cartwheel-ffmpeg/issues/244

Thanks

> 
> > > > ...
> 
> Thanks,
> 
> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> 
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".