[FFmpeg-devel] [PATCH v3 6/6] lavc/vaapi_encode: Add VAAPI AV1 encoder
Wang, Fei W
fei.w.wang at intel.com
Wed Aug 16 10:54:51 EEST 2023
On Sun, 2023-08-13 at 22:43 +0100, Mark Thompson wrote:
> On 10/08/2023 03:54, Wang, Fei W wrote:
> > On Mon, 2023-08-07 at 22:21 +0100, Mark Thompson wrote:
> > > On 03/08/2023 07:01, fei.w.wang-at-intel.com at ffmpeg.org wrote:
> > > > From: Fei Wang <fei.w.wang at intel.com>
> > > >
> > > > Signed-off-by: Fei Wang <fei.w.wang at intel.com>
> > > > ---
> > > > Changelog | 1 +
> > > > configure | 3 +
> > > > doc/encoders.texi | 13 +
> > > > libavcodec/Makefile | 1 +
> > > > libavcodec/allcodecs.c | 1 +
> > > > libavcodec/vaapi_encode.c | 125 +++-
> > > > libavcodec/vaapi_encode.h | 12 +
> > > > libavcodec/vaapi_encode_av1.c | 1229
> > > > +++++++++++++++++++++++++++++++++
> > > > libavcodec/version.h | 2 +-
> > > > 9 files changed, 1368 insertions(+), 19 deletions(-)
> > > > create mode 100644 libavcodec/vaapi_encode_av1.c
> > >
> > > I assume this is tested on Intel hardware. Is it tested on AMD
> > > as
> > > well? (Apparently working in recent Mesa.)
> >
> > AMD tested the patchset months ago:
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22585
> >
> > This patch changed a little compare with the version in Cartwheel,
> > @Ruijing, Could you help to review this version in ML? To avoid the
> > diffs break you. Thanks.
> >
> > > > ...
> > > > @@ -669,6 +669,15 @@ static int
> > > > vaapi_encode_set_output_timestamp(AVCodecContext *avctx,
> > > > {
> > > > VAAPIEncodeContext *ctx = avctx->priv_data;
> > > >
> > > > + // AV1 packs P frame and next B frame into one pkt, and
> > > > uses
> > > > the other
> > > > + // repeat frame header pkt at the display order position
> > > > of
> > > > the P frame
> > > > + // to indicate its frame index. Each frame has a
> > > > corresponding
> > > > pkt in its
> > > > + // display order position. So don't need to consider delay
> > > > for
> > > > AV1 timestamp.
> > > > + if (avctx->codec_id == AV_CODEC_ID_AV1) {
> > > > + pkt->dts = pkt->pts - ctx->dts_pts_diff;
> > > > + return 0;
> > > > + }
> > >
> > > This doesn't get you the right result, though? The PTS and DTS
> > > on
> > > every AV1 packet want to be the same.
> >
> > The result tested good which can be played normally. Just aligned
> > with
> > other vaapi encoders that the 1st frame start with 0/-1 of PTS/DTS
> > if
> > have B frame. Set PTS/DTS to same also looks good.
>
> DTS != PTS should only be true when the stream has externally-visible
> frame reordering in, which AV1 doesn't.
>
> The other two AV1 encoders both output DTS == PTS; I think you should
> match that.
>
> > > In any case, please don't put tests for codec ID in the common
> > > code. I suggest that the timestamp behaviour wants to be a new
> > > FLAG_*.
> > >
> > > > +
> > > > ...
> > > > @@ -1128,9 +1182,19 @@ static int
> > > > vaapi_encode_pick_next(AVCodecContext *avctx,
> > > >
> > > > vaapi_encode_add_ref(avctx, pic, pic, 0, 1, 0);
> > > > if (pic->type != PICTURE_TYPE_IDR) {
> > > > - vaapi_encode_add_ref(avctx, pic, start,
> > > > - pic->type == PICTURE_TYPE_P,
> > > > - b_counter > 0, 0);
> > > > + // TODO: apply both previous and forward multi
> > > > reference
> > > > for all vaapi encoders.
> > > > + // And L0/L1 reference frame number can be set
> > > > dynamically
> > > > through query
> > > > + // VAConfigAttribEncMaxRefFrames attribute.
> > > > + if (avctx->codec_id == AV_CODEC_ID_AV1) {
> > > > + for (i = 0; i < ctx->nb_next_prev; i++)
> > > > + vaapi_encode_add_ref(avctx, pic, ctx-
> > > > > next_prev[i],
> > > > + pic->type ==
> > > > PICTURE_TYPE_P,
> > > > + b_counter > 0, 0);
> > >
> > > So an undisclosed aim of the previous patch was to make this
> > > extra
> > > list of the most recent N frames for this to use with P frames
> > > only?
> >
> > Currently, yes. As the TODO comment says, I will apply it to all
> > other
> > codecs and B frame next, it will need lots of test on different
> > hardware. Add this for AV1 here is to align with VPL which use 2
> > previous reference for P frame. Base on our test, 2 reference frame
> > for
> > P will improve ~5% BD-rate for quality, so that can make it
> > possible to
> > let user get same quality with VPL after adjusting and aligning
> > encoder
> > options.
>
> Hmm. My Intel test machine says it can take 8 L0 and 2 L1 references
> in H.264. I might try this, seems like it should be easy to test.
Yes, if L0/L1 number queried from driver are fully supported, need to
increase MAX_PICTURE_REFERENCES and init size of surface pool.
>
> > > Why this partcular structure for P frames only, though? Seems
> > > like
> > > if you support extra references then other codecs could also make
> > > use
> > > of them in either direction, so we would be better off hinting
> > > that
> > > the codec would like up to N references and then using the
> > > contents
> > > of the existing DPB, plus keep extras if there is space.
> > >
> > > > ...
> > > > +
> > > > +static av_cold int vaapi_encode_av1_configure(AVCodecContext
> > > > *avctx)
> > > > +{
> > > > + VAAPIEncodeContext *ctx = avctx->priv_data;
> > > > + VAAPIEncodeAV1Context *priv = avctx->priv_data;
> > > > + int ret;
> > > > +
> > > > + ret = ff_cbs_init(&priv->cbc, AV_CODEC_ID_AV1, avctx);
> > > > + if (ret < 0)
> > > > + return ret;
> > > > +
> > > > + if (ctx->rc_mode->quality) {
> > > > + priv->q_idx_p = av_clip(ctx->rc_quality, 0,
> > > > AV1_MAX_QUANT);
> > > > + if (fabs(avctx->i_quant_factor) > 0.0)
> > > > + priv->q_idx_idr =
> > > > + av_clip((fabs(avctx->i_quant_factor) * priv-
> > > > > q_idx_p +
> > > > + avctx->i_quant_offset) + 0.5,
> > > > + 0, AV1_MAX_QUANT);
> > > > + else
> > > > + priv->q_idx_idr = priv->q_idx_p;
> > > > +
> > > > + if (fabs(avctx->b_quant_factor) > 0.0)
> > > > + priv->q_idx_b =
> > > > + av_clip((fabs(avctx->b_quant_factor) * priv-
> > > > > q_idx_p +
> > > > + avctx->b_quant_offset) + 0.5,
> > > > + 0, AV1_MAX_QUANT);
> > > > + else
> > > > + priv->q_idx_b = priv->q_idx_p;
> > > > + } else {
> > > > + /** Arbitrary value */
> > > > + priv->q_idx_idr = priv->q_idx_p = priv->q_idx_b = 128;
> > > > + }
> > >
> > > Are there any alignment requirements here? (If I gave it an
> > > input
> > > which is, say, 1361x1, would that work?)
> >
> > I didn't get you. Which parameter here need to be align? Max/Min
> > resolution HW supported will be checked somewhere else. If use
> > 1361x1
> > there will be the error like:
> >
> > "Hardware does not support encoding at size 1376x16 (constraints:
> > width
> > 32-8192 height 32-8192)."
>
> I mean of padding for the reference surfaces. I was expecting this
> to be rounded up to a multiple of 8 (to match MiCols/MiRows) - see
> this function in other codecs.
>
> > > > ...
> > > > +
> > > > + /** update obu size in bitstream */
> > > > + if (fh_obu->header.obu_has_size_field) {
> > > > + obu_size_len = priv-
> > > > >attr_ext2.bits.obu_size_bytes_minus1
> > > > + 1;
> > > > + for (i = 0; i < obu_size_len; i++) {
> > > > + byte = obu_size >> (7 * i) & 0x7f;
> > > > + if (i < obu_size_len - 1)
> > > > + byte |= 0x80;
> > > > + put_bits(&pbc_tmp, 8, byte);
> > > > + }
> > > > + flush_put_bits(&pbc_tmp);
> > > > + memmove(pbc_tmp.buf_ptr, pbc_tmp.buf_ptr + (8 -
> > > > obu_size_len), obu_size);
> > > > + *data_len -= (8 - obu_size_len) * 8;
> > > > + }
> > >
> > > Why is there an incomplete duplicate of the cbs_av1 header
> > > writing
> > > code here?
> >
> > To record some position/size in bitstream that needed for VAAPI.
> > Like
> > qp_index/loopfilter/cdef offset and cdef parameters size in bit.
> > It's
> > not reasonable to add the specific parameters into CBS.
>
> How about with <
> https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2023-August/313228.html>
> ;?
How to pass position info out of .trace_write_callback? If define own
write_callback function in vaapi_encode_av1.c, and it can easily get
the positions of each syntax element, but can't pass them back to VAAPI
AV1 encoder. A possible way is according
to CodedBitstreamContext.priv_data, but that will need to add lots of
xxx_offset into CodedBitstreamAV1Context.
>
> > > > ...
> > > > +
> > > > +static int vaapi_encode_av1_set_tile(AVCodecContext *avctx)
> > > > +{
> > > > + VAAPIEncodeAV1Context *priv = avctx->priv_data;
> > > > + int mi_cols, mi_rows, sb_shift, sb_size;
> > > > + int max_tile_area_sb, max_tile_area_sb_varied;
> > > > + int tile_width_sb, tile_height_sb, widest_tile_sb;
> > > > + int min_log2_tiles;
> > > > + int tile_rows_tmp, i;
> > > > +
> > > > + if (priv->tile_cols > AV1_MAX_TILE_COLS ||
> > > > + priv->tile_rows > AV1_MAX_TILE_ROWS) {
> > > > + av_log(avctx, AV_LOG_ERROR, "Invalid tile number
> > > > %dx%d,
> > > > should less than %dx%d.\n",
> > > > + priv->tile_cols, priv->tile_rows,
> > > > AV1_MAX_TILE_COLS, AV1_MAX_TILE_ROWS);
> > >
> > > Can't this be a constraint on the option rather than a special
> > > check
> > > here?
> >
> > Tiles set with IMAGE_SIZE option type which aligns with HEVC. It
> > doesn't support Min/MAX check.
>
> Hmm, yeah - fair enough.
>
> > > > ...
> > > > +
> > > > + sh->color_config = (AV1RawColorConfig) {
> > > > + .high_bitdepth = desc->comp[0].depth
> > > > == 8
> > > > ? 0 : 1,
> > > > + .color_primaries = avctx-
> > > > >color_primaries,
> > > > + .transfer_characteristics = avctx->color_trc,
> > > > + .matrix_coefficients = avctx->colorspace,
> > > > + .color_description_present_flag = (avctx-
> > > > >color_primaries
> > > > != AVCOL_PRI_UNSPECIFIED ||
> > > > + avctx-
> > > > > color_trc != AVCOL_TRC_UNSPECIFIED ||
> > > > + avctx-
> > > > > colorspace != AVCOL_SPC_UNSPECIFIED),
> > > > + .color_range = avctx->color_range
> > > > ==
> > > > AVCOL_RANGE_JPEG,
> > > > + .subsampling_x = desc->log2_chroma_w,
> > > > + .subsampling_y = desc->log2_chroma_h,
> > >
> > > .chroma_sample_position = some function of
> > > chroma_sample_location.
> >
> > There is no chroma_sample_position supported in VAAPI yet.
>
> VAAPI won't care. It should be marked correctly in the headers so
> that the output stream has the right value.
>
> (CFL not having any ability to deal with the sample location is a
> known deficiency of AV1.)
>
> > > > ...
> > > > + fh->tile_cols = priv->tile_cols;
> > > > + fh->tile_rows = priv->tile_rows;
> > > > + fh->tile_cols_log2 = priv->tile_cols_log2;
> > > > + fh->tile_rows_log2 = priv->tile_rows_log2;
> > > > + fh->uniform_tile_spacing_flag = priv->uniform_tile;
> > > > + fh->tile_size_bytes_minus1 = priv-
> > > > > attr_ext2.bits.tile_size_bytes_minus1;
> > > > + fh->reduced_tx_set = 1;
> > >
> > > This not being present in the capabilities feels like an omission
> > > in
> > > the API. That's a large loss for an implementation which does
> > > support the full transform set.
> >
> > Personally understanding, "not being present" means it must be
> > supported. If any exception driver, then they can ask to add new
> > capability to libva.
>
> reduced_tx_set is a negative property - you want to set it to zero to
> allow all transforms, unless the implementation can't cope with that.
>
> Does the Intel implementation require it to be set? (That is, does
> not support the full transform set.)
>
That make sense. Intel HW doesn't require it to be set as 1. Will
change it to 0.
> > > > ...
> > > > +
> > > > + for (i = 0; i < fh->tile_cols; i++)
> > > > + fh->width_in_sbs_minus_1[i] = vpic-
> > > > > width_in_sbs_minus_1[i] = priv->width_in_sbs_minus_1[i];
> > > > +
> > > > + for (i = 0; i < fh->tile_rows; i++)
> > > > + fh->height_in_sbs_minus_1[i] = vpic-
> > > > > height_in_sbs_minus_1[i] = priv->height_in_sbs_minus_1[i];
> > >
> > > Er, what? Doesn't this fail the inference on the last tile
> > > width/height if you don't split exactly?
> >
> > What's the meaning of "split exactly"? All the tile w/h will be
> > split
> > in vaapi_encode_av1_set_tile include last tile. Just make sure the
> > info
> > of tiles w/h in frame header are same with the info passed to VAAPI
> > should be fine.
>
> Sorry, I misread this. The values are set elsewhere, it's not
> setting all tiles to have the same SB width/height.
>
> > > > ...
> > > > +
> > > > + vpic->base_qindex = fh->base_q_idx;
> > > > + vpic->frame_width_minus_1 = fh->frame_width_minus_1;
> > > > + vpic->frame_height_minus_1 = fh->frame_height_minus_1;
> > > > + vpic->primary_ref_frame = fh->primary_ref_frame;
> > > > + vpic->reconstructed_frame = pic->recon_surface;
> > > > + vpic->coded_buf = pic->output_buffer;
> > > > + vpic->tile_cols = fh->tile_cols;
> > > > + vpic->tile_rows = fh->tile_rows;
> > > > + vpic->order_hint = fh->order_hint;
> > > > +#if VA_CHECK_VERSION(1, 15, 0)
> > > > + vpic->refresh_frame_flags = fh->refresh_frame_flags;
> > > > +#endif
> > >
> > > What will the difference be between running in the two
> > > versions? (Is
> > > it worth supporting the older one? Cf. bit_depth on VP9.)
> >
> > I'd prefer to support older one. In case of someone using old
> > version
> > and works well with ffmpeg-vpl(as vpl already support av1 encode
> > long
> > time ago), otherwise he must be confuse why ffmpeg-vaapi need
> > higher
> > VAAPI version but VPL doesn't.
>
> Ok. So the driver doesn't use the value of refresh_frame_flags if it
> doesn't affect the result? (Seems like it shouldn't, not sure why it
> would be added to the API.)
Media-driver doesn't use it, but MS Windows driver does, it may need
maintain DPB and pack frame header itself:
https://github.com/intel/cartwheel-ffmpeg/issues/244
Thanks
>
> > > > ...
>
> Thanks,
>
> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
More information about the ffmpeg-devel
mailing list