[FFmpeg-devel] [PATCH] libavcodec/qsv.c: Re-design session control and internal allocation
wm4
nfxjfg at googlemail.com
Fri Oct 23 16:52:30 CEST 2015
On Fri, 23 Oct 2015 16:12:07 +0200
Gwenole Beauchesne <gb.devel at gmail.com> wrote:
> Hi,
>
> 2015-10-23 15:41 GMT+02:00 wm4 <nfxjfg at googlemail.com>:
> > On Fri, 23 Oct 2015 14:54:56 +0200
> > Gwenole Beauchesne <gb.devel at gmail.com> wrote:
> >
> >> Hi,
> >>
> >> 2015-10-07 18:40 GMT+02:00 wm4 <nfxjfg at googlemail.com>:
> >> > On Wed, 7 Oct 2015 19:20:56 +0300
> >> > Ivan Uskov <ivan.uskov at nablet.com> wrote:
> >> >
> >> >> Hello Hendrik,
> >> >>
> >> >> Wednesday, October 7, 2015, 5:58:25 PM, you wrote:
> >> >>
> >> >> HL> On Wed, Oct 7, 2015 at 4:41 PM, Ivan Uskov <ivan.uskov at nablet.com> wrote:
> >> >>
> >> >> HL> Global static variables are not acceptable, sorry.
> >> >> HL> You'll have to find another way to solve your problem, but global
> >> >> HL> state is not the way to go.
> >> >> Unfortunately I have no idea how else to provide a single, common
> >> >> memory block to the separate modules. Memory-mapped files look less
> >> >> portable and far bulkier than one global variable, and I do not see
> >> >> an appropriate field of AVCodecContext that could share global data.
> >> >> Does anybody have any ideas?
> >> >> Am I missing something? There really is a need for a global common
> >> >> structure shared between the decoder, vpp and encoder.
> >> >>
> >> >
> >> > There's no automagic way to get this done.
> >> >
> >> > Hardware accelerators like vaapi and vdpau need the same thing. These
> >> > have special APIs to set an API context on the decoder. Some APIs use
> >> > hwaccel_context, vdpau uses av_vdpau_bind_context().
> >> >
> >> > The hwaccel_context pointer is untyped (just a void*), and what it means
> >> > is implicit to the hwaccel or the decoder that is used. It could point
> >> > to some sort of qsv state, which will have to be explicitly allocated
> >> > and set by the API user (which might be ffmpeg.c).
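(To make that concrete: with the existing vaapi hwaccel, "explicitly
allocated and set by the API user" boils down to roughly the untested
sketch below; the libva display/config/context setup itself is omitted:)

    #include <stdint.h>
    #include <libavcodec/avcodec.h>
    #include <libavcodec/vaapi.h>

    /* Owned by the API user; lavc only reads it through the untyped
     * hwaccel_context pointer. */
    static struct vaapi_context va_ctx;

    static void setup_vaapi(AVCodecContext *avctx, void *display,
                            uint32_t config_id, uint32_t context_id)
    {
        va_ctx.display    = display;    /* VADisplay opened by the user */
        va_ctx.config_id  = config_id;  /* VAConfigID for this codec    */
        va_ctx.context_id = context_id; /* VAContextID for decoding     */

        /* what this void* points to is implicit to the chosen hwaccel */
        avctx->hwaccel_context = &va_ctx;
    }

A qsv equivalent would have to define its own struct behind that
pointer, instead of hiding the session in a global.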
> >> >
> >> > For filters there is no such thing yet. New API would have to be
> >> > created. For filters in particular, we will have to make sure that the
> >> > API isn't overly qsv-specific, and that it is extendable to other APIs
> >> > (like for example vaapi or vdpau).
> >>
> >> I have been looking into a slightly different way: the common object
> >> being transported is an AVFrame. So, my initial approach is to create
> >> an AV_FRAME_DATA_HWACCEL side-data type. Lookups could be optimized by
> >> keeping around an AVFrameInternal struct that resides in the same
> >> allocation unit as the AVFrame. But this is a detail.
> >>
> >> From there, there are at least two usage models, when it comes to filters too:
> >>
> >> 1. Make the AVHWAccelFrame struct hold multiple pieces of
> >> hwaccel-specific info, with a master one and slave ones for various
> >> different APIs. Advantage: a single AVFrame can be re-used and
> >> impersonated as whichever API is needed, e.g. a master VA surface
> >> could be exported/re-used as an mfxSurface, a dma_buf (for OpenCL), or
> >> a userptr buffer. Drawback: terribly tedious to manage.
> >>
> >> 2. Simpler approach: the AVHWAccelFrame holds a single struct of
> >> hwaccel-specific data. Should we need to export that for use with
> >> another API, it's not too complicated to av_frame_ref() + add new
> >> hwaccel-specific metadata.
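To check that I read option 2 right, a hypothetical sketch of what such
a struct could carry (none of these names exist, and the exact layout is
not the point):

    /* Hypothetical: one hwaccel API at a time per frame. */
    typedef struct AVHWAccelFrame {
        int          api;         /* an API identifier, e.g. an enum value */
        void        *payload;     /* API-specific struct, e.g. a VA-API    */
                                  /* display + surface pair                */
        AVBufferRef *payload_ref; /* keeps the payload alive               */
    } AVHWAccelFrame;

Exporting the same surface to another API would then be av_frame_ref()
plus attaching a second payload, as you say.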
> >
> > It could be something like an API identifier (an enum or so) + API
> > specific pointer to a struct.
>
> That's exactly that:
>
> /**
>  * Hardware Accelerator identifier.
>  *
>  * @note
>  * A hardware accelerator can be device-less. This means that only the
>  * underlying hardware resource, e.g. a Linux dma_buf handle, is being
>  * transported in the AVFrame. That hardware resource could be mapped
>  * through standard OS-dependent calls, e.g. mmap() on Linux.
>  */
> enum AVHWAccelId {
>     AV_HWACCEL_ID_NONE = -1,
>     AV_HWACCEL_ID_VAAPI,
>     AV_HWACCEL_ID_NB,    ///< Not part of ABI
> };
>
> Name to be improved if people have better suggestions, as this really
> is to be seen as a HW resource, not necessarily attached to a particular
> HW device. i.e. this could be a dma_buf handle from a V4L2 buffer or a
> VA surface.
OK. (Minor nit: if ID_NONE is valid and means a HW resource without an
associated HW API/device, maybe it should be 0, not -1. Also, even if it
was meant this way, such device-less resources should probably still get
their own IDs for other purposes.)
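I.e. something along these lines; AV_HWACCEL_ID_DMABUF is made up here,
just to illustrate giving device-less resources their own entry:

    enum AVHWAccelId {
        AV_HWACCEL_ID_NONE = 0,   /* no HW resource attached at all     */
        AV_HWACCEL_ID_VAAPI,
        AV_HWACCEL_ID_DMABUF,     /* hypothetical: device-less dma_buf  */
        AV_HWACCEL_ID_NB,         /* not part of ABI                    */
    };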
> I am reworking the patch series as I changed my mind again: the current
> map strategy was overly complex (and required to be). There were at
> least 4 map flags: AV_FRAME_MAP_{READ,WRITE,READ_FIRST,USWC_MEMORY}.
> I now prefer a single av_hwaccel_frame_get_pixels() defined as follows:
>
> /**
>  * Returns the AVFrame pixels in linear memory
>  *
>  * This function takes a snapshot of the underlying HW surface and
>  * exposes it to SW-backed memory. This may involve a copy from GPU
>  * memory to CPU memory.
>  *
>  * @note
>  * No attempt is made to commit modified pixels back to GPU memory
>  * when the \ref dst AVFrame is released.
>  *
>  * @param[in]    src   the source frame to read
>  * @param[inout] dst   the target frame to update, or to create if NULL
>  * @param[in]    flags an optional combination of AV_FRAME_FLAG_xxx flags
>  * @return 0 on success, an AVERROR code on failure.
>  */
> int
> av_hwaccel_frame_get_pixels(AVFrame *src, AVFrame **dst, unsigned flags);
>
> i.e. the cost of allocating and copying AVFrame metadata should be
> less than the actual work needed behind the scenes. So, it looks like a
> better interim solution for starters.
So this is for read access only, right? If it can map data, there
also needs to be an unmap function, and the user would have to know
when to use it.
I don't know if something for write-access is required. It'd be
potentially useful to API users, but they could also be bothered to
implement their own code for this.
I'm not sure how this plays into other uses, such as mapping a
surface in OpenGL or OpenCL. But it's probably ok.
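To spell out what I think the read-only usage would look like (a sketch
against the proposed declaration above, which obviously doesn't exist in
the tree yet):

    static int dump_snapshot(AVFrame *hw_frame)
    {
        AVFrame *sw_frame = NULL;
        int ret;

        /* takes a snapshot of the HW surface into SW memory; dst is
         * created for us because we pass NULL */
        ret = av_hwaccel_frame_get_pixels(hw_frame, &sw_frame, 0);
        if (ret < 0)
            return ret;

        /* ... read sw_frame->data[] / linesize[] here; any writes are
         * never committed back to the GPU ... */

        av_frame_free(&sw_frame);  /* no separate unmap step */
        return 0;
    }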
> >
> >> For VA-API specific purposes, I then have:
> >> - AVVADisplay, which is refcounted and can handle automatic
> >> initialize/terminate calls;
> >> - AVVAFrameBuffer, which holds an { AVVADisplay, VASurfaceID, and
> >> possibly VAImageID } tuple (for mappings in non-USWC memory) and
> >> populates AVFrame.buf[0].
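(Writing out how I read that tuple, with the names being yours and still
hypothetical:)

    #include <va/va.h>

    /* Hypothetical layout, for illustration only. */
    typedef struct AVVAFrameBuffer {
        AVVADisplay *display;   /* refcounted VADisplay wrapper           */
        VASurfaceID  surface;   /* the actual HW surface                  */
        VAImageID    image;     /* optional, for mappings in non-USWC mem */
    } AVVAFrameBuffer;
    /* an AVBufferRef wrapping this would then populate AVFrame.buf[0] */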
> >
> > Sounds good.
> >
> >> I am still undecided on whether I'd create an AV_PIX_FMT_HWACCEL format
> >> and allow a hwaccel-specific AVFrame to absorb an existing AVFrame, as a
> >> transitional method from the existing vaapi hwaccel to the "new" vaapi
> >> hwaccel. In that new generic hwaccel format model, buf[0] et al. would
> >> be clearly defined, and data[i] would not be supposed to be touched by
> >> the user application. For now, I am not leaning much towards that option.
> >
> > I'd still have different pixfmts for each hwaccel, even if they behave
> > the same. (The main reason being that hw accel init via get_format()
> > requires it.)
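(For reference, hwaccel selection happens in the user's get_format()
callback, roughly as in the untested sketch below, vaapi taken as the
example; with one shared AV_PIX_FMT_HWACCEL the callback could no longer
tell the APIs apart:)

    #include <libavcodec/avcodec.h>
    #include <libavutil/pixdesc.h>

    static enum AVPixelFormat get_format(AVCodecContext *avctx,
                                         const enum AVPixelFormat *fmts)
    {
        const enum AVPixelFormat *p;

        /* pick the hwaccel-specific format to enable that hwaccel */
        for (p = fmts; *p != AV_PIX_FMT_NONE; p++)
            if (*p == AV_PIX_FMT_VAAPI_VLD)
                return *p;

        /* otherwise fall back to the first software format offered */
        for (p = fmts; *p != AV_PIX_FMT_NONE; p++)
            if (!(av_pix_fmt_desc_get(*p)->flags & AV_PIX_FMT_FLAG_HWACCEL))
                return *p;

        return AV_PIX_FMT_NONE;
    }

    /* installed with: avctx->get_format = get_format; */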
> >
> > IMHO, one AVFrame plane pointer should point to your suggested
> > AVHWAccelFrame. This would make more sense with how AVFrame
> > refcounting works in the general case.
> >
> > AVFrame specifically demands that AVFrame.buf[] covers all memory
> > pointed to by AVFrame.data. This doesn't work well with the
> > current API, where AVFrame.data[3] is an API-specific surface ID
> > (usually an integer cast to a pointer), so the AVBufferRef doesn't
> > really make any sense.
>
> Yes, that's also what I wanted to get rid of, at least for VA-API with
> lavc managed objects.
>
> > With the new suggestion, the AVBufferRef would cover the memory used by
> > AVHWAccelFrame. While not particularly useful, this is consistent, and
> > also frees API users from worrying about what the AVBufferRef data/size
> > fields should be set to.
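(Concretely, whoever produces the frame could do something like the
following; AVHWAccelFrame is still hypothetical, av_buffer_create() is
existing API, error handling trimmed:)

    #include <libavutil/buffer.h>
    #include <libavutil/error.h>
    #include <libavutil/frame.h>
    #include <libavutil/mem.h>

    static void hwaccel_frame_free(void *opaque, uint8_t *data)
    {
        AVHWAccelFrame *hwf = (AVHWAccelFrame *)data;
        /* release the underlying surface/display references here */
        av_free(hwf);
    }

    static int attach_hwaccel_frame(AVFrame *frame, AVHWAccelFrame *hwf)
    {
        frame->buf[0] = av_buffer_create((uint8_t *)hwf, sizeof(*hwf),
                                         hwaccel_frame_free, NULL, 0);
        if (!frame->buf[0])
            return AVERROR(ENOMEM);
        /* buf[0] now covers exactly the memory data[0] points to */
        frame->data[0] = frame->buf[0]->data;
        return 0;
    }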
> >
> > As for compatibility with old code: maybe AVFrame.data[1] could
> > point to something like this. But I think it's better to make a clean
> > break.
>
> When is the next major bump scheduled to happen BTW? :)
At least a year, though we're still just about in the phase where we can
change the ABI freely. APIs can be deprecated and even disabled (i.e.
the API is still there, but consists of stubs only).
I'm not sure how others feel about completely changing this.
Personally I'd be fine with it.
> For compatibility, that's also the idea behind another, generic
> AV_PIX_FMT_HWACCEL that would require data[i] to be clear of any
> user-supplied pointers, with buf[i] filled in by appropriate
> accessors, or while creating the side data, e.g. with
> av_vaapi_frame_create_side_data(). i.e. when lavc swallows up an
> AVFrame with the new-style hwaccel, that is where AV_PIX_FMT_VAAPI
> would be replaced with AV_PIX_FMT_HWACCEL. Replace "swallows up" with
> e.g. av_vaapi_frame_convert_in_place() if you prefer. Otherwise, IMHO,
> the old-style fields should stay untouched, hence the need to keep the
> hwaccel side data around.
Isn't the problem more about output?
Again, there's the problem with the current hwaccel API selecting the
hwaccel with get_format(), just using the hwaccel-specific pixfmt.
Also, AVFrame.buf[] should cover the memory referenced by
AVFrame.data[]. It's merely a refcount helper for AVFrame.data[], and
should not do additional things.
I think using AVFrame side data for this would be a bit awkward.
Possibly it _could_ be used to store things like VADisplay if we don't
find a better way, but I think having an AVHWAccelFrame would be better.
> >
> >> PS: other benefit of the AVHWAccelFrame side-data is that I can stuff
> >> crop information into there. Since this is only useful to hwaccel, no
> >> need to populate AVFrame with additional fields IMHO.
> >
> > IMHO, crop information should be generally available, even for software
> > surfaces. What we currently do are terrible hacks: align the X/Y
> > coordinates to chroma boundaries and adjust the pointer (meaning
everyone has to deal with possibly unaligned pointers, and non-mod-2
> > crops don't work correctly), and this also means you can have a
> > non-mod-2 width/height, which doesn't really make sense for chroma.
>
> I don't really see why this would be needed for SW. AVFrame.buf[] will
> hold the buffers as in "the original allocation". Then AVFrame.data[]
> would be filled in to fit the actual cropped/decoded region, wouldn't it?
Yes, AVFrame.buf[] does this, but you still don't know e.g. the
original width, or even the pointer to a plane's original (0, 0) pixel
if AVFrame.buf[0] covers all planes.
I think doing cropping as metadata would also simplify code a lot. For
example, nobody has to worry about non-mod-2 yuv420 anymore, and how it
should be handled. It's less tricky, more correct, more efficient.
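For concreteness, "cropping as metadata" could be as small as something
like this (field names invented; whether it lives on AVFrame itself or
in the hwaccel struct is a separate question):

    /* Hypothetical: planes keep their allocated origin and the full
     * coded width/height; consumers apply the crop only when presenting. */
    typedef struct AVFrameCrop {
        int left, top, right, bottom;   /* in luma samples, no mod-2 rule */
    } AVFrameCrop;

Non-mod-2 crops then stop being special: chroma planes are never shifted
and pointers stay aligned, only the displayed rectangle changes.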