[FFmpeg-devel] [FFmpeg-cvslog] lavu/qsv: make a copy as libmfx alignment requirement for uploading

Rogozhkin, Dmitry V dmitry.v.rogozhkin at intel.com
Wed Oct 24 23:45:51 EEST 2018


On Tue, 2018-10-23 at 23:46 +0100, Mark Thompson wrote:
> On 23/10/18 08:49, Li, Zhong wrote:
> > > From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf Of Mark Thompson
> > > Sent: Sunday, October 14, 2018 12:36 AM
> > > To: ffmpeg-devel at ffmpeg.org
> > > Subject: Re: [FFmpeg-devel] [FFmpeg-cvslog] lavu/qsv: make a copy as libmfx alignment requirement for uploading
> > > 
> > > On 11/10/18 06:38, Zhong Li wrote:
> > > > ffmpeg | branch: master | Zhong Li <zhong.li at intel.com> | Mon Sep 17 19:16:44 2018 +0800 | [681aa7d14f97fd98181ca6d61e11be48fe65692d] | committer: Zhong Li
> > > > 
> > > > lavu/qsv: make a copy as libmfx alignment requirement for uploading
> > > > 
> > > > Libmfx requires 16-byte-aligned input/output for uploading.
> > > > Currently only the output is 16-byte aligned; the input is given
> > > > the same width/height but actually has a smaller buffer, which
> > > > will definitely cause a segfault.
> > > > 
> > > > Can reproduce with any 1080p nv12 rawvideo input:
> > > > ffmpeg -init_hw_device qsv=qsv:hw -hwaccel qsv -filter_hw_device qsv \
> > > >   -f rawvideo -pix_fmt nv12 -s:v 1920x1080 -i 1080p_nv12.yuv \
> > > >   -vf 'format=nv12,hwupload=extra_hw_frames=16,hwdownload,format=nv12' \
> > > >   -an -y out_nv12.yuv
> > > > 
> > > > This fixes #7418.
> > > > 
> > > > Signed-off-by: Zhong Li <zhong.li at intel.com>
> > > > 
> > > > > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=681aa7d14f97fd98181ca6d61e11be48fe65692d
> > > > 
> > > > ---
> > > > 
> > > >  libavutil/hwcontext_qsv.c | 31 +++++++++++++++++++++++++++++--
> > > >  1 file changed, 29 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/libavutil/hwcontext_qsv.c b/libavutil/hwcontext_qsv.c
> > > > index 33e121b416..814ce215ce 100644
> > > > --- a/libavutil/hwcontext_qsv.c
> > > > +++ b/libavutil/hwcontext_qsv.c
> > > > @@ -862,6 +862,10 @@ static int qsv_transfer_data_to(AVHWFramesContext *ctx, AVFrame *dst,
> > > >      mfxSyncPoint sync = NULL;
> > > >      mfxStatus err;
> > > >      int ret = 0;
> > > > +    /* make a copy if the input is not padded as libmfx requires */
> > > > +    AVFrame tmp_frame, *src_frame;
> > > > +    int realigned = 0;
> > > > +
> > > > 
> > > >      while (!s->session_upload_init && !s->session_upload && !ret) {
> > > > #if HAVE_PTHREADS
> > > > @@ -887,16 +891,36 @@ static int qsv_transfer_data_to(AVHWFramesContext *ctx, AVFrame *dst,
> > > >      if (ret < 0)
> > > >          return ret;
> > > > 
> > > > +
> > > > +    if (src->height & 16 || src->linesize[0] & 16) {
> > > 
> > > Doesn't this still need to check that the layout is compatible with
> > > the other limitations that libmfx has before trying to do the upload?
> > > In particular, that the pitches for the different planes are actually
> > > all the same - I certainly wouldn't expect them to be for YUV420P.
> > 
> > Do you mean our previous discussion on
> > https://lists.libav.org/pipermail/libav-devel/2018-April/086070.html ?
> > If yes, I will try to reproduce it with a hwupload pipeline instead
> > of a transcoding pipeline, and then send an updated patch.
> 
> Yes.  You might need to write a small test program to generate an
> arbitrary allocation pattern, since libavutil only has a fixed
> layout.
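
Something like this sketch could serve as that test program (purely
hypothetical, not an existing FATE test): hand-build a YUV420P AVFrame
whose planes live in separate buffers with deliberately mismatched
pitches - a pattern av_frame_get_buffer() never produces, and one that
the single Pitch field in libmfx's mfxFrameData cannot describe:

#include <libavutil/frame.h>

/* Hypothetical test helper: allocate each plane separately with
 * deliberately different padding. */
static AVFrame *alloc_odd_yuv420p(int w, int h)
{
    AVFrame *f = av_frame_alloc();
    if (!f)
        return NULL;
    f->format = AV_PIX_FMT_YUV420P;
    f->width  = w;
    f->height = h;
    /* luma pitch padded to 256 bytes, chroma only to 64: not the
     * pitch/half-pitch pair a single mfx Pitch could describe */
    f->linesize[0] = FFALIGN(w, 256);
    f->linesize[1] = FFALIGN((w + 1) / 2, 64);
    f->linesize[2] = FFALIGN((w + 1) / 2, 64);
    for (int i = 0; i < 3; i++) {
        int plane_h = i ? (h + 1) / 2 : h;
        f->buf[i] = av_buffer_alloc(f->linesize[i] * plane_h);
        if (!f->buf[i]) {
            av_frame_free(&f);
            return NULL;
        }
        f->data[i] = f->buf[i]->data;
    }
    return f;
}

Feeding frames like this into the upload path should either succeed or
fail cleanly - never crash.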
> 
> > > 
> > > (IIRC there was a requirement 0 < frame->data[N] - frame->data[0] < 2^24
> > > as well (or something like that, I might be wrong), which also need not
> > > be true.)
> > 
> > I don't know of such a requirement; can it be found in the MSDK
> > guide?
> 
> That came from discussion of hardware capabilities around VAAPI, I
> think.

There are upper limits on the frame sizes the HW can support, and these
limits can differ between operations: for example, the AVC encoder
supports 4K as a maximum, while JPEG can support 16K. This depends on
the component and may depend on the hardware generation. In any case, I
don't think these kinds of limitations are something to be checked
within ffmpeg during allocation. It is the responsibility of the
underlying implementation through which you try to allocate to return
an error to ffmpeg that the allocation cannot be done, or the
responsibility of the component (msdk or vaapi) to return an error that
the given surface cannot be processed (due to per-component
limitations).
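
To illustrate the "let the implementation say no" approach - a minimal
sketch only, qsv_status_to_averror() is a made-up name, not an existing
ffmpeg or libmfx API - ffmpeg would simply translate whatever status
the library hands back:

#include <mfx/mfxvideo.h>
#include <libavutil/error.h>

/* Hypothetical helper: map a libmfx status to an AVERROR instead of
 * pre-validating per-component frame-size caps inside ffmpeg. */
static int qsv_status_to_averror(mfxStatus err)
{
    switch (err) {
    case MFX_ERR_NONE:                return 0;
    case MFX_ERR_UNSUPPORTED:         return AVERROR(ENOSYS);
    case MFX_ERR_MEMORY_ALLOC:        return AVERROR(ENOMEM);
    case MFX_ERR_INVALID_VIDEO_PARAM: return AVERROR(EINVAL);
    default:                          return AVERROR_UNKNOWN;
    }
}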

> 
> Won't something like this be the reason for wanting a virtually
> contiguous buffer?  If there isn't a restriction along these lines
> then there is no need to make the buffer virtually
> contiguous.  (After all, the hardware can't know about C undefined
> behaviour in pointer comparisons...)

"Virtually contiguous" sounds dangerous to me. There is a discrepancy
in treating layouts of color formats in ffmpeg and other libraries (and
drivers). First of all I did not see clear document describing what
ffmpeg thinks layout should be. What it seems to me is that ffmpeg
treats color formats per-plane meaning that in NV12 Y and UV could be
allocated absolutely separately. While you work with the SW components
this might be ok. But HW defines pretty specifically what it expects.
And typically memory layout is supposed to be contiguous, I mean
_physically_ contiguous. Secondly, HW may have some other implications
over layout requiring particular alignments. These could be width
alignments, i.e. pitches. Or height padding alignments. As of now it
seems that ffmpeg don't provide an architecture to satisfy all these
needs. This leads to overheads in terms of memory consumption since
additional "rightly allocated" memory is required to copy frames and
eventually this implies processing overheads as well.
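
That copy path looks roughly like the realign branch in the patch
quoted above - a minimal sketch, assuming 16-pixel alignment is what
the HW wants:

#include <libavutil/frame.h>

/* Sketch of the pad-and-copy fallback: allocate a frame padded to the
 * HW's alignment, then copy - this is exactly where the extra memory
 * and the extra memcpy come from. */
static int copy_to_aligned(AVFrame *tmp, const AVFrame *src)
{
    int ret;

    tmp->format = src->format;
    tmp->width  = FFALIGN(src->width,  16);
    tmp->height = FFALIGN(src->height, 16);

    ret = av_frame_get_buffer(tmp, 32); /* 32-byte-aligned planes */
    if (ret < 0)
        return ret;

    /* av_frame_copy() copies the src-sized region into the larger frame */
    ret = av_frame_copy(tmp, src);
    if (ret < 0)
        av_frame_unref(tmp);
    return ret;
}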

Here is an example from gstreamer, where they describe what they mean
by color formats:
https://gstreamer.freedesktop.org/documentation/design/mediatype-video-raw.html.
It would be wonderful if ffmpeg could define something similar. Note
that in gstreamer you get a clear statement of what the "default" NV12
memory layout is: contiguous memory where the UV plane is located right
after the end of the Y plane, with no additional padding for width or
height.
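
For instance, a standalone sketch with the libavutil image helpers
shows that align=1 produces exactly that "default" layout, while a
larger align value silently changes the pitch:

#include <stdio.h>
#include <libavutil/imgutils.h>
#include <libavutil/mem.h>

int main(void)
{
    uint8_t *data[4];
    int linesize[4];
    const int w = 1920, h = 1080;

    /* align=1: pitch == w and the UV plane starts at exactly w*h --
     * the same layout the gstreamer document calls default NV12 */
    int size = av_image_alloc(data, linesize, w, h, AV_PIX_FMT_NV12, 1);
    if (size < 0)
        return 1;

    printf("pitch=%d uv_offset=%td total=%d\n",
           linesize[0], data[1] - data[0], size);

    av_freep(&data[0]);
    return 0;
}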

I have tried to find something similar for ffmpeg, but failed (apart
from the ffmpeg source code itself, of course).

> 
> - Mark

