[FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11 buffer copies are done before submitting them

Soft Works softworkz at hotmail.com
Sat Aug 8 01:05:45 EEST 2020



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Soft Works
> Sent: Friday, August 7, 2020 11:59 PM
> To: FFmpeg development discussions and patches
> <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11 buffer
> copies are done before submitting them
> 
> > -----Original Message-----
> > From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> > Steve Lhomme
> > Sent: Friday, August 7, 2020 3:05 PM
> > To: ffmpeg-devel at ffmpeg.org
> > Subject: Re: [FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11
> > buffer copies are done before submitting them
> >
> > I experimented a bit more with this. Here are the 3 scenarios, in order
> > of fewest late frames:
> >
> > - GetData waiting for 1/2s and releasing the lock
> > - No use of GetData (current code)
> > - GetData waiting for 1/2s and keeping the lock
> >
> > The last option has horrible performance issues and should not be used.
> >
> > The first option gives about 50% fewer late frames compared to the
> > current code. *But* it requires unlocking the Video Context. There are
> > 2 problems with this:
> >
> > - the same ID3D11Asynchronous is used to wait on multiple concurrent
> > threads. This can confuse D3D11, which emits a warning in the logs.
> > - another thread might Get/Release some buffers and submit them before
> > this thread is finished processing. That can result in distortions,
> > for example if the second thread/frame depends on the first
> > thread/frame which is not submitted yet.
> >
> > The former issue can be solved by using one ID3D11Asynchronous per thread.
> > That requires some TLS storage, which FFmpeg doesn't seem to support yet.
> > With this I get virtually no late frames.
> >
> > The latter issue only occurs if the wait is too long. For example,
> > waiting in increments of 10ms is too long in my test. Using increments
> > of 1ms or 2ms works fine in the most stressing sample I have (Sony
> > Camping HDR HEVC high bitrate). But this seems hackish. There's still
> > potentially a quick frame (alt frame in VPx/AV1 for example) that
> > might get through to the decoder too early. (I suppose that's the
> > source of the distortions I see.)
> >
> > It's also possible to change the order in which the buffers are sent, by
> > starting with the bigger one (D3D11_VIDEO_DECODER_BUFFER_BITSTREAM).
> > But it seems to have little influence, regardless of whether we wait
> > for buffer submission or not.
> >
> > The results are consistent between integrated GPU and dedicated GPU.
> 
> Hi Steve,
> 
> A while ago I extended the D3D11VA implementation to support single
> (non-array) textures for interoperability with Intel QSV+DX11.
> 
> I noticed a few bottlenecks making D3D11VA significantly slower than DXVA2.
> 
> The solution was to use ID3D10Multithread_SetMultithreadProtected and
> remove all the locks which are currently applied.
> 
> Hence, I don't think that your patch is the best possible way.
> 
> Regards,
> softworkz

I almost forgot that I had published that change already: https://github.com/softworkz/ffmpeg_dx11/commit/c09cc37ce7f513717493e060df740aa0e7374257
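
For reference, the first scenario quoted above (poll for completion in
short steps and release the lock while waiting) could be sketched roughly
as below. This is only an illustration in portable C, not code from the
patch or from the commit linked above: CopyWaiter, wait_for_copy, FakeGpu
and demo_wait are made-up names, is_done stands in for a non-blocking
GetData-style query, and the lock/unlock callbacks stand in for whatever
protects the video context.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical wrapper; none of these names exist in FFmpeg or D3D11. */
typedef struct CopyWaiter {
    int  (*is_done)(void *opaque); /* non-blocking check, nonzero once the copy finished */
    void (*lock)(void *opaque);    /* stand-in for taking the video context lock */
    void (*unlock)(void *opaque);  /* stand-in for releasing it */
    void *opaque;
} CopyWaiter;

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/*
 * Call with the lock held; always returns with the lock held.
 * Returns the number of sleep steps taken, or -1 if timeout_ms
 * elapsed before the copy was reported as finished.
 */
static int wait_for_copy(const CopyWaiter *w, long step_ms, long timeout_ms)
{
    long waited = 0;
    int steps = 0;

    assert(w && step_ms > 0);
    while (!w->is_done(w->opaque)) {
        if (waited >= timeout_ms)
            return -1;
        w->unlock(w->opaque); /* let other decoding threads run meanwhile */
        sleep_ms(step_ms);
        w->lock(w->opaque);   /* re-take the lock before polling again */
        waited += step_ms;
        steps++;
    }
    return steps;
}

/* Fake backend: answers "not ready" polls_until_done times, then "done". */
typedef struct FakeGpu { int polls_until_done; int locked; } FakeGpu;

static int  fake_is_done(void *o) { FakeGpu *g = o; return g->polls_until_done-- <= 0; }
static void fake_lock(void *o)    { ((FakeGpu *)o)->locked = 1; }
static void fake_unlock(void *o)  { ((FakeGpu *)o)->locked = 0; }

/* Runs one wait against the fake backend; returns -2 if the lock was lost. */
static int demo_wait(int polls_until_done, long step_ms, long timeout_ms)
{
    FakeGpu gpu = { polls_until_done, 1 };
    CopyWaiter w = { fake_is_done, fake_lock, fake_unlock, &gpu };
    int steps = wait_for_copy(&w, step_ms, timeout_ms);
    return gpu.locked ? steps : -2;
}
```

With the fake backend, demo_wait(3, 1, 500) sleeps three times in 1ms
steps before the copy is reported done, while demo_wait(1000, 1, 5) gives
up after 5ms and returns -1. The window in which the lock is released is
exactly where another thread could submit its buffers first, which is the
ordering problem described in the quoted mail.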



