[FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11 buffer copies are done before submitting them
Soft Works
softworkz at hotmail.com
Sat Aug 8 00:59:27 EEST 2020
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Steve Lhomme
> Sent: Friday, August 7, 2020 3:05 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11 buffer
> copies are done before submitting them
>
> I experimented a bit more with this. Here are the 3 scenarios, ordered from
> fewest late frames to most:
>
> - GetData waiting for 1/2s and releasing the lock
> - No use of GetData (current code)
> - GetData waiting for 1/2s and keeping the lock
>
> The last option has horrible performance issues and should not be used.
>
> The first option gives about 50% fewer late frames compared to the current
> code. *But* it requires unlocking the Video Context. There are 2 problems
> with this:
>
> - the same ID3D11Asynchronous is used to wait on multiple concurrent
> threads. This can confuse D3D11, which emits a warning in the logs.
> - another thread might Get/Release some buffers and submit them before
> this thread is finished processing. That can result in distortions, for example if
> the second thread/frame depends on the first thread/frame which is not
> submitted yet.
>
> The former issue can be solved by using an ID3D11Asynchronous per thread.
> That requires thread-local storage (TLS), which FFmpeg doesn't seem to
> support yet. With this I get virtually no late frames.
>
> The latter issue only occurs if the wait is too long. For example waiting by
> increments of 10ms is too long in my test. Using increments of 1ms or 2ms
> works fine in the most stressing sample I have (Sony Camping HDR HEVC high
> bitrate). But this seems hackish. There's still potentially a quick frame (alt
> frame in VPx/AV1 for example) that might get through to the decoder too
> early. (I suppose that's the source of the distortions I
> see)
>
> It's also possible to change the order of the buffer sending, by starting with
> the bigger one (D3D11_VIDEO_DECODER_BUFFER_BITSTREAM). But it seems
> to have little influence, regardless of whether we wait for buffer submission.
>
> The results are consistent between integrated GPU and dedicated GPU.
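
The first scenario in the quoted message (poll for completion and release the lock between polls) can be sketched in portable C as follows. This is only an illustration under assumptions: PollCtx, wait_releasing_lock and the fake_* helpers are hypothetical names, the done callback stands in for a GetData call on the query object, and the lock/unlock callbacks stand in for the video context lock. It does not address the ordering problem described in the quote, where another thread may submit its buffers while the lock is released.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#endif
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callbacks: in real code, done() would wrap a GetData call
 * and lock()/unlock() would take and release the video context lock. */
typedef struct PollCtx {
    bool (*done)(void *opaque);
    void (*lock)(void *opaque);
    void (*unlock)(void *opaque);
    void *opaque;
} PollCtx;

static void sleep_ms(int ms)
{
    struct timespec ts;
    ts.tv_sec  = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* The caller holds the lock on entry and holds it again on return.
 * Returns 0 once done() reports completion, -1 after timeout_ms. */
static int wait_releasing_lock(const PollCtx *c, int step_ms, int timeout_ms)
{
    int waited = 0;

    while (!c->done(c->opaque)) {
        if (waited >= timeout_ms)
            return -1;
        c->unlock(c->opaque);   /* let other threads use the context */
        sleep_ms(step_ms);      /* short steps, e.g. 1 or 2 ms */
        c->lock(c->opaque);
        waited += step_ms;
    }
    return 0;
}

/* Stand-in state used to exercise the loop without a GPU. */
typedef struct FakeQuery {
    int polls_left;   /* number of polls that report "not done yet" */
    int depth;        /* 1 while the fake lock is held */
    int unlocks;      /* how many times the lock was released */
} FakeQuery;

static bool fake_done(void *o)   { FakeQuery *q = o; return q->polls_left-- <= 0; }
static void fake_lock(void *o)   { ((FakeQuery *)o)->depth++; }
static void fake_unlock(void *o) { FakeQuery *q = o; q->depth--; q->unlocks++; }
```

The step size is a parameter because, per the measurements quoted above, 10 ms steps were too long while 1 ms or 2 ms steps worked on the tested sample.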
Hi Steve,
A while ago I extended the D3D11VA implementation to support single
(non-array) textures for interoperability with Intel QSV+DX11.
I noticed a few bottlenecks making D3D11VA significantly slower than DXVA2.
The solution was to use ID3D10Multithread_SetMultithreadProtected and
remove all the locks which are currently applied.
Hence, I don't think that your patch is the best possible way.
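
For illustration, enabling multithread protection on a D3D11 device might look like the sketch below. The function name enable_d3d11_multithread_protection is hypothetical, and the non-Windows branch is only a stub so the file compiles elsewhere. Which locks can safely be removed once this is enabled is not shown here and would need to be checked against the actual decoder code.

```c
#include <stddef.h>

#ifdef _WIN32
#define COBJMACROS          /* enables the ID3D10Multithread_* C macros */
#include <initguid.h>       /* assumption: define the IIDs in this file */
#include <d3d11.h>
#include <d3d10.h>          /* declares ID3D10Multithread */

/* Hypothetical helper: asks the device for its ID3D10Multithread
 * interface and turns on multithread protection.
 * Returns 0 on success, -1 on failure. */
int enable_d3d11_multithread_protection(ID3D11Device *device)
{
    ID3D10Multithread *mt = NULL;
    HRESULT hr;

    if (!device)
        return -1;

    hr = ID3D11Device_QueryInterface(device, &IID_ID3D10Multithread,
                                     (void **)&mt);
    if (FAILED(hr))
        return -1;

    ID3D10Multithread_SetMultithreadProtected(mt, TRUE);
    ID3D10Multithread_Release(mt);
    return 0;
}
#else
/* Stub for non-Windows builds: D3D11 is not available here. */
typedef struct ID3D11Device ID3D11Device;

int enable_d3d11_multithread_protection(ID3D11Device *device)
{
    (void)device;
    return -1;
}
#endif
```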
Regards,
softworkz