[FFmpeg-devel] [PATCH v3 1/2] dxva: wait until D3D11 buffer copies are done before submitting them
Steve Lhomme
robux4 at ycbcr.xyz
Fri Aug 7 16:05:26 EEST 2020
I experimented a bit more with this. Here are the 3 scenarios, in order
from fewest to most late frames:
- GetData waiting for 1/2s and releasing the lock
- No use of GetData (current code)
- GetData waiting for 1/2s and keeping the lock
The last option has horrible performance issues and should not be used.
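A minimal, portable sketch of the first option (polling with the lock released between polls). It assumes a POSIX mutex in place of the D3D11VA context lock (`ff_dxva2_lock()`/`ff_dxva2_unlock()` in the patch) and a hypothetical `is_done` callback in place of `ID3D11DeviceContext_GetData()` returning something other than `S_FALSE`. It mirrors the bounded `maxWait` loop in the patch quoted below; it is not D3D11 code:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for polling the event query: returns true once the
 * GPU has signalled it (GetData() != S_FALSE in the patch). */
typedef bool (*fence_poll_fn)(void *opaque);

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Poll the fence up to max_polls times, sleeping step_ms after each
 * unsuccessful poll. The caller holds *lock on entry and on return; it is
 * dropped only while sleeping so other decoding threads can progress.
 * Returns the number of sleeps performed. */
static int wait_fence_unlocked(pthread_mutex_t *lock, fence_poll_fn is_done,
                               void *opaque, int max_polls, long step_ms)
{
    int sleeps = 0;
    assert(max_polls >= 0 && step_ms > 0);
    while (sleeps < max_polls && !is_done(opaque)) {
        pthread_mutex_unlock(lock);
        sleep_ms(step_ms);
        pthread_mutex_lock(lock);
        sleeps++;
    }
    return sleeps;
}

/* Example fence for testing: signals after `remaining` failed polls. */
struct countdown_fence { int remaining; };

static bool countdown_poll(void *opaque)
{
    struct countdown_fence *f = opaque;
    if (f->remaining == 0)
        return true;
    f->remaining--;
    return false;
}
```

The "keeping the lock" variant is the same loop without the unlock/lock pair around the sleep, so every other decoding thread is blocked for the whole wait, which is consistent with the poor performance reported above.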
The first option gives about 50% fewer late frames than the
current code. *But* it requires unlocking the Video Context. There are 2
problems with this:
- the same ID3D11Asynchronous is waited on from multiple concurrent
threads. This can confuse D3D11, which emits a warning in the logs.
- another thread might Get/Release some buffers and submit them before
this thread has finished processing. That can result in distortions, for
example if the second thread/frame depends on the first thread/frame,
which has not been submitted yet.
The former issue can be solved by using an ID3D11Asynchronous per thread.
That requires thread-local storage (TLS), which FFmpeg doesn't seem to
support yet. With this I get virtually no late frames.
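One way to give each thread its own query object is POSIX thread-specific data. This sketch assumes pthreads and uses a hypothetical `FakeQuery` in place of a real `ID3D11Query`; the key destructor marks where a release call would go when a thread exits:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-thread ID3D11Query (D3D11_QUERY_EVENT). */
typedef struct FakeQuery {
    int id;
} FakeQuery;

static pthread_key_t query_key;
static pthread_once_t query_key_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_id = 0; /* guarded by id_lock */

/* Runs when a thread exits; a real version would release the query here. */
static void query_destructor(void *ptr)
{
    free(ptr);
}

static void make_query_key(void)
{
    int ret = pthread_key_create(&query_key, query_destructor);
    assert(ret == 0);
    (void)ret;
}

/* Return the calling thread's query, creating it on first use, so that
 * concurrent threads never share one query object. */
static FakeQuery *get_thread_query(void)
{
    FakeQuery *q;
    pthread_once(&query_key_once, make_query_key);
    q = pthread_getspecific(query_key);
    if (!q) {
        q = malloc(sizeof(*q));
        if (!q)
            return NULL;
        pthread_mutex_lock(&id_lock);
        q->id = next_id++;
        pthread_mutex_unlock(&id_lock);
        if (pthread_setspecific(query_key, q) != 0) {
            free(q);
            return NULL;
        }
    }
    return q;
}

/* Thread entry used in the example: reports the id of that thread's query. */
static void *report_query_id(void *out)
{
    FakeQuery *q = get_thread_query();
    *(int *)out = q ? q->id : -1;
    return NULL;
}
```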
The latter issue only occurs if the wait is too long. For example, waiting
in increments of 10ms is too long in my test. Using increments of 1ms or
2ms works fine in the most demanding sample I have (Sony Camping HDR
HEVC high bitrate). But this seems hackish. There's still potentially a
quick frame (an alt frame in VPx/AV1, for example) that might get through
to the decoder too early. (I suppose that's the source of the distortions
I see.)
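The trade-off between the total wait budget and the sleep step is simple arithmetic, shown here with a hypothetical helper. The half-second budget at 2ms steps needs 250 polls, while the patch's `maxWait = 10` with `SleepEx(2, TRUE)` caps the wait at nominally 20ms. Either way, the thread may notice a signalled fence up to one full step late, and the lock is released for each step:

```c
#include <assert.h>

/* Hypothetical helper: how a total wait budget splits into sleep steps. */
typedef struct WaitPlan {
    int polls;            /* loop iterations needed to cover the budget */
    int max_overshoot_ms; /* worst-case delay after the fence signals */
} WaitPlan;

static WaitPlan plan_wait(int budget_ms, int step_ms)
{
    WaitPlan p;
    assert(budget_ms >= 0 && step_ms > 0);
    p.polls = (budget_ms + step_ms - 1) / step_ms; /* round up */
    p.max_overshoot_ms = step_ms; /* fence may signal just after a poll */
    return p;
}
```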
It's also possible to change the order in which the buffers are sent,
starting with the biggest one (D3D11_VIDEO_DECODER_BUFFER_BITSTREAM). But
it seems to have little influence, regardless of whether we wait for buffer
submission or not.
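A sketch of "start with the biggest buffer" as a sort over per-frame buffer descriptors. The enum and struct are hypothetical mirrors of the D3D11 buffer types, not the real `D3D11_VIDEO_DECODER_BUFFER_DESC` or its enum values; sorting by size puts the bitstream buffer first in a typical frame:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical mirror of the per-frame buffer types (not real enum values). */
enum buf_type {
    BUF_PICTURE_PARAMETERS,
    BUF_INVERSE_QUANTIZATION_MATRIX,
    BUF_SLICE_CONTROL,
    BUF_BITSTREAM,
};

typedef struct BufDesc {
    enum buf_type type;
    size_t size;
} BufDesc;

/* qsort comparator: larger buffers first, without subtraction overflow. */
static int cmp_size_desc(const void *a, const void *b)
{
    size_t sa = ((const BufDesc *)a)->size;
    size_t sb = ((const BufDesc *)b)->size;
    return (sa < sb) - (sa > sb);
}

static void order_largest_first(BufDesc *bufs, size_t count)
{
    qsort(bufs, count, sizeof(*bufs), cmp_size_desc);
}
```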
The results are consistent between integrated GPU and dedicated GPU.
On 2020-08-05 12:07, Steve Lhomme wrote:
> When used aggressively, calling SubmitDecoderBuffers() just after
> ReleaseDecoderBuffer() may result in the buffers not being used properly,
> creating decoding artifacts.
> It's likely due to the time it takes to copy the submitted buffer from
> CPU-mapped memory to GPU memory. SubmitDecoderBuffers() doesn't appear to
> wait for the submitted buffers to become "ready".
>
> For now it's not supported in the legacy API using AVD3D11VAContext; we need to
> add an ID3D11DeviceContext there, as it cannot be derived from the other
> interfaces we provide (ID3D11VideoContext is not a kind of ID3D11DeviceContext).
> ---
> libavcodec/dxva2.c | 33 +++++++++++++++++++++++++++++++++
> libavcodec/dxva2_internal.h | 2 ++
> 2 files changed, 35 insertions(+)
>
> diff --git a/libavcodec/dxva2.c b/libavcodec/dxva2.c
> index 32416112bf..1a0e5b69b2 100644
> --- a/libavcodec/dxva2.c
> +++ b/libavcodec/dxva2.c
> @@ -692,6 +692,12 @@ int ff_dxva2_decode_init(AVCodecContext *avctx)
> d3d11_ctx->surface = sctx->d3d11_views;
> d3d11_ctx->workaround = sctx->workaround;
> d3d11_ctx->context_mutex = INVALID_HANDLE_VALUE;
> +
> + D3D11_QUERY_DESC query = { 0 };
> + query.Query = D3D11_QUERY_EVENT;
> + if (FAILED(ID3D11Device_CreateQuery(device_hwctx->device, &query,
> + (ID3D11Query**)&sctx->wait_copies)))
> + sctx->wait_copies = NULL;
> }
> #endif
>
> @@ -729,6 +735,8 @@ int ff_dxva2_decode_uninit(AVCodecContext *avctx)
> av_buffer_unref(&sctx->decoder_ref);
>
> #if CONFIG_D3D11VA
> + if (sctx->wait_copies)
> + ID3D11Asynchronous_Release(sctx->wait_copies);
> for (i = 0; i < sctx->nb_d3d11_views; i++) {
> if (sctx->d3d11_views[i])
> ID3D11VideoDecoderOutputView_Release(sctx->d3d11_views[i]);
> @@ -932,6 +940,12 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame,
>
> #if CONFIG_D3D11VA
> if (ff_dxva2_is_d3d11(avctx)) {
> + if (sctx->wait_copies) {
> + AVHWFramesContext *frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
> + AVD3D11VADeviceContext *device_hwctx = frames_ctx->device_ctx->hwctx;
> + ID3D11DeviceContext_Begin(device_hwctx->device_context, sctx->wait_copies);
> + }
> +
> buffer = &buffer11[buffer_count];
> type = D3D11_VIDEO_DECODER_BUFFER_PICTURE_PARAMETERS;
> }
> @@ -1005,9 +1019,28 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame,
>
> #if CONFIG_D3D11VA
> if (ff_dxva2_is_d3d11(avctx))
> + {
> + int maxWait = 10;
> + /* wait until all the buffer release is done copying data to the GPU
> + * before doing the submit command */
> + if (sctx->wait_copies) {
> + AVHWFramesContext *frames_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
> + AVD3D11VADeviceContext *device_hwctx = frames_ctx->device_ctx->hwctx;
> + ID3D11DeviceContext_End(device_hwctx->device_context, sctx->wait_copies);
> +
> + while (maxWait-- && S_FALSE ==
> + ID3D11DeviceContext_GetData(device_hwctx->device_context,
> + sctx->wait_copies, NULL, 0, 0)) {
> + ff_dxva2_unlock(avctx);
> + SleepEx(2, TRUE);
> + ff_dxva2_lock(avctx);
> + }
> + }
> +
> hr = ID3D11VideoContext_SubmitDecoderBuffers(D3D11VA_CONTEXT(ctx)->video_context,
> D3D11VA_CONTEXT(ctx)->decoder,
> buffer_count, buffer11);
> + }
> #endif
> #if CONFIG_DXVA2
> if (avctx->pix_fmt == AV_PIX_FMT_DXVA2_VLD) {
> diff --git a/libavcodec/dxva2_internal.h b/libavcodec/dxva2_internal.h
> index b822af59cd..c44e8e09b0 100644
> --- a/libavcodec/dxva2_internal.h
> +++ b/libavcodec/dxva2_internal.h
> @@ -81,6 +81,8 @@ typedef struct FFDXVASharedContext {
> ID3D11VideoDecoderOutputView **d3d11_views;
> int nb_d3d11_views;
> ID3D11Texture2D *d3d11_texture;
> +
> + ID3D11Asynchronous *wait_copies;
> #endif
>
> #if CONFIG_DXVA2
> --
> 2.26.2
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>