[FFmpeg-devel] [RFC] amfenc: Add support for OpenCL input
Mark Thompson
sw at jkqxz.net
Tue Jan 23 17:41:03 EET 2018
On 23/01/18 15:14, Mironov, Mikhail wrote:
>> -----Original Message-----
>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
>> Of Mironov, Mikhail
>> Sent: January 23, 2018 10:04 AM
>> To: FFmpeg development discussions and patches <ffmpeg-
>> devel at ffmpeg.org>
>> Subject: Re: [FFmpeg-devel] [RFC] amfenc: Add support for OpenCL input
>>
>>> -----Original Message-----
>>> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf
>>> Of Mark Thompson
>>> Sent: January 22, 2018 6:57 PM
>>> To: FFmpeg development discussions and patches <ffmpeg-
>>> devel at ffmpeg.org>
>>> Subject: [FFmpeg-devel] [RFC] amfenc: Add support for OpenCL input
>>>
>>> ---
>>> This allows passing OpenCL frames to AMF without a download/upload
>>> step to get around AMD's lack of support for D3D11 mapping.
>>>
>>> For example:
>>>
>>> ./ffmpeg -hwaccel dxva2 -hwaccel_output_format dxva2_vld -i input.mp4 -an -vf 'hwmap=derive_device=opencl,program_opencl=source=examples.cl:kernel=rotate_image' -c:v h264_amf output.mp4
>>>
>>> * I can't find any documentation or examples for these functions, so
>>> I'm guessing a bit exactly how they are meant to work. In particular,
>>> there are some locking functions which I have ignored because I have
>>> no idea under what circumstances something might want to be locked.
>>> * I tried to write common parts with D3D11, but I might well have broken
>>> D3D11 support in the process - it doesn't work at all for me so I can't
>>> test it.
>>> * Not sure how to get non-NV12 to work. I may be missing something, or it
>>> may just not be there - the trace messages suggest it doesn't like the
>>> width of RGB0 or the second plane of GRAY8.
>>>
>>> - Mark
>>>
>>>
>>> libavcodec/amfenc.c | 178
>>> +++++++++++++++++++++++++++++++++++---------
>>> --------
>>> libavcodec/amfenc.h | 1 +
>>> 2 files changed, 123 insertions(+), 56 deletions(-)
>>>
>>> diff --git a/libavcodec/amfenc.c b/libavcodec/amfenc.c index
>>> 89a10ff253..220cdd278f 100644
>>> --- a/libavcodec/amfenc.c
>>> +++ b/libavcodec/amfenc.c
>>> @@ -24,6 +24,9 @@
>>> #if CONFIG_D3D11VA
>>> #include "libavutil/hwcontext_d3d11va.h"
>>> #endif
>>> +#if CONFIG_OPENCL
>>> +#include "libavutil/hwcontext_opencl.h"
>>> +#endif
>>> #include "libavutil/mem.h"
>>> #include "libavutil/pixdesc.h"
>>> #include "libavutil/time.h"
>>> @@ -51,6 +54,9 @@ const enum AVPixelFormat ff_amf_pix_fmts[] = { #if
>>> CONFIG_D3D11VA
>>> AV_PIX_FMT_D3D11,
>>> #endif
>>> +#if CONFIG_OPENCL
>>> + AV_PIX_FMT_OPENCL,
>>> +#endif
>>> AV_PIX_FMT_NONE
>>> };
>>>
>>> @@ -69,6 +75,7 @@ static const FormatMap format_map[] =
>>> { AV_PIX_FMT_YUV420P, AMF_SURFACE_YUV420P },
>>> { AV_PIX_FMT_YUYV422, AMF_SURFACE_YUY2 },
>>> { AV_PIX_FMT_D3D11, AMF_SURFACE_NV12 },
>>> + { AV_PIX_FMT_OPENCL, AMF_SURFACE_NV12 },
>>> };
>>>
>>>
>>> @@ -154,8 +161,9 @@ static int amf_load_library(AVCodecContext *avctx)
>>>
>>> static int amf_init_context(AVCodecContext *avctx) {
>>> - AmfContext *ctx = avctx->priv_data;
>>> - AMF_RESULT res = AMF_OK;
>>> + AmfContext *ctx = avctx->priv_data;
>>> + AMF_RESULT res;
>>> + AVHWDeviceContext *hwdev = NULL;
>>>
>>> // configure AMF logger
>>> // the return of these functions indicates old state and do not
>>> affect behaviour @@ -173,59 +181,91 @@ static int
>>> amf_init_context(AVCodecContext *avctx)
>>>
>>> res = ctx->factory->pVtbl->CreateContext(ctx->factory, &ctx->context);
>>> AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN,
>>> "CreateContext() failed with error %d\n", res);
>>> - // try to reuse existing DX device
>>> -#if CONFIG_D3D11VA
>>> +
>>> + // Attempt to initialise from an existing D3D11 or OpenCL device.
>>> if (avctx->hw_frames_ctx) {
>>> - AVHWFramesContext *device_ctx = (AVHWFramesContext*)avctx-
>>>> hw_frames_ctx->data;
>>> - if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA) {
>>> - if (amf_av_to_amf_format(device_ctx->sw_format) !=
>>> AMF_SURFACE_UNKNOWN) {
>>> - if (device_ctx->device_ctx->hwctx) {
>>> - AVD3D11VADeviceContext *device_d3d11 =
>>> (AVD3D11VADeviceContext *)device_ctx->device_ctx->hwctx;
>>> - res = ctx->context->pVtbl->InitDX11(ctx->context,
>> device_d3d11-
>>>> device, AMF_DX11_1);
>>> - if (res == AMF_OK) {
>>> - ctx->hw_frames_ctx = av_buffer_ref(avctx->hw_frames_ctx);
>>> - if (!ctx->hw_frames_ctx) {
>>> - return AVERROR(ENOMEM);
>>> - }
>>> - } else {
>>> - if(res == AMF_NOT_SUPPORTED)
>>> - av_log(avctx, AV_LOG_INFO, "avctx->hw_frames_ctx has
>>> D3D11 device which doesn't have D3D11VA interface, switching to
>>> default\n");
>>> - else
>>> - av_log(avctx, AV_LOG_INFO, "avctx->hw_frames_ctx has
>>> non-AMD device, switching to default\n");
>>> - }
>>> - }
>>> - } else {
>>> - av_log(avctx, AV_LOG_INFO, "avctx->hw_frames_ctx has format
>>> not uspported by AMF, switching to default\n");
>>> - }
>>> + AVHWFramesContext *hwfc =
>>> + (AVHWFramesContext*)avctx->hw_frames_ctx->data;
>>> +
>>> + if (amf_av_to_amf_format(hwfc->sw_format) ==
>>> AMF_SURFACE_UNKNOWN) {
>>> + av_log(avctx, AV_LOG_VERBOSE, "Input hardware frame
>>> + format (%s)
>>> is not supported.\n",
>>> + av_get_pix_fmt_name(hwfc->sw_format));
>>> + } else {
>>> + hwdev = hwfc->device_ctx;
>>> +
>>> + ctx->hw_frames_ctx = av_buffer_ref(avctx->hw_frames_ctx);
>>> + if (!ctx->hw_frames_ctx)
>>> + return AVERROR(ENOMEM);
>>> }
>>> - } else if (avctx->hw_device_ctx) {
>>> - AVHWDeviceContext *device_ctx = (AVHWDeviceContext*)(avctx-
>>>> hw_device_ctx->data);
>>> - if (device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA) {
>>> - if (device_ctx->hwctx) {
>>> - AVD3D11VADeviceContext *device_d3d11 =
>>> (AVD3D11VADeviceContext *)device_ctx->hwctx;
>>> - res = ctx->context->pVtbl->InitDX11(ctx->context, device_d3d11-
>>>> device, AMF_DX11_1);
>>> + }
>>> + if (!hwdev && avctx->hw_device_ctx) {
>>> + hwdev = (AVHWDeviceContext*)avctx->hw_device_ctx->data;
>>> +
>>> + ctx->hw_device_ctx = av_buffer_ref(avctx->hw_device_ctx);
>>> + if (!ctx->hw_device_ctx)
>>> + return AVERROR(ENOMEM);
>>> + }
>>> + if (hwdev) {
>>> +#if CONFIG_D3D11VA
>>> + if (hwdev->type == AV_HWDEVICE_TYPE_D3D11VA) {
>>> + AVD3D11VADeviceContext *d3d11dev = hwdev->hwctx;
>>> +
>>> + res = ctx->context->pVtbl->InitDX11(ctx->context,
>>> + d3d11dev->device, AMF_DX11_1);
>>> + if (res == AMF_OK) {
>>> + av_log(avctx, AV_LOG_VERBOSE, "Initialised from "
>>> + "external D3D11 device.\n");
>>> + return 0;
>>> + }
>>> +
>>> + av_log(avctx, AV_LOG_INFO, "Failed to initialise from "
>>> + "external D3D11 device: %d.\n", res);
>>> + } else
>>> +#endif
>>> +#if CONFIG_OPENCL
>>> + if (hwdev->type == AV_HWDEVICE_TYPE_OPENCL) {
>>> + AVOpenCLDeviceContext *cldev = hwdev->hwctx;
>>> + cl_int cle;
>>> +
>>> + ctx->cl_command_queue =
>>> + clCreateCommandQueue(cldev->context,
>>> + cldev->device_id, 0,
>>> &cle);
>>> + if (!ctx->cl_command_queue) {
>>> + av_log(avctx, AV_LOG_INFO, "Failed to create OpenCL "
>>> + "command queue: %d.\n", cle);
>>> + } else {
>>> + res = ctx->context->pVtbl->InitOpenCL(ctx->context,
>>> +
>>> + ctx->cl_command_queue);
>>> if (res == AMF_OK) {
>>> - ctx->hw_device_ctx = av_buffer_ref(avctx->hw_device_ctx);
>>> - if (!ctx->hw_device_ctx) {
>>> - return AVERROR(ENOMEM);
>>> - }
>>> - } else {
>>> - if (res == AMF_NOT_SUPPORTED)
>>> - av_log(avctx, AV_LOG_INFO, "avctx->hw_device_ctx has
>> D3D11
>>> device which doesn't have D3D11VA interface, switching to default\n");
>>> - else
>>> - av_log(avctx, AV_LOG_INFO, "avctx->hw_device_ctx has non-
>>> AMD device, switching to default\n");
>>> + av_log(avctx, AV_LOG_VERBOSE, "Initialised from "
>>> + "external OpenCL device.\n");
>>> + return 0;
>>> }
>>> + av_log(avctx, AV_LOG_INFO, "Failed to initialise from "
>>> + "external OpenCL device: %d.\n", res);
>>> }
>>> + } else
>>> +#endif
>>> + {
>>> + av_log(avctx, AV_LOG_INFO, "Input device type %s is not
>>> supported.\n",
>>> + av_hwdevice_get_type_name(hwdev->type));
>>> }
>>> }
>>> -#endif
>>> - if (!ctx->hw_frames_ctx && !ctx->hw_device_ctx) {
>>> - res = ctx->context->pVtbl->InitDX11(ctx->context, NULL,
>> AMF_DX11_1);
>>> - if (res != AMF_OK) {
>>> - res = ctx->context->pVtbl->InitDX9(ctx->context, NULL);
>>> - AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN,
>>> "InitDX9() failed with error %d\n", res);
>>> +
>>> + // Initialise from a new D3D11 device, or D3D9 if D3D11 is not available.
>>> + res = ctx->context->pVtbl->InitDX11(ctx->context, NULL, AMF_DX11_1);
>>> + if (res == AMF_OK) {
>>> + av_log(avctx, AV_LOG_VERBOSE, "Initialised from internal
>>> + D3D11
>>> device.\n");
>>> + } else {
>>> + av_log(avctx, AV_LOG_VERBOSE, "Failed to initialise from
>>> + internal
>>> D3D11 device: %d.\n", res);
>>> + res = ctx->context->pVtbl->InitDX9(ctx->context, NULL);
>>> + if (res == AMF_OK) {
>>> + av_log(avctx, AV_LOG_VERBOSE, "Initialised from internal
>>> + D3D9
>>> device.\n");
>>> + } else {
>>> + av_log(avctx, AV_LOG_VERBOSE, "Failed to initialise from
>>> + internal
>>> D3D9 device: %d.\n", res);
>>> + av_log(avctx, AV_LOG_ERROR, "Unable to initialise AMF.\n");
>>> + return AVERROR_UNKNOWN;
>>> }
>>> }
>>> +
>>> return 0;
>>> }
>>>
>>> @@ -279,6 +319,11 @@ int av_cold ff_amf_encode_close(AVCodecContext
>>> *avctx)
>>> av_buffer_unref(&ctx->hw_device_ctx);
>>> av_buffer_unref(&ctx->hw_frames_ctx);
>>>
>>> +#if CONFIG_OPENCL
>>> + if (ctx->cl_command_queue)
>>> + clReleaseCommandQueue(ctx->cl_command_queue);
>>> +#endif
>>> +
>>> if (ctx->trace) {
>>> ctx->trace->pVtbl->UnregisterWriter(ctx->trace,
>>> FFMPEG_AMF_WRITER_ID);
>>> }
>>> @@ -485,17 +530,38 @@ int ff_amf_send_frame(AVCodecContext *avctx,
>>> const AVFrame *frame)
>>> (AVHWDeviceContext*)ctx->hw_device_ctx->data)
>>> )) {
>>> #if CONFIG_D3D11VA
>>> - static const GUID AMFTextureArrayIndexGUID = { 0x28115527,
>>> 0xe7c3, 0x4b66, { 0x99, 0xd3, 0x4f, 0x2a, 0xe6, 0xb4, 0x7f, 0xaf } };
>>> - ID3D11Texture2D *texture = (ID3D11Texture2D*)frame->data[0]; //
>>> actual texture
>>> - int index = (int)(size_t)frame->data[1]; // index is a slice in texture
>>> array is - set to tell AMF which slice to use
>>> - texture->lpVtbl->SetPrivateData(texture,
>>> &AMFTextureArrayIndexGUID, sizeof(index), &index);
>>> -
>>> - res = ctx->context->pVtbl->CreateSurfaceFromDX11Native(ctx-
>>>> context, texture, &surface, NULL); // wrap to AMF surface
>>> - AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR(ENOMEM),
>>> "CreateSurfaceFromDX11Native() failed with error %d\n", res);
>>> -
>>> - // input HW surfaces can be vertically aligned by 16; tell AMF the
>> real
>>> size
>>> - surface->pVtbl->SetCrop(surface, 0, 0, frame->width, frame-
>>> height);
>>> + if (frame->format == AV_PIX_FMT_D3D11) {
>>> + static const GUID AMFTextureArrayIndexGUID = {
>>> + 0x28115527,
>>> 0xe7c3, 0x4b66, { 0x99, 0xd3, 0x4f, 0x2a, 0xe6, 0xb4, 0x7f, 0xaf } };
>>> + ID3D11Texture2D *texture =
>>> + (ID3D11Texture2D*)frame->data[0];
>>> // actual texture
>>> + int index = (int)(size_t)frame->data[1]; // index is
>>> + a slice in texture
>>> array is - set to tell AMF which slice to use
>>> + texture->lpVtbl->SetPrivateData(texture,
>>> + &AMFTextureArrayIndexGUID, sizeof(index), &index);
>>> +
>>> + res =
>>> + ctx->context->pVtbl->CreateSurfaceFromDX11Native(ctx-
>>>> context, texture, &surface, NULL); // wrap to AMF surface
>>> + AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
>>> + AVERROR(ENOMEM), "CreateSurfaceFromDX11Native() failed with error
>>> + %d\n", res);
>>> +
>>> + // input HW surfaces can be vertically aligned by 16;
>>> + tell AMF the
>>> real size
>>> + surface->pVtbl->SetCrop(surface, 0, 0, frame->width,
>>> + frame-
>>>> height);
>>> + } else
>>> +#endif
>>> +#if CONFIG_OPENCL
>>> + if (frame->format == AV_PIX_FMT_OPENCL) {
>>> + void *planes[AV_NUM_DATA_POINTERS];
>>> + AMF_SURFACE_FORMAT format;
>>> + int i;
>>> +
>>> + for (i = 0; i < AV_NUM_DATA_POINTERS; i++)
>>> + planes[i] = frame->data[i];
>>> +
>>> + format = amf_av_to_amf_format(frame->format);
>>> +
>>> + res =
>>> + ctx->context->pVtbl->CreateSurfaceFromOpenCLNative(ctx-
>>>> context, format,
>>> + frame->width, frame->height,
>>> + planes, &surface, NULL);
>>> + AMF_RETURN_IF_FALSE(ctx, res == AMF_OK,
>>> AVERROR_UNKNOWN,
>>> + "CreateSurfaceFromOpenCLNative()
>>> + failed with error
>>> %d\n", res);
>>> + } else
>>> #endif
>>> + av_assert0(0 && "Invalid hardware input format.");
>>> } else {
>>> res = ctx->context->pVtbl->AllocSurface(ctx->context,
>>> AMF_MEMORY_HOST, ctx->format, avctx->width, avctx->height, &surface);
>>> AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR(ENOMEM),
>>> "AllocSurface() failed with error %d\n", res); diff --git
>>> a/libavcodec/amfenc.h b/libavcodec/amfenc.h index
>>> 84f0aad2fa..bb8fd1807a 100644
>>> --- a/libavcodec/amfenc.h
>>> +++ b/libavcodec/amfenc.h
>>> @@ -61,6 +61,7 @@ typedef struct AmfContext {
>>>
>>> AVBufferRef *hw_device_ctx; ///< pointer to HW accelerator
>>> (decoder)
>>> AVBufferRef *hw_frames_ctx; ///< pointer to HW accelerator (frame
>>> allocator)
>>> + void *cl_command_queue; ///< Command queue for use with
>>> OpenCL input
>>>
>>> // helpers to handle async calls
>>> int delayed_drain;
>>> --
>>> 2.11.0
>>
>> AMF encoder works via D3D9 or D3D11 only. AMF OpenCL support is done
>> for possible integration with external image processing. Passing regular
>> OpenCL 2D images will cause mapping to system memory and copy.
>> The fast way is to use interop:
>> - Allocate last processing NV12 surface as D3D11 texture
>> - interop it into OpenCL
>> - use as output for the last OCL kernel
>> - un-interop back to D3D11
>> - submit to AMF.
>> There is not much value in initializing AMF with OpenCL unless the AMF
>> color space converter is used.
>> The converter would do the sequence described above.
>>
>> If AMF CSC is used, a few things have to be done:
>> 1. Device should be created by passing D3D11 device as a parameter. It is
>> done in hwcontext_opencl.c clGetDeviceIDsFromD3D11KHR().
>> 2. The D3D11 device used there should be passed to AMF via InitDX11()
>> preferably before InitOpenCL() call.
>> 3. Add RGB formats for submission.
>> Mikhail
>>
>
> Alternatively we could just allocate D3D11 surface, interop to OCL, copy using OCL, un-interop, and submit to AMF:
> Context->InitDX11(device used for OCL device creation)
> Context->InitOpenCL(queue)
> Context->AllocSurface(AMF_MEMORY_DX11, AMF_SURFACE_NV12,, &surface);
> surface->Convert(AMF_MEMORY_OPENCL); //interop
> cl_mem planeY = surface->GetPlaneAt(0)->GetNative();
> cl_mem planeUV = surface->GetPlaneAt(1)->GetNative();
>
> clEnqueueCopyImage() // Y
> clEnqueueCopyImage() // UV
> surface->Convert(AMF_MEMORY_DX11); //un-interop
> encoder->SubmitInput(surface);
Right, that sequence would work; I might try it with D3D9.
Is there a reason why the driver doesn't use this path (or some equivalent) internally? Implementing the download/upload sequence inside the driver feels just as bad, and is significantly more misleading to the user. (I assume the reason why the OpenCL images aren't usable directly is due to a restriction on tiling modes or some similar layout issue, so at least one copy is definitely required.)
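
To make the ordering constraints of the quoted sequence concrete, here is a minimal C sketch. It is a mock only and does not call AMF, D3D11 or OpenCL: every name in it (MockSurface, mock_convert, mock_cl_copy_plane and so on) is hypothetical and invented for illustration. It assumes that the OpenCL copies are only valid between the interop and un-interop steps, and that the encoder only accepts a surface that is back in D3D11 with both NV12 planes written.

```c
#include <assert.h>

/* Hypothetical stand-ins: none of these names are AMF or OpenCL API. */
typedef enum { MEM_D3D11, MEM_OPENCL } MemType;

typedef struct {
    MemType mem;            /* API that currently owns the surface */
    int     planes_written; /* bitmask: bit 0 = Y plane, bit 1 = UV plane */
} MockSurface;

/* Allocate the surface as a D3D11 texture, with no plane written yet. */
static MockSurface mock_alloc_d3d11(void)
{
    MockSurface s = { MEM_D3D11, 0 };
    return s;
}

/* Interop / un-interop: hand the surface to another API without a copy. */
static void mock_convert(MockSurface *s, MemType to)
{
    s->mem = to;
}

/* Copy one plane with OpenCL. Returns 0 on success, or -1 if the
 * surface is not currently owned by OpenCL. */
static int mock_cl_copy_plane(MockSurface *s, int plane)
{
    if (s->mem != MEM_OPENCL)
        return -1;
    s->planes_written |= 1 << plane;
    return 0;
}

/* Submit to the encoder. Returns 0 on success, or -1 unless the surface
 * is back in D3D11 and both NV12 planes have been written. */
static int mock_submit(const MockSurface *s)
{
    if (s->mem != MEM_D3D11 || s->planes_written != 3)
        return -1;
    return 0;
}

/* The whole sequence for one frame, in the order given in the quote. */
static int mock_encode_one_frame(void)
{
    MockSurface s = mock_alloc_d3d11();

    mock_convert(&s, MEM_OPENCL);      /* interop */
    if (mock_cl_copy_plane(&s, 0) < 0) /* Y */
        return -1;
    if (mock_cl_copy_plane(&s, 1) < 0) /* UV */
        return -1;
    mock_convert(&s, MEM_D3D11);       /* un-interop */

    return mock_submit(&s);
}
```

In this model, copying before the interop step or submitting before the un-interop step fails, which is the one explicit copy the sequence requires.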
- Mark