[FFmpeg-devel] [PATCH V2] Add a filter implementing HDR image reconstruction from a single exposure using deep CNNs
Pedro Arthur
bygrandao at gmail.com
Wed Oct 17 21:14:39 EEST 2018
Hi,
How hard would it be to support the native backend? Which operations
are missing, and are there any other limitations?
On Wed, Oct 17, 2018 at 05:47, Guo, Yejun <yejun.guo at intel.com> wrote:
>
> see the algorithm's paper and code below.
>
> the filter's parameter looks like:
> sdr2hdr=model_filename=/path_to_tensorflow_graph.pb:out_fmt=gbrp10le
>
> The input of the deep CNN model is RGB24, while the output is a float
> for each color channel. By default the filter outputs gbrpf32le.
> gbrp10le is also supported as an output format, so the rendering
> result can be viewed in a player as a reference.
>
> To generate the model file, we need to modify the original script a little.
> - set name='y' for y_final within the script at
> https://github.com/gabrieleilertsen/hdrcnn/blob/master/network.py
> - add the following code to the script at
> https://github.com/gabrieleilertsen/hdrcnn/blob/master/hdrcnn_predict.py
>
> graph = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ["y"])
> tf.train.write_graph(graph, '.', 'graph.pb', as_text=False)
>
> The filter only works when the TensorFlow C API is available on the system.
> The native backend is not supported because the deep CNN model contains
> layer types other than CONV and DEPTH_TO_SPACE.
>
> https://arxiv.org/pdf/1710.07480.pdf:
> author = "Eilertsen, Gabriel and Kronander, Joel and Denes, Gyorgy and Mantiuk, Rafał and Unger, Jonas",
> title = "HDR image reconstruction from a single exposure using deep CNNs",
> journal = "ACM Transactions on Graphics (TOG)",
> number = "6",
> volume = "36",
> articleno = "178",
> year = "2017"
>
> https://github.com/gabrieleilertsen/hdrcnn
>
> By the way, a complete solution should also generate metadata from
> the SDR video so that the result can be encoded as HDR video. That is
> not supported yet. This patch focuses only on this paper.
>
> v2: use AV_OPT_TYPE_PIXEL_FMT for filter option
> remove some unnecessary code
> Use in->linesize[0] and FFMAX/FFMIN
> remove flag AVFILTER_FLAG_SLICE_THREADS
> add av_log message when error
>
> Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> ---
> libavfilter/Makefile | 1 +
> libavfilter/allfilters.c | 1 +
> libavfilter/vf_sdr2hdr.c | 266 +++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 268 insertions(+)
> create mode 100644 libavfilter/vf_sdr2hdr.c
>
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index 62cc2f5..88e7da6 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -360,6 +360,7 @@ OBJS-$(CONFIG_SOBEL_OPENCL_FILTER) += vf_convolution_opencl.o opencl.o
> OBJS-$(CONFIG_SPLIT_FILTER) += split.o
> OBJS-$(CONFIG_SPP_FILTER) += vf_spp.o
> OBJS-$(CONFIG_SR_FILTER) += vf_sr.o
> +OBJS-$(CONFIG_SDR2HDR_FILTER) += vf_sdr2hdr.o
> OBJS-$(CONFIG_SSIM_FILTER) += vf_ssim.o framesync.o
> OBJS-$(CONFIG_STEREO3D_FILTER) += vf_stereo3d.o
> OBJS-$(CONFIG_STREAMSELECT_FILTER) += f_streamselect.o framesync.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 5e72803..1645c0f 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -319,6 +319,7 @@ extern AVFilter ff_vf_scale_npp;
> extern AVFilter ff_vf_scale_qsv;
> extern AVFilter ff_vf_scale_vaapi;
> extern AVFilter ff_vf_scale2ref;
> +extern AVFilter ff_vf_sdr2hdr;
> extern AVFilter ff_vf_select;
> extern AVFilter ff_vf_selectivecolor;
> extern AVFilter ff_vf_sendcmd;
> diff --git a/libavfilter/vf_sdr2hdr.c b/libavfilter/vf_sdr2hdr.c
> new file mode 100644
> index 0000000..fa61bfa
> --- /dev/null
> +++ b/libavfilter/vf_sdr2hdr.c
> @@ -0,0 +1,266 @@
> +/*
> + * Copyright (c) 2018 Guo Yejun
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * Filter implementing HDR image reconstruction from a single exposure using deep CNNs.
> + * https://arxiv.org/pdf/1710.07480.pdf
> + */
> +
> +#include "avfilter.h"
> +#include "formats.h"
> +#include "internal.h"
> +#include "libavutil/opt.h"
> +#include "libavutil/qsort.h"
> +#include "libavformat/avio.h"
> +#include "libswscale/swscale.h"
> +#include "dnn_interface.h"
> +#include <math.h>
> +
> +typedef struct SDR2HDRContext {
> + const AVClass *class;
> +
> + char* model_filename;
> + enum AVPixelFormat out_fmt;
> + DNNModule* dnn_module;
> + DNNModel* model;
> + DNNData input, output;
> +} SDR2HDRContext;
> +
> +#define OFFSET(x) offsetof(SDR2HDRContext, x)
> +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
> +static const AVOption sdr2hdr_options[] = {
> + { "model_filename", "path to model file specifying network architecture and its parameters", OFFSET(model_filename), AV_OPT_TYPE_STRING, {.str=NULL}, 0, 0, FLAGS },
I think you could use "model" instead of "model_filename"; it is
shorter and more consistent with the vf_sr filter.
> + { "out_fmt", "the data format of the filter's output, it could be gbrpf32le [default] or gbrp10le", OFFSET(out_fmt), AV_OPT_TYPE_PIXEL_FMT, {.i64=AV_PIX_FMT_GBRPF32LE}, AV_PIX_FMT_NONE, AV_PIX_FMT_NB, FLAGS },
> + { NULL }
> +};
> +
> +AVFILTER_DEFINE_CLASS(sdr2hdr);
> +
> +static av_cold int init(AVFilterContext* context)
> +{
> + SDR2HDRContext* ctx = context->priv;
> +
> + if (ctx->out_fmt != AV_PIX_FMT_GBRPF32LE && ctx->out_fmt != AV_PIX_FMT_GBRP10LE) {
> + av_log(context, AV_LOG_ERROR, "could not support the output format\n");
> + return AVERROR(ENOSYS);
> + }
> +
> +#if (CONFIG_LIBTENSORFLOW == 1)
> + ctx->dnn_module = ff_get_dnn_module(DNN_TF);
> + if (!ctx->dnn_module){
> + av_log(context, AV_LOG_ERROR, "could not create DNN module for tensorflow backend\n");
> + return AVERROR(ENOMEM);
> + }
> + if (!ctx->model_filename){
> + av_log(context, AV_LOG_ERROR, "model file for network was not specified\n");
> + return AVERROR(EIO);
> + }
> + if (!ctx->dnn_module->load_model) {
> + av_log(context, AV_LOG_ERROR, "load_model for network was not specified\n");
> + return AVERROR(EIO);
> + }
> + ctx->model = (ctx->dnn_module->load_model)(ctx->model_filename);
> + if (!ctx->model){
> + av_log(context, AV_LOG_ERROR, "could not load DNN model\n");
> + return AVERROR(EIO);
> + }
> + return 0;
> +#else
> + return AVERROR(EIO);
> +#endif
> +}
> +
> +static int query_formats(AVFilterContext* context)
> +{
> + const enum AVPixelFormat in_formats[] = {AV_PIX_FMT_RGB24,
> + AV_PIX_FMT_NONE};
> + enum AVPixelFormat out_formats[2];
> + SDR2HDRContext* ctx = context->priv;
> + AVFilterFormats* formats_list;
> + int ret = 0;
> +
> + formats_list = ff_make_format_list(in_formats);
> + if ((ret = ff_formats_ref(formats_list, &context->inputs[0]->out_formats)) < 0)
> + return ret;
> +
> + out_formats[0] = ctx->out_fmt;
> + out_formats[1] = AV_PIX_FMT_NONE;
> + formats_list = ff_make_format_list(out_formats);
> + if ((ret = ff_formats_ref(formats_list, &context->outputs[0]->in_formats)) < 0)
> + return ret;
> +
> + return 0;
> +}
> +
> +static int config_props(AVFilterLink* inlink)
> +{
> + AVFilterContext* context = inlink->dst;
> + SDR2HDRContext* ctx = context->priv;
> + AVFilterLink* outlink = context->outputs[0];
> + DNNReturnType result;
> +
> + // the dnn model is tied with resolution due to deconv layer of tensorflow
> + // now just support 1920*1080 and so the magic numbers within this file
> + if (inlink->w != 1920 || inlink->h != 1080) {
> + av_log(context, AV_LOG_ERROR, "only support frame size with 1920*1080\n");
> + return AVERROR(ENOSYS);
> + }
> +
> + ctx->input.width = 1920;
> + ctx->input.height = 1088; //the model requires height is a multiple of 32,
> + ctx->input.channels = 3;
> +
> + result = (ctx->model->set_input_output)(ctx->model->model, &ctx->input, &ctx->output);
> + if (result != DNN_SUCCESS){
> + av_log(context, AV_LOG_ERROR, "could not set input and output for the model\n");
> + return AVERROR(EIO);
> + }
> +
> + memset(ctx->input.data, 0, ctx->input.channels * ctx->input.width * ctx->input.height * sizeof(float));
> + outlink->h = 1080;
> + outlink->w = 1920;
> + return 0;
> +}
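The 1080 -> 1088 padding above follows from the "multiple of 32" requirement stated in the comment. A minimal sketch of that arithmetic, in case the hard-coded numbers are ever generalized; align_up and pad_rows are hypothetical helper names, not part of the patch or of FFmpeg, and the sketch assumes the alignment is a power of two:

```c
#include <assert.h>

/* Round value up to the next multiple of align.
 * Assumption: align is a positive power of two (32 in this patch). */
static int align_up(int value, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical helper: number of zero-filled rows that would sit
 * below the real picture in the model input buffer. */
static int pad_rows(int height, int align)
{
    return align_up(height, align) - height;
}
```

With height 1080 and alignment 32 this gives a model height of 1088, i.e. 8 padding rows, which matches the values set in config_props.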
> +
> +static float qsort_comparison_function_float(const void *a, const void *b)
> +{
> + return *(const float *)a - *(const float *)b;
> +}
> +
> +static int filter_frame(AVFilterLink* inlink, AVFrame* in)
> +{
> + DNNReturnType dnn_result = DNN_SUCCESS;
> + AVFilterContext* context = inlink->dst;
> + SDR2HDRContext* ctx = context->priv;
> + AVFilterLink* outlink = context->outputs[0];
> + AVFrame* out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> + int total_pixels = in->height * in->width;
> +
> + av_frame_copy_props(out, in);
> +
> + for (int i = 0; i < in->linesize[0] * in->height; ++i) {
> + ctx->input.data[i] = in->data[0][i] / 255.0f;
> + }
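The loop above walks linesize[0] * height bytes, which is only equivalent to the packed model input when linesize[0] equals width * 3. A rough sketch of a copy that skips any row padding is below; rgb24_to_float is a hypothetical standalone function for illustration, and it assumes a positive linesize and a tightly packed float destination:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy a packed RGB24 image into a tightly packed float buffer,
 * scaling each byte to [0, 1]. Bytes between width * 3 and linesize
 * in each source row are treated as padding and skipped.
 * Assumption: linesize is positive and >= width * 3. */
static void rgb24_to_float(float *dst, const uint8_t *src,
                           int width, int height, ptrdiff_t linesize)
{
    assert(dst && src && width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        const uint8_t *row = src + y * linesize;
        float *out = dst + (size_t)y * width * 3;
        for (int x = 0; x < width * 3; x++)
            out[x] = row[x] / 255.0f;
    }
}
```

In the filter this would correspond to something like passing in->data[0], in->width, in->height and in->linesize[0]; for 1920-wide RGB24 the stride is likely already 5760 bytes, so the difference may not show up in practice.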
> +
> + dnn_result = (ctx->dnn_module->execute_model)(ctx->model);
> + if (dnn_result != DNN_SUCCESS){
> + av_log(context, AV_LOG_ERROR, "failed to execute loaded model\n");
> + return AVERROR(EIO);
> + }
> +
> + if (ctx->out_fmt == AV_PIX_FMT_GBRPF32LE) {
> + float* outg = (float*)out->data[0];
> + float* outb = (float*)out->data[1];
> + float* outr = (float*)out->data[2];
> + for (int i = 0; i < total_pixels; ++i) {
> + float r = ctx->output.data[i*3];
> + float g = ctx->output.data[i*3+1];
> + float b = ctx->output.data[i*3+2];
> + outr[i] = r;
> + outg[i] = g;
> + outb[i] = b;
> + }
> + } else {
> + // here, we just use a rough mapping to the 10bit contents
> + // meta data generation for HDR video encoding is not supported yet
> + float* converted_data = (float*)malloc(total_pixels * 3 * sizeof(float));
> + short* outg = (short*)out->data[0];
> + short* outb = (short*)out->data[1];
> + short* outr = (short*)out->data[2];
> +
> + float max = 1.0f;
> + for (int i = 0; i < total_pixels * 3; ++i) {
> + float d = ctx->output.data[i];
> + d = sqrt(d);
> + converted_data[i] = d;
> + max = FFMAX(d, max);
> + }
> +
> + if (max > 1.0f) {
> + AV_QSORT(converted_data, total_pixels * 3, float, qsort_comparison_function_float);
> + // 0.5% pixels are clipped
> + max = converted_data[(int)(total_pixels * 3 * 0.995)];
> + max = FFMAX(max, 1.0f);
> +
> + for (int i = 0; i < total_pixels * 3; ++i) {
> + float d = ctx->output.data[i];
> + d = sqrt(d);
> + d = FFMIN(d, max);
> + converted_data[i] = d;
> + }
> + }
> +
> + for (int i = 0; i < total_pixels; ++i) {
> + float r = converted_data[i*3];
> + float g = converted_data[i*3+1];
> + float b = converted_data[i*3+2];
> + outr[i] = r / max * 1023;
> + outg[i] = g / max * 1023;
> + outb[i] = b / max * 1023;
> + }
> +
> + free(converted_data);
> + }
> +
> + av_frame_free(&in);
> + return ff_filter_frame(outlink, out);
> +}
> +
> +static av_cold void uninit(AVFilterContext* context)
> +{
> + SDR2HDRContext* ctx = context->priv;
> +
> + if (ctx->dnn_module){
> + (ctx->dnn_module->free_model)(&ctx->model);
> + av_freep(&ctx->dnn_module);
> + }
> +}
> +
> +static const AVFilterPad sdr2hdr_inputs[] = {
> + {
> + .name = "default",
> + .type = AVMEDIA_TYPE_VIDEO,
> + .config_props = config_props,
> + .filter_frame = filter_frame,
> + },
> + { NULL }
> +};
> +
> +static const AVFilterPad sdr2hdr_outputs[] = {
> + {
> + .name = "default",
> + .type = AVMEDIA_TYPE_VIDEO,
> + },
> + { NULL }
> +};
> +
> +AVFilter ff_vf_sdr2hdr = {
> + .name = "sdr2hdr",
> + .description = NULL_IF_CONFIG_SMALL("HDR image reconstruction from a single exposure using deep CNNs."),
> + .priv_size = sizeof(SDR2HDRContext),
> + .init = init,
> + .uninit = uninit,
> + .query_formats = query_formats,
> + .inputs = sdr2hdr_inputs,
> + .outputs = sdr2hdr_outputs,
> + .priv_class = &sdr2hdr_class,
> + .flags = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC,
> +};
> --
> 2.7.4
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel