[FFmpeg-devel] [PATCH v5] libavfi/dnn: add LibTorch as one of DNN backend

Thu Mar 14 13:38:25 EET 2024

> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> wenbin.chen-at-intel.com at ffmpeg.org
> Sent: Monday, March 11, 2024 1:02 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: [FFmpeg-devel] [PATCH v5] libavfi/dnn: add LibTorch as one of DNN
> backend
> 
> From: Wenbin Chen <wenbin.chen at intel.com>
> 
> PyTorch is an open source machine learning framework that accelerates
> the path from research prototyping to production deployment. Official
> website: https://pytorch.org/. The C++ library of PyTorch is referred to
> as LibTorch below.
> 
> To build FFmpeg with LibTorch, take the following steps as a reference:
> 1. Download the LibTorch C++ library from
>    https://pytorch.org/get-started/locally/. Select C++/Java as the
>    language and choose the other options as needed. Download the cxx11
>    ABI version (libtorch-cxx11-abi-shared-with-deps-*.zip).
> 2. Unzip the file to your own directory:
>    unzip libtorch-shared-with-deps-latest.zip -d your_dir
> 3. Export libtorch_root/libtorch/include and
>    libtorch_root/libtorch/include/torch/csrc/api/include to $PATH, and
>    export libtorch_root/libtorch/lib/ to $LD_LIBRARY_PATH.
> 4. Configure FFmpeg:
>    ../configure --enable-libtorch \
>        --extra-cflags=-I/libtorch_root/libtorch/include \
>        --extra-cflags=-I/libtorch_root/libtorch/include/torch/csrc/api/include \
>        --extra-ldflags=-L/libtorch_root/libtorch/lib/
> 5. make
> 
> To run FFmpeg DNN inference with the LibTorch backend:
> ./ffmpeg -i input.jpg -vf dnn_processing=dnn_backend=torch:model=LibTorch_model.pt -y output.jpg
> LibTorch_model.pt can be generated in Python with the torch.jit.script() API.
> Note that torch.jit.trace() is not recommended, since it does not support
> ambiguous input sizes.

Can you provide more detail (maybe a link from PyTorch) about how to
generate LibTorch_model.pt, so that we can give it a try?

> 
> Signed-off-by: Ting Fu <ting.fu at intel.com>
> Signed-off-by: Wenbin Chen <wenbin.chen at intel.com>
> ---
>  configure                             |   5 +-
>  libavfilter/dnn/Makefile              |   1 +
>  libavfilter/dnn/dnn_backend_torch.cpp | 597 ++++++++++++++++++++++++++
>  libavfilter/dnn/dnn_interface.c       |   5 +
>  libavfilter/dnn_filter_common.c       |  15 +-
>  libavfilter/dnn_interface.h           |   2 +-
>  libavfilter/vf_dnn_processing.c       |   3 +
>  7 files changed, 624 insertions(+), 4 deletions(-)
>  create mode 100644 libavfilter/dnn/dnn_backend_torch.cpp
> 
> +static int fill_model_input_th(THModel *th_model, THRequestItem *request)
> +{
> +    LastLevelTaskItem *lltask = NULL;
> +    TaskItem *task = NULL;
> +    THInferRequest *infer_request = NULL;
> +    DNNData input = { 0 };
> +    THContext *ctx = &th_model->ctx;
> +    int ret, width_idx, height_idx, channel_idx;
> +
> +    lltask = (LastLevelTaskItem *)ff_queue_pop_front(th_model->lltask_queue);
> +    if (!lltask) {
> +        ret = AVERROR(EINVAL);
> +        goto err;
> +    }
> +    request->lltask = lltask;
> +    task = lltask->task;
> +    infer_request = request->infer_request;
> +
> +    ret = get_input_th(th_model, &input, NULL);
> +    if (ret != 0) {
> +        goto err;
> +    }
> +    width_idx = dnn_get_width_idx_by_layout(input.layout);
> +    height_idx = dnn_get_height_idx_by_layout(input.layout);
> +    channel_idx = dnn_get_channel_idx_by_layout(input.layout);
> +    input.dims[height_idx] = task->in_frame->height;
> +    input.dims[width_idx] = task->in_frame->width;
> +    input.data = av_malloc(input.dims[height_idx] * input.dims[width_idx] *
> +                           input.dims[channel_idx] * sizeof(float));
> +    if (!input.data) {
> +        ret = AVERROR(ENOMEM);
> +        goto err;
> +    }
> +    infer_request->input_tensor = new torch::Tensor();
> +    infer_request->output = new torch::Tensor();
> +
> +    switch (th_model->model->func_type) {
> +    case DFT_PROCESS_FRAME:
> +        input.scale = 255;
> +        if (task->do_ioproc) {
> +            if (th_model->model->frame_pre_proc != NULL) {
> +                th_model->model->frame_pre_proc(task->in_frame, &input, th_model->model->filter_ctx);
> +            } else {
> +                ff_proc_from_frame_to_dnn(task->in_frame, &input, ctx);
> +            }
> +        }
> +        break;
> +    default:
> +        avpriv_report_missing_feature(NULL, "model function type %d", th_model->model->func_type);
> +        break;
> +    }
> +    *infer_request->input_tensor = torch::from_blob(input.data,
> +        {1, 1, input.dims[channel_idx], input.dims[height_idx], input.dims[width_idx]},

This adds an extra dimension, in addition to batch size, channel, height
and width, to support multi-frame algorithms such as video super-resolution.

Let's first support the regular 4-D NCHW/NHWC dimensions, and then add
support for multiple frames.
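
To illustrate what I mean by the regular dimensions, here is a minimal,
untested sketch of building a 4-D shape from the layout. Layout, AxisIdx,
axis_idx() and shape_4d() are hypothetical names used only for this
illustration; they are not part of the patch and not the existing
dnn_get_*_idx_by_layout() helpers. The batch size is assumed to be 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the DNN layout enum; not the FFmpeg type.
enum class Layout { NCHW, NHWC };

// Position of each axis inside a 4-D shape for a given layout.
struct AxisIdx { int n, c, h, w; };

static AxisIdx axis_idx(Layout layout)
{
    if (layout == Layout::NCHW)
        return AxisIdx{0, 1, 2, 3};
    return AxisIdx{0, 3, 1, 2};   // NHWC: batch, height, width, channel
}

// Build a regular 4-D shape (batch size fixed to 1), without the extra
// frame dimension in front of the channels.
static std::vector<int64_t> shape_4d(Layout layout, int64_t channels,
                                     int64_t height, int64_t width)
{
    assert(channels > 0 && height > 0 && width > 0);
    const AxisIdx idx = axis_idx(layout);
    std::vector<int64_t> shape(4);
    shape[idx.n] = 1;
    shape[idx.c] = channels;
    shape[idx.h] = height;
    shape[idx.w] = width;
    return shape;
}
```

For a 1280x720 frame with 3 channels, shape_4d(Layout::NCHW, 3, 720, 1280)
gives {1, 3, 720, 1280} and the NHWC variant gives {1, 720, 1280, 3}. A
vector like this should be usable as the sizes argument of
torch::from_blob() in place of the 5-D initializer list.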