[FFmpeg-devel] [PATCH 3/3] lavfi: add deshake_opencl filter

Mark Thompson sw at jkqxz.net
Thu Jul 25 02:16:33 EEST 2019


On 20/07/2019 00:19, Jarek Samic wrote:
> This filter is the subject of my GSoC project.
> 
> This is a video stabilization / deshake filter (name undetermined, feel free to discuss) that uses feature
> point matching and RANSAC to determine a camera path, smooths the camera path with a gaussian filter, and
> then applies the new path to the video.
> 
> There are a number of debug features that can be turned on (viewing point matches, viewing transform
> details, viewing average kernel execution times). See the bottom of the file.
> 
> The filter is finished feature-wise and therefore ready for some review. There are a few things left for
> me to do before this can be merged, though:
> 
> * Improve performance of the OpenCL kernels (in particular: harris_response and match_descriptors)
> * See what I can do to improve the RANSAC model generation to reduce the amount of jitter in the result
> * Clean up the few remaining TODOs
> 
> Just keep that in mind.
> 
> ---
>  libavfilter/Makefile            |    2 +
>  libavfilter/allfilters.c        |    1 +
>  libavfilter/opencl/deshake.cl   |  621 ++++++++++
>  libavfilter/opencl_source.h     |    1 +
>  libavfilter/vf_deshake_opencl.c | 1992 +++++++++++++++++++++++++++++++
>  5 files changed, 2617 insertions(+)
>  create mode 100644 libavfilter/opencl/deshake.cl
>  create mode 100644 libavfilter/vf_deshake_opencl.c

As already noted, it's missing the configure test.
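(Presumably just a deshake_opencl_filter_deps="opencl" line in configure, like the other OpenCL filters have.)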

> diff --git a/libavfilter/opencl/deshake.cl b/libavfilter/opencl/deshake.cl
> new file mode 100644
> index 0000000000..737239f634
> --- /dev/null
> +++ b/libavfilter/opencl/deshake.cl
> @@ -0,0 +1,621 @@
> +/*
> + * Copyright (C) 2000, Intel Corporation, all rights reserved.
> + * Copyright (C) 2013, OpenCV Foundation, all rights reserved.
> + * Third party copyrights are property of their respective owners.
> + *
> + * Redistribution and use in source and binary forms, with or without modification,
> + * are permitted provided that the following conditions are met:
> + *
> + *   * Redistribution's of source code must retain the above copyright notice,
> + *     this list of conditions and the following disclaimer.
> + *
> + *   * Redistribution's in binary form must reproduce the above copyright notice,
> + *     this list of conditions and the following disclaimer in the documentation
> + *     and/or other materials provided with the distribution.
> + *
> + *   * The name of the copyright holders may not be used to endorse or promote products
> + *     derived from this software without specific prior written permission.
> + *
> + * This software is provided by the copyright holders and contributors "as is" and
> + * any express or implied warranties, including, but not limited to, the implied
> + * warranties of merchantability and fitness for a particular purpose are disclaimed.
> + * In no event shall the Intel Corporation or contributors be liable for any direct,
> + * indirect, incidental, special, exemplary, or consequential damages
> + * (including, but not limited to, procurement of substitute goods or services;
> + * loss of use, data, or profits; or business interruption) however caused
> + * and on any theory of liability, whether in contract, strict liability,
> + * or tort (including negligence or otherwise) arising in any way out of
> + * the use of this software, even if advised of the possibility of such damage.
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */

The way this is written makes it look like the whole file is usable under the 3-clause BSD licence - I don't think that's what you intended?

I think I would either put the two copyright notices the other way around, or add a copyright clause for yourself before the second licence.


Some random comments follow.  I've already been through this before, so most of this is odd minor stuff I'm happening to notice this time.

> +
> ...
> +
> +// Converts the src image to grayscale
> +__kernel void grayscale(
> +    __read_only image2d_t src,
> +    __write_only image2d_t grayscale
> +) {
> +    int2 loc = (int2)(get_global_id(0), get_global_id(1));
> +    write_imagef(grayscale, loc, (float4)(pixel_grayscale(src, loc), 0.0f, 0.0f, 1.0f));
> +}
> +
> +// This kernel computes the harris response for the given grayscale src image
> +// within the given radius and writes it to harris_buf
> +__kernel void harris_response(
> +    __read_only image2d_t grayscale,
> +    __global float *harris_buf,
> +    int radius
> +) {
> +    int2 loc = (int2)(get_global_id(0), get_global_id(1));
> +    float scale = 1.0f / ((1 << 2) * radius * 255.0f);
> +
> +    float sobel_mask_x[3][3] = {
> +        {-1, 0, 1},
> +        {-2, 0, 2},
> +        {-1, 0, 1}
> +    };
> +
> +    float sobel_mask_y[3][3] = {
> +        { 1,   2,  1},
> +        { 0,   0,  0},
> +        {-1,  -2, -1}
> +    };
> +
> +    float sumdxdy = sum_deriv_prod(grayscale, loc, sobel_mask_x, sobel_mask_y, radius);
> +    float sumdx2 = sum_deriv_pow(grayscale, loc, sobel_mask_x, radius);
> +    float sumdy2 = sum_deriv_pow(grayscale, loc, sobel_mask_y, radius);

One workitem of this samples the same points of the grayscale image multiple times across the three convolutions, and then multiplies a third of them by zero (the zero entries in the Sobel masks).  Maybe there would be a bit of gain by inlining and unrolling those parts?
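
If you do inline them, a sketch of the idea (convolve() here being a hypothetical helper applying a 3x3 mask at a point) - compute each derivative once and fuse the three accumulations into a single pass:

float sumdx2 = 0.0f, sumdy2 = 0.0f, sumdxdy = 0.0f;

for (int i = -radius; i <= radius; i++) {
    for (int j = -radius; j <= radius; j++) {
        // Apply each Sobel mask once per point...
        float dx = convolve(grayscale, loc + (int2)(j, i), sobel_mask_x);
        float dy = convolve(grayscale, loc + (int2)(j, i), sobel_mask_y);

        // ...and reuse the results for all three sums.
        sumdx2  += dx * dx;
        sumdy2  += dy * dy;
        sumdxdy += dx * dy;
    }
}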

> +
> +    float trace = sumdx2 + sumdy2;
> +    // r = det(M) - k(trace(M))^2
> +    // k usually between 0.04 to 0.06
> +    float r = (sumdx2 * sumdy2 - sumdxdy * sumdxdy) - 0.04f * (trace * trace) * pow(scale, 4);

pow(, 4) -> pown(, 4) - the exponent is an integer constant, so pown() is the appropriate (and typically faster) function.

> +
> +    // Threshold the r value
> +    write_to_1d_arrf(harris_buf, loc, r * step(HARRIS_THRESHOLD, r));
> +}
> +
> +// Gets a patch centered around a float coordinate from a grayscale image using
> +// bilinear interpolation
> +void get_rect_sub_pix(
> +    __read_only image2d_t grayscale,
> +    float *buffer,
> +    int size_x,
> +    int size_y,
> +    float2 center
> +) {
> +    for (int i = 0; i < size_y; i++) {
> +        for (int j = 0; j < size_x; j++) {
> +            buffer[i * size_x + j] = read_imagef(
> +                grayscale,
> +                sampler_linear,
> +                (float2)(j + center.x - (size_x - 1) * 0.5, i + center.y - (size_y - 1) * 0.5)

Maybe something like:

float2 offset = ((float2)(size_x, size_y) - 1.0f) * 0.5f;

then

(float2)(j, i) + center - offset

> +            ).x * 255.0;
> +        }
> +    }
> +}
> +
> +// Refines detected features at a sub-pixel level
> +//
> +// This function is ported from OpenCV
> +float2 corner_sub_pix(
> +    __read_only image2d_t grayscale,
> +    float *mask
> +) {
> +    // This is the location of the feature point we are refining
> +    float2 cI = (float2)(get_global_id(0), get_global_id(1));
> +    float2 cT = cI;
> +    int src_width = get_global_size(0);
> +    int src_height = get_global_size(1);
> +
> +    const int max_iters = 40;
> +    const float eps = 0.001 * 0.001;
> +    int i, j, k;
> +
> +    int iter = 0;
> +    float err = 0;
> +    float subpix[(REFINE_WIN_W + 2) * (REFINE_WIN_H + 2)];
> +    const float flt_epsilon = 0x1.0p-23f;
> +
> +    do {
> +        float2 cI2;
> +        float a = 0, b = 0, c = 0, bb1 = 0, bb2 = 0;
> +
> +        get_rect_sub_pix(grayscale, subpix, REFINE_WIN_W + 2, REFINE_WIN_H + 2, cI);
> +        const float *subpix_ptr = &subpix;
> +        subpix_ptr += REFINE_WIN_W + 2 + 1;
> +
> +        // process gradient
> +        for (i = 0, k = 0; i < REFINE_WIN_H; i++, subpix_ptr += REFINE_WIN_W + 2) {
> +            float py = i - REFINE_WIN_HALF_H;
> +
> +            for (j = 0; j < REFINE_WIN_W; j++, k++) {
> +                float m = mask[k];
> +                float tgx = subpix_ptr[j + 1] - subpix_ptr[j - 1];
> +                float tgy = subpix_ptr[j + REFINE_WIN_W + 2] - subpix_ptr[j - REFINE_WIN_W - 2];
> +                float gxx = tgx * tgx * m;
> +                float gxy = tgx * tgy * m;
> +                float gyy = tgy * tgy * m;
> +                float px = j - REFINE_WIN_HALF_W;
> +
> +                a += gxx;
> +                b += gxy;
> +                c += gyy;
> +
> +                bb1 += gxx * px + gxy * py;
> +                bb2 += gxy * px + gyy * py;
> +            }
> +        }
> +
> +        float det = a * c - b * b;
> +        if (fabs(det) <= flt_epsilon * flt_epsilon) {
> +            break;
> +        }
> +
> +        // 2x2 matrix inversion
> +        float scale = 1.0 / det;

Surprise double-precision division :)
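(I.e. "float scale = 1.0f / det;" to keep it in single precision.)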

> +        cI2.x = (float)(cI.x + (c * scale * bb1) - (b * scale * bb2));
> +        cI2.y = (float)(cI.y - (b * scale * bb1) + (a * scale * bb2));
> +        err = (cI2.x - cI.x) * (cI2.x - cI.x) + (cI2.y - cI.y) * (cI2.y - cI.y);

Some of the stuff here looks vectorisable.

err = dot(cI2 - cI, cI2 - cI);

> +
> +        cI = cI2;
> +        if (cI.x < 0 || cI.x >= src_width || cI.y < 0 || cI.y >= src_height) {
> +            break;
> +        }
> +    } while (++iter < max_iters && err > eps);
> +
> +    // Make sure new point isn't too far from the initial point (indicates poor convergence)
> +    if (fabs(cI.x - cT.x) > REFINE_WIN_HALF_W || fabs(cI.y - cT.y) > REFINE_WIN_HALF_H) {
> +        cI = cT;
> +    }
> +
> +    return cI;
> +}
> +
> ...
> +
> +// Extracts BRIEF descriptors from the grayscale src image for the given features
> +// using the provided sampler.
> +__kernel void brief_descriptors(
> +    __read_only image2d_t grayscale,
> +    __global const float2 *refined_features,
> +    // for 512 bit descriptors
> +    __global ulong8 *desc_buf,
> +    __global const PointPair *brief_pattern
> +) {
> +    int2 loc = (int2)(get_global_id(0), get_global_id(1));
> +    float2 feature = read_from_1d_arrf2(refined_features, loc);
> +
> +    // TODO: restructure data so we don't have to do this, if possible
> +    if (feature.x == -1) {
> +        write_to_1d_arrul8(desc_buf, loc, (ulong8)(0));
> +        return;
> +    }
> +
> +    ulong8 desc = 0;
> +    ulong *p = &desc;
> +
> +    for (int i = 0; i < 8; ++i) {
> +        for (int j = 0; j < 64; ++j) {
> +            PointPair pair = brief_pattern[j * (i + 1)];
> +            float l1 = read_imagef(grayscale, sampler_linear, (float2)(feature.x + pair.p1.x, feature.y + pair.p1.y)).x;

(float2)(feature.x + pair.p1.x, feature.y + pair.p1.y) == feature + pair.p1

> +            float l2 = read_imagef(grayscale, sampler_linear, (float2)(feature.x + pair.p2.x, feature.y + pair.p2.y)).x;
> +
> +            if (l1 < l2) {
> +                p[i] |= 1UL << j;
> +            }
> +        }
> +    }
> +
> +    write_to_1d_arrul8(desc_buf, loc, desc);
> +}
> +
> +// Given buffers with descriptors for the current and previous frame, determines
> +// which ones match (looking in a box of search_radius size around each descriptor)
> +// and writes the resulting point correspondences to matches_buf.
> +__kernel void match_descriptors(
> +    __global const float2 *prev_refined_features,
> +    __global const float2 *refined_features,
> +    __global const ulong8 *desc_buf,
> +    __global const ulong8 *prev_desc_buf,
> +    __global MotionVector *matches_buf,
> +    int search_radius
> +) {
> +    int2 loc = (int2)(get_global_id(0), get_global_id(1));
> +    ulong8 desc = read_from_1d_arrul8(desc_buf, loc);
> +    MotionVector invalid_vector = (MotionVector) {
> +        (PointPair) {
> +            (float2)(-1, -1),
> +            (float2)(-1, -1)
> +        },
> +        0
> +    };
> +
> +    // TODO: restructure data so we don't have to do this, if possible
> +    // also this is an ugly hack
> +    if (desc.s0 == 0 && desc.s1 == 0) {
> +        write_to_1d_arrvec(
> +            matches_buf,
> +            loc,
> +            invalid_vector
> +        );
> +        return;
> +    }
> +
> +    int start_x = clamp(loc.x - search_radius, 0, (int)get_global_size(0) - 1);
> +    int end_x = clamp(loc.x + search_radius, 0, (int)get_global_size(0) - 1);
> +    int start_y = clamp(loc.y - search_radius, 0, (int)get_global_size(1) - 1);
> +    int end_y = clamp(loc.y + search_radius, 0, (int)get_global_size(1) - 1);
> +
> +    for (int i = start_x; i < end_x; ++i) {
> +        for (int j = start_y; j < end_y; ++j) {

Perhaps nicer as:

int2 start = max(loc - search_radius, 0);
int2 end   = min(loc + search_radius, (int2)(get_global_size(0) - 1, get_global_size(1) - 1));

for (int i = start.x; i < end.x; i++) {
    for (int j = start.y; j < end.y; j++) {

> +            int2 prev_point = (int2)(i, j);
> +            int total_dist = 0;
> +
> +            ulong8 prev_desc = read_from_1d_arrul8(prev_desc_buf, prev_point);
> +
> +            if (prev_desc.s0 == 0 && prev_desc.s1 == 0) {
> +                continue;
> +            }
> +
> +            ulong *prev_desc_p = &prev_desc;
> +            ulong *desc_p = &desc;
> +
> +            for (int i = 0; i < 8; i++) {
> +                total_dist += popcount(desc_p[i] ^ prev_desc_p[i]);
> +            }
> +
> +            if (total_dist < DISTANCE_THRESHOLD) {
> +                write_to_1d_arrvec(
> +                    matches_buf,
> +                    loc,
> +                    (MotionVector) {
> +                        (PointPair) {
> +                            read_from_1d_arrf2(prev_refined_features, prev_point),
> +                            read_from_1d_arrf2(refined_features, loc)
> +                        },
> +                        1
> +                    }
> +                );
> +
> +                return;
> +            }
> +        }
> +    }
> +
> +    // There is no found match for this point
> +    write_to_1d_arrvec(
> +        matches_buf,
> +        loc,
> +        invalid_vector
> +    );
> +}
> +
> ...
> diff --git a/libavfilter/vf_deshake_opencl.c b/libavfilter/vf_deshake_opencl.c
> new file mode 100644
> index 0000000000..e342da9d90
> --- /dev/null
> +++ b/libavfilter/vf_deshake_opencl.c
> @@ -0,0 +1,1992 @@
> +/*
> + * Copyright (C) 2000-2008, Intel Corporation, all rights reserved.
> + * Copyright (C) 2009, Willow Garage Inc., all rights reserved.
> + * Copyright (C) 2013, OpenCV Foundation, all rights reserved.
> + * Third party copyrights are property of their respective owners.
> + *
> + * Redistribution and use in source and binary forms, with or without modification,
> + * are permitted provided that the following conditions are met:
> + *
> + *   * Redistribution's of source code must retain the above copyright notice,
> + *     this list of conditions and the following disclaimer.
> + *
> + *   * Redistribution's in binary form must reproduce the above copyright notice,
> + *     this list of conditions and the following disclaimer in the documentation
> + *     and/or other materials provided with the distribution.
> + *
> + *   * The name of the copyright holders may not be used to endorse or promote products
> + *     derived from this software without specific prior written permission.
> + *
> + * This software is provided by the copyright holders and contributors "as is" and
> + * any express or implied warranties, including, but not limited to, the implied
> + * warranties of merchantability and fitness for a particular purpose are disclaimed.
> + * In no event shall the Intel Corporation or contributors be liable for any direct,
> + * indirect, incidental, special, exemplary, or consequential damages
> + * (including, but not limited to, procurement of substitute goods or services;
> + * loss of use, data, or profits; or business interruption) however caused
> + * and on any theory of liability, whether in contract, strict liability,
> + * or tort (including negligence or otherwise) arising in any way out of
> + * the use of this software, even if advised of the possibility of such damage.
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */

Same comment on copyright statement as above.

> +
> +#include <stdbool.h>
> +#include <float.h>
> +#include <libavutil/lfg.h>
> +#include "libavutil/opt.h"
> +#include "libavutil/imgutils.h"
> +#include "libavutil/mem.h"
> +#include "libavutil/fifo.h"
> +#include "libavutil/common.h"
> +#include "avfilter.h"
> +#include "framequeue.h"
> +#include "filters.h"
> +#include "transform.h"
> +#include "formats.h"
> +#include "internal.h"
> +#include "opencl.h"
> +#include "opencl_source.h"
> +#include "video.h"
> +
> +// Block size over which to compute harris response
> +#define HARRIS_RADIUS 2
> +// Number of bits for BRIEF descriptors
> +#define BREIFN 512
> +// Size of the patch from which a BRIEF descriptor is extracted
> +// This is the size used in OpenCV
> +#define BRIEF_PATCH_SIZE 31
> +#define BRIEF_PATCH_SIZE_HALF BRIEF_PATCH_SIZE / 2

As a general rule, put calculated macro values in brackets to avoid possible operator precedence issues.
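
E.g. 2 * BRIEF_PATCH_SIZE_HALF currently expands to 2 * 31 / 2 == 31 rather than 30, so:

#define BRIEF_PATCH_SIZE_HALF (BRIEF_PATCH_SIZE / 2)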

> +// The radius within which to search around descriptors for matches from the
> +// previous frame
> +#define MATCH_SEARCH_RADIUS 70
> +
> +#define MATCHES_CONTIG_SIZE 1000
> +
> +typedef struct PointPair {
> +    // Previous frame
> +    cl_float2 p1;
> +    // Current frame
> +    cl_float2 p2;
> +} PointPair;
> +
> +typedef struct SmoothedPointPair {
> +    // Previous frame
> +    cl_int2 p1;
> +    // Smoothed point in current frame
> +    cl_float2 p2;
> +} SmoothedPointPair;
> +
> +typedef struct MotionVector {
> +    PointPair p;
> +    // Used to mark vectors as potential outliers
> +    int should_consider;

cl_int, since this struct needs to match the layout of the device-side one.

> +} MotionVector;
> +
> ...
> +
> +typedef struct DeshakeOpenCLContext {
> +    OpenCLFilterContext ocf;
> +    // Whether or not the above `OpenCLFilterContext` has been initialized
> +    int initialized;
> +
> +    // These variables are used in the activate callback
> +    int64_t duration;
> +    bool eof;
> +
> +    // State for random number generation
> +    AVLFG alfg;
> +
> +    // FIFO frame queue used to buffer future frames for processing
> +    FFFrameQueue fq;
> +    // Ringbuffers for frame positions
> +    AbsoluteFrameMotion abs_motion;
> +
> +    // The number of frames' motion to consider before and after the frame we are
> +    // smoothing
> +    int smooth_window;
> +    // The number of the frame we are currently processing
> +    int curr_frame;
> +
> +    // Stores a 1d array of normalised gaussian kernel values for convolution
> +    float *gauss_kernel;
> +
> +    // Buffer for error values used in RANSAC code
> +    float *ransac_err;
> +
> +    // Information regarding how to crop the smoothed luminance (or RGB) planes
> +    CropInfo crop_y;
> +    // Information regarding how to crop the smoothed chroma planes
> +    CropInfo crop_uv;
> +
> +    // Whether or not we are processing YUV input (as oppposed to RGB)
> +    bool is_yuv;
> +    // The underlying format of the hardware surfaces
> +    int sw_format;
> +
> +    // Buffer to copy `matches` into for the CPU to work with
> +    MotionVector *matches_host;
> +    MotionVector *matches_contig_host;
> +
> +    cl_command_queue command_queue;
> +    cl_kernel kernel_grayscale;
> +    cl_kernel kernel_harris_response;
> +    cl_kernel kernel_refine_features;
> +    cl_kernel kernel_brief_descriptors;
> +    cl_kernel kernel_match_descriptors;
> +    cl_kernel kernel_transform;
> +    cl_kernel kernel_crop_upscale;
> +
> +    // Stores a frame converted to grayscale
> +    cl_mem grayscale;
> +    // Stores the harris response for a frame (measure of "cornerness" for each pixel)
> +    cl_mem harris_buf;
> +
> +    // Detected features after non-maximum suppression and sub-pixel refinement
> +    cl_mem refined_features;
> +    // Saved from the previous frame
> +    cl_mem prev_refined_features;
> +
> +    // BRIEF sampling pattern that is randomly initialized
> +    cl_mem brief_pattern;
> +    // Feature point descriptors for the current frame
> +    cl_mem descriptors;
> +    // Feature point descriptors for the previous frame
> +    cl_mem prev_descriptors;
> +    // Vectors between points in current and previous frame
> +    cl_mem matches;
> +    cl_mem matches_contig;
> +    // Holds the matrix to transform luminance (or RGB) with
> +    cl_mem transform_y;
> +    // Holds the matrix to transform chroma with
> +    cl_mem transform_uv;
> +
> +    // Configurable options
> +
> +    bool tripod_mode;
> +    bool debug_on;
> +    bool should_crop;

Confusingly, AV_OPT_TYPE_BOOL actually writes to an int.  Sorry :/

<http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavutil/opt.c;h=93d6c26c11ad80e4a2e4a5d4cce86df5301ffab1;hb=HEAD#l123>
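
So these want to be declared as int rather than bool:

int tripod_mode;
int debug_on;
int should_crop;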

> +
> +    // Whether or not feature points should be refined at a sub-pixel level
> +    cl_int refine_features;
> +    // If the user sets a value other than the default, 0, this percentage is
> +    // translated into a sigma value ranging from 0.5 to 40.0
> +    float smooth_percent;
> +    // This number is multiplied by the video frame rate to determine the size
> +    // of the smooth window
> +    float smooth_window_multiplier;
> +
> +    // Debug stuff
> +
> +    cl_kernel kernel_draw_debug_info;
> +    cl_mem debug_matches;
> +    cl_mem debug_model_matches;
> +
> +    // These store the total time spent executing the different kernels in nanoseconds
> +    unsigned long long grayscale_time;
> +    unsigned long long harris_response_time;
> +    unsigned long long refine_features_time;
> +    unsigned long long brief_descriptors_time;
> +    unsigned long long match_descriptors_time;
> +    unsigned long long transform_time;
> +    unsigned long long crop_upscale_time;
> +
> +    // Time spent copying matched features from the device to the host
> +    unsigned long long read_buf_time;
> +} DeshakeOpenCLContext;
> +
> +// Returns a random uniformly-distributed number in [low, high]
> +static int rand_in(int low, int high, AVLFG *alfg) {
> +    double rand_val = av_lfg_get(alfg) / (1.0 + UINT_MAX);
> +    int range = high - low + 1;
> +    int rand_scaled = (rand_val * range) + low;
> +
> +    return rand_scaled;
> +}

My don't-use-floating-point-unless-you-really-have-to sense doesn't like this, though I can't point to exactly why.  Is there an easy way to make this integer instead?  (If high-low is small compared to UINT_MAX then maybe modding is not significantly biased.)

Also, { on new line please (and in other functions below).
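
Untested sketch of an integer version (the modulo bias is on the order of (high - low) / UINT_MAX, so negligible at these ranges):

static int rand_in(int low, int high, AVLFG *alfg)
{
    // Uniform over [low, high], up to a tiny modulo bias.
    return low + av_lfg_get(alfg) % (unsigned)(high - low + 1);
}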

> +
> ...
> +
> +// Checks that the 3 points in the given array are not collinear
> +static bool points_not_collinear(const cl_float2 **points) {
> +    int j, k, i = 2;
> +
> +    for (j = 0; j < i; j++) {
> +        double dx1 = points[j]->s[0] - points[i]->s[0];
> +        double dy1 = points[j]->s[1] - points[i]->s[1];
> +
> +        for (k = 0; k < j; k++) {
> +            double dx2 = points[k]->s[0] - points[i]->s[0];
> +            double dy2 = points[k]->s[1] - points[i]->s[1];
> +
> +            if (fabs(dx2*dy1 - dy2*dx1) <= 0.001) {

Maybe add a comment on how not-collinear this requires the points to be.  Say you have a 4K video and two of your points are in opposite corners, how wide is the area where the third point not allowed to be?
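(Rough estimate: |dx2*dy1 - dy2*dx1| is twice the area of the triangle the three points form, so with two of the points ~4400 pixels apart the 0.001 threshold only rejects a third point within about 0.001 / 4400 ~= 2e-7 pixels of the line - effectively nothing.)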

> +                return false;
> +            }
> +        }
> +    }
> +
> +    return true;
> +}
> +
> ...
> +
> +static int deshake_opencl_init(AVFilterContext *avctx)
> +{
> +    DeshakeOpenCLContext *ctx = avctx->priv;
> +    AVFilterLink *outlink = avctx->outputs[0];
> +    AVFilterLink *inlink = avctx->inputs[0];
> +    // Pointer to the host-side pattern buffer to be initialized and then copied
> +    // to the GPU
> +    PointPair *pattern_host;
> +    cl_int cle;
> +    int err;
> +    cl_ulong8 zeroed_ulong8;
> +    FFFrameQueueGlobal fqg;
> +    cl_image_format grayscale_format;
> +    cl_image_desc grayscale_desc;
> +    cl_command_queue_properties queue_props;
> +
> +    const int descriptor_buf_size = outlink->h * outlink->w * (BREIFN / 8);
> +    const int features_buf_size = outlink->h * outlink->w * sizeof(cl_float2);
> +
> +    const AVHWFramesContext *hw_frames_ctx = (AVHWFramesContext*)inlink->hw_frames_ctx->data;
> +    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(hw_frames_ctx->sw_format);

av_assert0() that the various pointers here are not null, to catch missing cases as quickly as possible.
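
E.g.:

av_assert0(inlink->hw_frames_ctx);
av_assert0(desc);

(the first one before the hw_frames_ctx data pointer is dereferenced).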

> +
> +    ff_framequeue_global_init(&fqg);
> +    ff_framequeue_init(&ctx->fq, &fqg);
> +    ctx->eof = false;
> +    ctx->smooth_window = (int)(av_q2d(avctx->inputs[0]->frame_rate) * ctx->smooth_window_multiplier);
> +    ctx->curr_frame = 0;
> +
> +    memset(&zeroed_ulong8, 0, sizeof(cl_ulong8));
> +
> +    ctx->gauss_kernel = av_malloc_array(ctx->smooth_window, sizeof(float));
> +    if (!ctx->gauss_kernel) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    ctx->ransac_err = av_malloc_array(MATCHES_CONTIG_SIZE, sizeof(float));
> +    if (!ctx->ransac_err) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    for (int i = 0; i < RingbufCount; i++) {
> +        ctx->abs_motion.ringbuffers[i] = av_fifo_alloc_array(
> +            ctx->smooth_window,
> +            sizeof(float)
> +        );
> +
> +        if (!ctx->abs_motion.ringbuffers[i]) {
> +            err = AVERROR(ENOMEM);
> +            goto fail;
> +        }
> +    }
> +
> +    if (ctx->debug_on) {
> +        ctx->abs_motion.debug_matches = av_fifo_alloc_array(
> +            ctx->smooth_window / 2,
> +            sizeof(DebugMatches)
> +        );
> +
> +        if (!ctx->abs_motion.debug_matches) {
> +            err = AVERROR(ENOMEM);
> +            goto fail;
> +        }
> +    }
> +
> +    ctx->abs_motion.curr_frame_offset = 0;
> +    ctx->abs_motion.data_start_offset = -1;
> +    ctx->abs_motion.data_end_offset = -1;
> +
> +    pattern_host = av_malloc_array(BREIFN, sizeof(PointPair));
> +    if (!pattern_host) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    ctx->matches_host = av_malloc_array(outlink->h * outlink->w, sizeof(MotionVector));
> +    if (!ctx->matches_host) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    ctx->matches_contig_host = av_malloc_array(MATCHES_CONTIG_SIZE, sizeof(MotionVector));
> +    if (!ctx->matches_contig_host) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    // Initializing the patch pattern for building BREIF descriptors with
> +    av_lfg_init(&ctx->alfg, 234342424);
> +    for (int i = 0; i < BREIFN; ++i) {
> +        PointPair pair;
> +
> +        for (int j = 0; j < 2; ++j) {
> +            pair.p1.s[j] = rand_in(-BRIEF_PATCH_SIZE_HALF, BRIEF_PATCH_SIZE_HALF + 1, &ctx->alfg);
> +            pair.p2.s[j] = rand_in(-BRIEF_PATCH_SIZE_HALF, BRIEF_PATCH_SIZE_HALF + 1, &ctx->alfg);
> +        }
> +
> +        pattern_host[i] = pair;
> +    }
> +
> +    if (desc->flags & AV_PIX_FMT_FLAG_RGB) {
> +        ctx->is_yuv = false;
> +    } else {
> +        ctx->is_yuv = true;
> +    }
> +    ctx->sw_format = hw_frames_ctx->sw_format;

I think you want to do a little more checking on the format here (or above).  E.g. it accepts planar GBR and makes a mess of it.
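
An untested sketch of explicit checking (the exact set of supported formats is up to you):

switch (hw_frames_ctx->sw_format) {
case AV_PIX_FMT_NV12:
case AV_PIX_FMT_YUV420P:
    ctx->is_yuv = true;
    break;
case AV_PIX_FMT_BGRA:
    ctx->is_yuv = false;
    break;
default:
    av_log(avctx, AV_LOG_ERROR, "Unsupported format %s.\n",
           av_get_pix_fmt_name(hw_frames_ctx->sw_format));
    err = AVERROR(EINVAL);
    goto fail;
}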

> +
> +    err = ff_opencl_filter_load_program(avctx, &ff_opencl_source_deshake, 1);
> +    if (err < 0)
> +        goto fail;
> +
> +    if (ctx->debug_on) {
> +        queue_props = CL_QUEUE_PROFILING_ENABLE;
> +    } else {
> +        queue_props = 0;
> +    }
> +    ctx->command_queue = clCreateCommandQueue(
> +        ctx->ocf.hwctx->context,
> +        ctx->ocf.hwctx->device_id,
> +        queue_props,
> +        &cle
> +    );
> +    CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to create OpenCL command queue %d.\n", cle);
> +
> +    CL_CREATE_KERNEL(ctx, grayscale);
> +    CL_CREATE_KERNEL(ctx, harris_response);
> +    CL_CREATE_KERNEL(ctx, refine_features);
> +    CL_CREATE_KERNEL(ctx, brief_descriptors);
> +    CL_CREATE_KERNEL(ctx, match_descriptors);
> +    CL_CREATE_KERNEL(ctx, transform);
> +    CL_CREATE_KERNEL(ctx, crop_upscale);
> +    if (ctx->debug_on)
> +        CL_CREATE_KERNEL(ctx, draw_debug_info);
> +
> +    if (!ctx->is_yuv) {
> +        grayscale_format.image_channel_order = CL_R;
> +        grayscale_format.image_channel_data_type = CL_FLOAT;
> +
> +        grayscale_desc.image_width = outlink->w;
> +        grayscale_desc.image_height = outlink->h;
> +        grayscale_desc.image_type = CL_MEM_OBJECT_IMAGE2D;
> +        grayscale_desc.image_row_pitch = 0;
> +        grayscale_desc.image_slice_pitch = 0;
> +        grayscale_desc.num_mip_levels = 0;
> +        grayscale_desc.num_samples = 0;
> +        grayscale_desc.buffer = NULL;

Compound literals are nicer for making this sort of structure.
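
E.g.:

grayscale_desc = (cl_image_desc) {
    .image_type   = CL_MEM_OBJECT_IMAGE2D,
    .image_width  = outlink->w,
    .image_height = outlink->h,
};

with all the unmentioned fields implicitly zeroed.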

> +
> +        ctx->grayscale = clCreateImage(
> +            ctx->ocf.hwctx->context,
> +            0,
> +            &grayscale_format,
> +            &grayscale_desc,
> +            NULL,
> +            &cle
> +        );
> +        CL_FAIL_ON_ERROR(AVERROR(EIO), "Failed to create grayscale image: %d.\n", cle);
> +    }
> +
> +    CL_CREATE_BUFFER(ctx, harris_buf, outlink->h * outlink->w * sizeof(float));
> +    CL_CREATE_BUFFER(ctx, refined_features, features_buf_size);
> +    CL_CREATE_BUFFER(ctx, prev_refined_features, features_buf_size);
> +    CL_CREATE_BUFFER_FLAGS(
> +        ctx,
> +        brief_pattern,
> +        CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
> +        BREIFN * sizeof(PointPair),
> +        pattern_host
> +    );
> +    CL_CREATE_BUFFER(ctx, descriptors, descriptor_buf_size);
> +    CL_CREATE_BUFFER(ctx, prev_descriptors, descriptor_buf_size);
> +    CL_CREATE_BUFFER(ctx, matches, outlink->h * outlink->w * sizeof(MotionVector));
> +    CL_CREATE_BUFFER(ctx, matches_contig, MATCHES_CONTIG_SIZE * sizeof(MotionVector));
> +    CL_CREATE_BUFFER(ctx, transform_y, 9 * sizeof(float));
> +    CL_CREATE_BUFFER(ctx, transform_uv, 9 * sizeof(float));
> +    if (ctx->debug_on) {
> +        CL_CREATE_BUFFER(ctx, debug_matches, MATCHES_CONTIG_SIZE * sizeof(MotionVector));
> +        CL_CREATE_BUFFER(ctx, debug_model_matches, 3 * sizeof(MotionVector));
> +    }
> +
> +    ctx->initialized = 1;
> +    av_freep(&pattern_host);
> +
> +    return 0;
> +
> +fail:
> +    if (!pattern_host)
> +        av_freep(&pattern_host);
> +    return err;
> +}
> +
> ...
> +
> +// Add the given frame to the frame queue to eventually be processed.
> +//
> +// Also determines the motion from the previous frame and updates the stored
> +// motion information accordingly.
> +static int queue_frame(AVFilterLink *link, AVFrame *input_frame)
> +{
> ...
> +}

These kernel-invocation macros do actually come out pretty nicely even with the strange sizeof() in there :)

> +
> ...
> +
> +static const AVOption deshake_opencl_options[] = {
> +    {
> +        "tripod", "simulates a tripod by preventing any camera movement whatsoever "
> +        "from the original frame",
> +        OFFSET(tripod_mode), AV_OPT_TYPE_BOOL, {.i64 = 0}, 0, 1, FLAGS
> +    },
> +    {
> +        "debug", "whether or not additional debug info should be displayed, both "
> +        "in the processed output and in the console.\n\n"
> +        "Note that in order to see console debug output you will also need to pass "
> +        "`-v verbose` to ffmpeg\n\n"
> +        "Viewing point matches in the output video is only supported for RGB input.",
> +        OFFSET(debug_on), AV_OPT_TYPE_BOOL, {.i64 = 0}, 0, 1, FLAGS
> +    },
> +    {
> +        "adaptive_crop", "if enabled the filter attempts to do a tiny bit of "
> +        "cropping to reduce the amount of mirrored content at the borders.",
> +        OFFSET(should_crop), AV_OPT_TYPE_BOOL, {.i64 = 1}, 0, 1, FLAGS
> +    },
> +    {
> +        "refine_features", "whether or not feature points should be refined at a "
> +        "sub-pixel level.\n\n"
> +        "This defaults to on for increased precision but can be turned off for a "
> +        "small performance gain.",
> +        OFFSET(refine_features), AV_OPT_TYPE_BOOL, {.i64 = 1}, 0, 1, FLAGS
> +    },
> +    {
> +        "smooth_strength", "the strength of the smoothing applied to the camera path "
> +        "from 0 to 1.\n\n"
> +        "1.0 is the maximum smoothing strength while values less than that result in "
> +        "less smoothing.\n\n"
> +        "0.0 is the default value and causes the filter to adaptively choose smoothing "
> +        "strength based on video conditions.",
> +        OFFSET(smooth_percent), AV_OPT_TYPE_FLOAT, {.dbl = 0.0f}, 0.0f, 1.0f, FLAGS
> +    },
> +    {
> +        "smooth_window_multiplier", "controls the size of the smoothing window (number "
> +        "of frames buffered to determine motion information from).\n\n"
> +        "The size of the smoothing window is determined by multiplying the framerate of "
> +        "the video by this number, which defaults to 2 and ranges from 0.1 to 10.0\n\n"
> +        "Larger values increase the amount of motion data available for determining how "
> +        "to smooth the camera path, potentially improving smoothness, but also increase "
> +        "latency and memory usage.",
> +        OFFSET(smooth_window_multiplier), AV_OPT_TYPE_FLOAT, {.dbl = 2.0}, 0.1, 10.0, FLAGS
> +    },

The descriptions here should generally be a single-line summary - see how they come out of "ffmpeg -h filter=deshake_opencl".

The longer descriptions you have here would be better suited to the manual section you are going to write :P
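
E.g. something like:

"tripod", "simulate a tripod by preventing any camera movement from the original frame",

with the extra detail moving there.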

> +    { NULL }
> +};
> +
> +AVFILTER_DEFINE_CLASS(deshake_opencl);
> +
> +AVFilter ff_vf_deshake_opencl = {
> +    .name           = "deshake_opencl",
> +    .description    = NULL_IF_CONFIG_SMALL("Feature-point based video stabilization filter"),
> +    .priv_size      = sizeof(DeshakeOpenCLContext),
> +    .priv_class     = &deshake_opencl_class,
> +    .init           = &ff_opencl_filter_init,
> +    .uninit         = &deshake_opencl_uninit,
> +    .query_formats  = &ff_opencl_filter_query_formats,
> +    .activate       = activate,
> +    .inputs         = deshake_opencl_inputs,
> +    .outputs        = deshake_opencl_outputs,
> +    .flags_internal = FF_FILTER_FLAG_HWFRAME_AWARE
> +};

Thanks,

- Mark

