[FFmpeg-devel] [PATCH v2] fbdetile cpu based framebuffer layout detiling v02

C Hanish Menon hanishkvc at gmail.com
Sat Jun 27 23:12:07 EEST 2020


Hi,

It is a new video filter which I created to do detiling of the Intel
Tile-X and Tile-Y framebuffer layouts into a linear layout, using logic
which runs on the cpu. It can be used if one uses kmsgrab and hwdownload to
capture the screen on an Intel GPU based system, so that one can get a
proper screen capture.

Without this, kmsgrab will generate an unusable/scrambled capture, because
the contents will be tiled. I had this issue a few days back when trying to
capture the screen with wayland, so I created this.

In the patch submitted, I have also updated doc/filters.texi, which
documents the same.
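
For anyone who wants the idea without reading the whole patch, the copy
walk can be sketched roughly as below. This is a simplified illustration
and not the code from the patch: detile_blocks and its parameters are
hypothetical names, and it assumes the source holds whole blocks stored
back to back, with the row width padded up to a whole number of blocks.
It does not model the real hardware pitch rules. Unlike the unrolled
loops in the patch, it clips partial blocks so it never writes past the
destination.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch only: copy a framebuffer made of fixed-size blocks, stored one
 * after another in src, into a linear dst. All sizes are in bytes.
 * Assumption: src holds whole blocks in both directions, i.e. its rows
 * are padded to a multiple of blockWBytes and its height to a multiple
 * of blockH.
 */
static void detile_blocks(int wBytes, int h,
                          uint8_t *dst, int dstLineSize,
                          const uint8_t *src,
                          int blockWBytes, int blockH)
{
    int blocksPerRow = (wBytes + blockWBytes - 1) / blockWBytes;
    int blockRows    = (h + blockH - 1) / blockH;
    const uint8_t *s = src;

    assert(wBytes > 0 && h > 0 && blockWBytes > 0 && blockH > 0);
    for (int by = 0; by < blockRows; by++) {
        for (int bx = 0; bx < blocksPerRow; bx++) {
            int x0 = bx * blockWBytes;
            /* Clip the copy so a partial block never writes past a row. */
            int n = wBytes - x0 < blockWBytes ? wBytes - x0 : blockWBytes;
            for (int r = 0; r < blockH; r++) {
                int y = by * blockH + r;
                /* Skip padding rows below the visible image. */
                if (y < h)
                    memcpy(dst + (size_t)y * dstLineSize + x0, s, n);
                s += blockWBytes;
            }
        }
    }
}
```

With 32bit pixels one would pass wBytes = width*4, and 512 x 8 as the
block size for the Tile-X walk or 16 x 32 for the Tile-Y walk, which are
the sizes used in the patch.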



On Sun, Jun 28, 2020 at 1:30 AM Paul B Mahol <onemda at gmail.com> wrote:

> What is this?
>
> Missing documentation.
> NAK
>
> On 6/27/20, hanishkvc <hanishkvc at gmail.com> wrote:
> > v02-20200627IST2331
> >
> > Unrolled Intel Legacy Tile-Y detiling logic.
> >
> > Also a consolidated patch file, instead of the previous development
> > flow based multiple patch files.
> >
> > v01-20200627IST1308
> >
> > Implemented Intel Legacy Tile-X and Tile-Y detiling logic
> >
> > NOTES:
> >
> > This video filter allows framebuffers which are tiled to be detiled
> > using logic running on the cpu, into a linear layout.
> >
> > Currently it supports Intel Legacy Tile-X and Tile-Y layout detiling.
> > This should help one to work with frames captured (say using kmsgrab)
> > on laptops having Intel GPU.
> >
> > Tile-X conversion logic has been explicitly cross checked, with Tile-X
> > based frames. However, Tile-Y conversion logic hasn't been tested with
> > Tile-Y based frames, but it should potentially do the job, based on my
> > current understanding of the Tile-Y layout format.
> >
> > TODO1: At a later time have to generate Tile-Y based frames, and then
> > cross check the corresponding logic explicitly.
> >
> > TODO2: Maybe use OpenGL or Vulkan buffer helper routines to do the
> > layout conversion. But some online discussions from some time back seem
> > to indicate that this path is not fully bug free currently.
> > ---
> >  Changelog                 |   1 +
> >  doc/filters.texi          |  62 ++++++++
> >  libavfilter/Makefile      |   1 +
> >  libavfilter/allfilters.c  |   1 +
> >  libavfilter/vf_fbdetile.c | 309 ++++++++++++++++++++++++++++++++++++++
> >  5 files changed, 374 insertions(+)
> >  create mode 100644 libavfilter/vf_fbdetile.c
> >
> > diff --git a/Changelog b/Changelog
> > index a60e7d2eb8..0e03491f6a 100644
> > --- a/Changelog
> > +++ b/Changelog
> > @@ -2,6 +2,7 @@ Entries are sorted chronologically from oldest to youngest within each release,
> >  releases are sorted from youngest to oldest.
> >
> >  version <next>:
> > +- fbdetile cpu based framebuffer layout detiling video filter
> >  - AudioToolbox output device
> >  - MacCaption demuxer
> >
> > diff --git a/doc/filters.texi b/doc/filters.texi
> > index 3c2dd2eb90..73ba21af89 100644
> > --- a/doc/filters.texi
> > +++ b/doc/filters.texi
> > @@ -12210,6 +12210,68 @@ It accepts the following optional parameters:
> >  The number of the CUDA device to use
> >  @end table
> >
> > +@anchor{fbdetile}
> > +@section fbdetile
> > +
> > +Detiles the Framebuffer tile layout into a linear layout using CPU.
> > +
> > +It currently supports conversion from Intel legacy tile-x and tile-y layouts
> > +into a linear layout. This is useful if one is using kmsgrab and hwdownload
> > +to capture a screen which is using one of these non-linear layouts.
> > +
> > +Currently it expects the data to be a 32bit RGB based pixel format. However
> > +the logic does not do any pixel format conversion. Support for 16bit RGB data
> > +will be enabled later, as the logic is transparent to it at one level.
> > +
> > +One could either insert this into the filter chain while capturing itself,
> > +or else, if it is slowing things down or so, then one could instead insert
> > +it into the filter chain during playback or transcoding or so.
> > +
> > +It supports the following optional parameters
> > +
> > +@table @option
> > +@item type
> > +Specify which detiling conversion to apply. The supported values are
> > +@table @var
> > +@item 0
> > +intel tile-x to linear conversion (the default)
> > +@item 1
> > +intel tile-y to linear conversion.
> > +@end table
> > +@end table
> > +
> > +If one wants to convert during capture itself, one could do
> > +@example
> > +ffmpeg -f kmsgrab -i - -vf "hwdownload, fbdetile" OUTPUT
> > +@end example
> > +
> > +However if one wants to convert after the tiled data has been already captured
> > +@example
> > +ffmpeg -i INPUT -vf "fbdetile" OUTPUT
> > +@end example
> > +@example
> > +ffplay -i INPUT -vf "fbdetile"
> > +@end example
> > +
> > +NOTE: While transcoding a test 1080p h264 stream, with 276 frames, with two
> > +runs of each situation, the performance was as given below. However this
> > +was for the older | initial version of the logic, as well as it was run on
> > +the default linux chromebook->vm->container, so the perf values need not be
> > +proper. But in a relative sense the overhead would be similar.
> > +@example
> > +rm out.mp4; time ./ffmpeg -i input.mp4 out.mp4
> > +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=0 out.mp4
> > +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=1 out.mp4
> > +@end example
> > +@table @option
> > +@item with no fbdetile filter
> > +it took ~7.28 secs,
> > +@item with fbdetile=0 filter
> > +it took ~8.69 secs,
> > +@item with fbdetile=1 filter
> > +it took ~9.20 secs.
> > +@end table
> > +
> >  @section hqx
> >
> >  Apply a high-quality magnification filter designed for pixel art. This filter
> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > index 5123540653..bdb0c379ae 100644
> > --- a/libavfilter/Makefile
> > +++ b/libavfilter/Makefile
> > @@ -280,6 +280,7 @@ OBJS-$(CONFIG_HWDOWNLOAD_FILTER)             += vf_hwdownload.o
> >  OBJS-$(CONFIG_HWMAP_FILTER)                  += vf_hwmap.o
> >  OBJS-$(CONFIG_HWUPLOAD_CUDA_FILTER)          += vf_hwupload_cuda.o
> >  OBJS-$(CONFIG_HWUPLOAD_FILTER)               += vf_hwupload.o
> > +OBJS-$(CONFIG_FBDETILE_FILTER)               += vf_fbdetile.o
> >  OBJS-$(CONFIG_HYSTERESIS_FILTER)             += vf_hysteresis.o framesync.o
> >  OBJS-$(CONFIG_IDET_FILTER)                   += vf_idet.o
> >  OBJS-$(CONFIG_IL_FILTER)                     += vf_il.o
> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > index 1183e40267..f8dceb2a88 100644
> > --- a/libavfilter/allfilters.c
> > +++ b/libavfilter/allfilters.c
> > @@ -265,6 +265,7 @@ extern AVFilter ff_vf_hwdownload;
> >  extern AVFilter ff_vf_hwmap;
> >  extern AVFilter ff_vf_hwupload;
> >  extern AVFilter ff_vf_hwupload_cuda;
> > +extern AVFilter ff_vf_fbdetile;
> >  extern AVFilter ff_vf_hysteresis;
> >  extern AVFilter ff_vf_idet;
> >  extern AVFilter ff_vf_il;
> > diff --git a/libavfilter/vf_fbdetile.c b/libavfilter/vf_fbdetile.c
> > new file mode 100644
> > index 0000000000..8b20c96d2c
> > --- /dev/null
> > +++ b/libavfilter/vf_fbdetile.c
> > @@ -0,0 +1,309 @@
> > +/*
> > + * Copyright (c) 2020 HanishKVC
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +/**
> > + * @file
> > + * Detile the Frame buffer's tile layout using the cpu
> > + * Currently it supports the legacy Intel Tile-X and Tile-Y layout detiling.
> > + *
> > + */
> > +
> > +/*
> > + * ToThink|Check: Optimisations
> > + *
> > + * Do the gcc settings used by ffmpeg allow memcpy | stringops inlining,
> > + * loop unrolling, better native matching instructions, additional
> > + * optimisations, ...
> > + *
> > + * Does gcc map to optimal memcpy logic, based on the situation it is
> > + * used in.
> > + *
> > + * If not, maybe look at vector_size or intrinsics or appropriate arch
> > + * and cpu specific inline asm or ...
> > + *
> > + */
> > +
> > +#include "libavutil/avassert.h"
> > +#include "libavutil/imgutils.h"
> > +#include "libavutil/opt.h"
> > +#include "avfilter.h"
> > +#include "formats.h"
> > +#include "internal.h"
> > +#include "video.h"
> > +
> > +enum FilterMode {
> > +    TYPE_INTELX,
> > +    TYPE_INTELY,
> > +    NB_TYPE
> > +};
> > +
> > +typedef struct FBDetileContext {
> > +    const AVClass *class;
> > +    int width, height;
> > +    int type;
> > +} FBDetileContext;
> > +
> > +#define OFFSET(x) offsetof(FBDetileContext, x)
> > +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
> > +static const AVOption fbdetile_options[] = {
> > +    { "type", "set framebuffer format_modifier type", OFFSET(type), AV_OPT_TYPE_INT, {.i64=TYPE_INTELX}, 0, NB_TYPE-1, FLAGS, "type" },
> > +        { "intelx", "Intel Tile-X layout", 0, AV_OPT_TYPE_CONST, {.i64=TYPE_INTELX}, INT_MIN, INT_MAX, FLAGS, "type" },
> > +        { "intely", "Intel Tile-Y layout", 0, AV_OPT_TYPE_CONST, {.i64=TYPE_INTELY}, INT_MIN, INT_MAX, FLAGS, "type" },
> > +    { NULL }
> > +};
> > +
> > +AVFILTER_DEFINE_CLASS(fbdetile);
> > +
> > +static av_cold int init(AVFilterContext *ctx)
> > +{
> > +    FBDetileContext *fbdetile = ctx->priv;
> > +
> > +    if (fbdetile->type == TYPE_INTELX) {
> > +        fprintf(stderr,"INFO:fbdetile:init: Intel tile-x to linear\n");
> > +    } else if (fbdetile->type == TYPE_INTELY) {
> > +        fprintf(stderr,"INFO:fbdetile:init: Intel tile-y to linear\n");
> > +    } else {
> > +        fprintf(stderr,"DBUG:fbdetile:init: Unknown Tile format specified, shouldnt reach here\n");
> > +    }
> > +    fbdetile->width = 1920;
> > +    fbdetile->height = 1080;
> > +    return 0;
> > +}
> > +
> > +static int query_formats(AVFilterContext *ctx)
> > +{
> > +    // Currently only RGB based 32bit formats are specified
> > +    // TODO: Technically the logic is transparent to 16bit RGB formats also
> > +    static const enum AVPixelFormat pix_fmts[] = {AV_PIX_FMT_RGB0, AV_PIX_FMT_0RGB, AV_PIX_FMT_BGR0, AV_PIX_FMT_0BGR,
> > +                                                  AV_PIX_FMT_RGBA, AV_PIX_FMT_ARGB, AV_PIX_FMT_BGRA, AV_PIX_FMT_ABGR,
> > +                                                  AV_PIX_FMT_NONE};
> > +    AVFilterFormats *fmts_list;
> > +
> > +    fmts_list = ff_make_format_list(pix_fmts);
> > +    if (!fmts_list)
> > +        return AVERROR(ENOMEM);
> > +    return ff_set_common_formats(ctx, fmts_list);
> > +}
> > +
> > +static int config_props(AVFilterLink *inlink)
> > +{
> > +    AVFilterContext *ctx = inlink->dst;
> > +    FBDetileContext *fbdetile = ctx->priv;
> > +
> > +    fbdetile->width = inlink->w;
> > +    fbdetile->height = inlink->h;
> > +    fprintf(stderr,"DBUG:fbdetile:config_props: %d x %d\n", fbdetile->width, fbdetile->height);
> > +
> > +    return 0;
> > +}
> > +
> > +static void detile_intelx(AVFilterContext *ctx, int w, int h,
> > +                                uint8_t *dst, int dstLineSize,
> > +                          const uint8_t *src, int srcLineSize)
> > +{
> > +    // Offsets and LineSize are in bytes
> > +    int tileW = 128; // For a 32Bit / Pixel framebuffer, 512/4
> > +    int tileH = 8;
> > +
> > +    if (w*4 != srcLineSize) {
> > +        fprintf(stderr,"DBUG:fbdetile:intelx: w%dxh%d, dL%d, sL%d\n", w, h, dstLineSize, srcLineSize);
> > +        fprintf(stderr,"ERRR:fbdetile:intelx: dont support LineSize | Pitch going beyond width\n");
> > +    }
> > +    int sO = 0;
> > +    int dX = 0;
> > +    int dY = 0;
> > +    int nTRows = (w*h)/tileW;
> > +    int cTR = 0;
> > +    while (cTR < nTRows) {
> > +        int dO = dY*dstLineSize + dX*4;
> > +#ifdef DEBUG_FBTILE
> > +        fprintf(stderr,"DBUG:fbdetile:intelx: dX%d dY%d, sO%d, dO%d\n", dX, dY, sO, dO);
> > +#endif
> > +        memcpy(dst+dO+0*dstLineSize, src+sO+0*512, 512);
> > +        memcpy(dst+dO+1*dstLineSize, src+sO+1*512, 512);
> > +        memcpy(dst+dO+2*dstLineSize, src+sO+2*512, 512);
> > +        memcpy(dst+dO+3*dstLineSize, src+sO+3*512, 512);
> > +        memcpy(dst+dO+4*dstLineSize, src+sO+4*512, 512);
> > +        memcpy(dst+dO+5*dstLineSize, src+sO+5*512, 512);
> > +        memcpy(dst+dO+6*dstLineSize, src+sO+6*512, 512);
> > +        memcpy(dst+dO+7*dstLineSize, src+sO+7*512, 512);
> > +        dX += tileW;
> > +        if (dX >= w) {
> > +            dX = 0;
> > +            dY += 8;
> > +        }
> > +        sO = sO + 8*512;
> > +        cTR += 8;
> > +    }
> > +}
> > +
> > +/*
> > + * Intel Legacy Tile-Y layout conversion support
> > + *
> > + * currently done in a simple dumb way. Two low hanging optimisations
> > + * that could be readily applied are
> > + *
> > + * a) unrolling the inner for loop
> > + *    --- Given small size memcpy, should help, DONE
> > + *
> > + * b) using simd based 128bit loading and storing along with prefetch
> > + *    hinting.
> > + *
> > + *    TOTHINK|CHECK: Does memcpy already do this and more if situation
> > + *    is right?!
> > + *
> > + *    As code (or even intrinsics) would be specific to each architecture,
> > + *    avoiding for now. Later have to check if vector_size attribute and
> > + *    corresponding implementation by gcc can handle different architectures
> > + *    properly, such that it wont become worse than memcpy provided for that
> > + *    architecture.
> > + *
> > + * Or maybe I could even merge the two intel detiling logics into one, as
> > + * the semantic and flow is almost same for both logics.
> > + *
> > + */
> > +static void detile_intely(AVFilterContext *ctx, int w, int h,
> > +                                uint8_t *dst, int dstLineSize,
> > +                          const uint8_t *src, int srcLineSize)
> > +{
> > +    // Offsets and LineSize are in bytes
> > +    int tileW = 4; // For a 32Bit / Pixel framebuffer, 16/4
> > +    int tileH = 32;
> > +
> > +    if (w*4 != srcLineSize) {
> > +        fprintf(stderr,"DBUG:fbdetile:intely: w%dxh%d, dL%d, sL%d\n", w, h, dstLineSize, srcLineSize);
> > +        fprintf(stderr,"ERRR:fbdetile:intely: dont support LineSize | Pitch going beyond width\n");
> > +    }
> > +    int sO = 0;
> > +    int dX = 0;
> > +    int dY = 0;
> > +    int nTRows = (w*h)/tileW;
> > +    int cTR = 0;
> > +    while (cTR < nTRows) {
> > +        int dO = dY*dstLineSize + dX*4;
> > +#ifdef DEBUG_FBTILE
> > +        fprintf(stderr,"DBUG:fbdetile:intely: dX%d dY%d, sO%d, dO%d\n", dX, dY, sO, dO);
> > +#endif
> > +
> > +        memcpy(dst+dO+0*dstLineSize, src+sO+0*16, 16);
> > +        memcpy(dst+dO+1*dstLineSize, src+sO+1*16, 16);
> > +        memcpy(dst+dO+2*dstLineSize, src+sO+2*16, 16);
> > +        memcpy(dst+dO+3*dstLineSize, src+sO+3*16, 16);
> > +        memcpy(dst+dO+4*dstLineSize, src+sO+4*16, 16);
> > +        memcpy(dst+dO+5*dstLineSize, src+sO+5*16, 16);
> > +        memcpy(dst+dO+6*dstLineSize, src+sO+6*16, 16);
> > +        memcpy(dst+dO+7*dstLineSize, src+sO+7*16, 16);
> > +        memcpy(dst+dO+8*dstLineSize, src+sO+8*16, 16);
> > +        memcpy(dst+dO+9*dstLineSize, src+sO+9*16, 16);
> > +        memcpy(dst+dO+10*dstLineSize, src+sO+10*16, 16);
> > +        memcpy(dst+dO+11*dstLineSize, src+sO+11*16, 16);
> > +        memcpy(dst+dO+12*dstLineSize, src+sO+12*16, 16);
> > +        memcpy(dst+dO+13*dstLineSize, src+sO+13*16, 16);
> > +        memcpy(dst+dO+14*dstLineSize, src+sO+14*16, 16);
> > +        memcpy(dst+dO+15*dstLineSize, src+sO+15*16, 16);
> > +        memcpy(dst+dO+16*dstLineSize, src+sO+16*16, 16);
> > +        memcpy(dst+dO+17*dstLineSize, src+sO+17*16, 16);
> > +        memcpy(dst+dO+18*dstLineSize, src+sO+18*16, 16);
> > +        memcpy(dst+dO+19*dstLineSize, src+sO+19*16, 16);
> > +        memcpy(dst+dO+20*dstLineSize, src+sO+20*16, 16);
> > +        memcpy(dst+dO+21*dstLineSize, src+sO+21*16, 16);
> > +        memcpy(dst+dO+22*dstLineSize, src+sO+22*16, 16);
> > +        memcpy(dst+dO+23*dstLineSize, src+sO+23*16, 16);
> > +        memcpy(dst+dO+24*dstLineSize, src+sO+24*16, 16);
> > +        memcpy(dst+dO+25*dstLineSize, src+sO+25*16, 16);
> > +        memcpy(dst+dO+26*dstLineSize, src+sO+26*16, 16);
> > +        memcpy(dst+dO+27*dstLineSize, src+sO+27*16, 16);
> > +        memcpy(dst+dO+28*dstLineSize, src+sO+28*16, 16);
> > +        memcpy(dst+dO+29*dstLineSize, src+sO+29*16, 16);
> > +        memcpy(dst+dO+30*dstLineSize, src+sO+30*16, 16);
> > +        memcpy(dst+dO+31*dstLineSize, src+sO+31*16, 16);
> > +
> > +        dX += tileW;
> > +        if (dX >= w) {
> > +            dX = 0;
> > +            dY += 32;
> > +        }
> > +        sO = sO + 32*16;
> > +        cTR += 32;
> > +    }
> > +}
> > +
> > +static int filter_frame(AVFilterLink *inlink, AVFrame *in)
> > +{
> > +    AVFilterContext *ctx = inlink->dst;
> > +    FBDetileContext *fbdetile = ctx->priv;
> > +    AVFilterLink *outlink = ctx->outputs[0];
> > +    AVFrame *out;
> > +
> > +    out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> > +    if (!out) {
> > +        av_frame_free(&in);
> > +        return AVERROR(ENOMEM);
> > +    }
> > +    av_frame_copy_props(out, in);
> > +
> > +    if (fbdetile->type == TYPE_INTELX) {
> > +        detile_intelx(ctx, fbdetile->width, fbdetile->height,
> > +                      out->data[0], out->linesize[0],
> > +                      in->data[0], in->linesize[0]);
> > +    } else if (fbdetile->type == TYPE_INTELY) {
> > +        detile_intely(ctx, fbdetile->width, fbdetile->height,
> > +                      out->data[0], out->linesize[0],
> > +                      in->data[0], in->linesize[0]);
> > +    }
> > +
> > +    av_frame_free(&in);
> > +    return ff_filter_frame(outlink, out);
> > +}
> > +
> > +static av_cold void uninit(AVFilterContext *ctx)
> > +{
> > +
> > +}
> > +
> > +static const AVFilterPad fbdetile_inputs[] = {
> > +    {
> > +        .name         = "default",
> > +        .type         = AVMEDIA_TYPE_VIDEO,
> > +        .config_props = config_props,
> > +        .filter_frame = filter_frame,
> > +    },
> > +    { NULL }
> > +};
> > +
> > +static const AVFilterPad fbdetile_outputs[] = {
> > +    {
> > +        .name = "default",
> > +        .type = AVMEDIA_TYPE_VIDEO,
> > +    },
> > +    { NULL }
> > +};
> > +
> > +AVFilter ff_vf_fbdetile = {
> > +    .name          = "fbdetile",
> > +    .description   = NULL_IF_CONFIG_SMALL("Detile Framebuffer using CPU"),
> > +    .priv_size     = sizeof(FBDetileContext),
> > +    .init          = init,
> > +    .uninit        = uninit,
> > +    .query_formats = query_formats,
> > +    .inputs        = fbdetile_inputs,
> > +    .outputs       = fbdetile_outputs,
> > +    .priv_class    = &fbdetile_class,
> > +};
> > --
> > 2.20.1
> >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel at ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email
> > ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".
>
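
Regarding TODO1 in the patch notes (cross checking the Tile-Y path), one
cheap first step could be a round trip on synthetic data, roughly as
sketched below. The names tile_blocks, untile_blocks and roundtrip_ok are
hypothetical, and the block walk is only assumed to match the one in the
patch. Note that this checks only that the two copy loops are consistent
with each other. It says nothing about whether the walk matches what the
hardware really produces, so frames captured from a real Tile-Y
framebuffer would still be needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack a linear image into back-to-back blocks,
 * walking blocks left to right, then top to bottom. Sizes in bytes. */
static void tile_blocks(int wBytes, int h, uint8_t *dst,
                        const uint8_t *src, int srcLineSize,
                        int blockWBytes, int blockH)
{
    /* This sketch handles whole blocks only. */
    assert(wBytes % blockWBytes == 0 && h % blockH == 0);
    for (int y0 = 0; y0 < h; y0 += blockH)
        for (int x0 = 0; x0 < wBytes; x0 += blockWBytes)
            for (int r = 0; r < blockH; r++, dst += blockWBytes)
                memcpy(dst, src + (size_t)(y0 + r) * srcLineSize + x0,
                       blockWBytes);
}

/* Inverse walk: read the blocks in storage order, write them linearly. */
static void untile_blocks(int wBytes, int h, uint8_t *dst, int dstLineSize,
                          const uint8_t *src, int blockWBytes, int blockH)
{
    assert(wBytes % blockWBytes == 0 && h % blockH == 0);
    for (int y0 = 0; y0 < h; y0 += blockH)
        for (int x0 = 0; x0 < wBytes; x0 += blockWBytes)
            for (int r = 0; r < blockH; r++, src += blockWBytes)
                memcpy(dst + (size_t)(y0 + r) * dstLineSize + x0, src,
                       blockWBytes);
}

/* Returns 1 if tiling and then detiling reproduces the test image. */
static int roundtrip_ok(int blockWBytes, int blockH)
{
    /* 1024 is a multiple of 512 and 16; 64 is a multiple of 8 and 32. */
    enum { W_BYTES = 1024, H = 64 };
    static uint8_t lin[W_BYTES * H], tiled[W_BYTES * H], out[W_BYTES * H];

    /* Fill the linear image with a pattern that varies per byte. */
    for (int i = 0; i < W_BYTES * H; i++)
        lin[i] = (uint8_t)(i * 31 + (i >> 8));
    memset(out, 0, sizeof(out));
    tile_blocks(W_BYTES, H, tiled, lin, W_BYTES, blockWBytes, blockH);
    untile_blocks(W_BYTES, H, out, W_BYTES, tiled, blockWBytes, blockH);
    return memcmp(lin, out, sizeof(lin)) == 0;
}
```

Calling roundtrip_ok(512, 8) would exercise the Tile-X block size and
roundtrip_ok(16, 32) the Tile-Y one.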


-- 
Keep ;-)
HanishKVC

