[FFmpeg-devel] GSOC 2018 qualification task.
ANURAG SINGH IIT BHU
anurag.singh.phy15 at iitbhu.ac.in
Fri Apr 13 07:09:35 EEST 2018
Thank you sir, I'll implement the suggested changes as soon as possible.
On Fri, Apr 13, 2018 at 4:04 AM, Michael Niedermayer <michael at niedermayer.cc> wrote:
> On Fri, Apr 13, 2018 at 02:13:53AM +0530, ANURAG SINGH IIT BHU wrote:
> > Hello,
> > I have implemented the changes requested in the review of the previous
> > patch. There is now no need to provide any subtitle file to the filter. I am
> > attaching the complete patch of the hellosubs filter.
> >
> > Command to run the filter
> > ffmpeg -i <videoname> -vf hellosubs=<videoname> helloout.mp4
> >
> >
> > Thanks and regards,
> > Anurag Singh.
> >
> >
> >
> >
> > On Tue, Apr 10, 2018 at 4:55 AM, Rostislav Pehlivanov <atomnuker at gmail.com>
> > wrote:
> >
> > > On 9 April 2018 at 19:10, Paul B Mahol <onemda at gmail.com> wrote:
> > >
> > > > On 4/9/18, Rostislav Pehlivanov <atomnuker at gmail.com> wrote:
> > > > > On 9 April 2018 at 03:59, ANURAG SINGH IIT BHU <
> > > > > anurag.singh.phy15 at iitbhu.ac.in> wrote:
> > > > >
> > > > > >> This mail is regarding the qualification task assigned to me for the
> > > > > >> GSOC project in FFmpeg for automatic real-time subtitle generation
> > > > > >> using a speech-to-text translation ML model.
> > > > >>
> > > > >
> > > > > > I really don't think lavfi is the correct place for such code, nor that
> > > > > > the project's repo should contain such code at all.
> > > > > > This would need to be in another repo and a separate library.
> > > >
> > > > Why? Are you against ocr filter too?
> > > >
> > >
> > > > The OCR filter uses libtesseract so I'm fine with it. Like I said, as long
> > > > as the actual code to do it is in an external library I don't mind.
> > > > Mozilla recently released Deep Speech
> > > > (https://github.com/mozilla/DeepSpeech)
> > > > which does pretty much exactly speech to text and is considered to have the
> > > > most accurate one out there. Someone just needs to convert the TensorFlow
> > > > code to something more usable.
> > > _______________________________________________
> > > ffmpeg-devel mailing list
> > > ffmpeg-devel at ffmpeg.org
> > > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> > >
>
> > Makefile | 1
> > allfilters.c | 1
> > vf_hellosubs.c | 513 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 515 insertions(+)
> > 2432f100fddb7ec84e771be8282d4b66e3d1f50a 0001-avfilter-add-hellosubs-filter.patch
> > From ac0e09d431ea68aebfaef6e2ed0b450e76d473d9 Mon Sep 17 00:00:00 2001
> > From: ddosvulnerability <anurag.singh.phy15 at iitbhu.ac.in>
> > Date: Thu, 12 Apr 2018 22:06:43 +0530
> > Subject: [PATCH] avfilter: add hellosubs filter.
> >
> > ---
> > libavfilter/Makefile | 1 +
> > libavfilter/allfilters.c | 1 +
> > libavfilter/vf_hellosubs.c | 513 +++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 515 insertions(+)
> > create mode 100644 libavfilter/vf_hellosubs.c
> >
> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > index a90ca30..770b1b5 100644
> > --- a/libavfilter/Makefile
> > +++ b/libavfilter/Makefile
> > @@ -331,6 +331,7 @@ OBJS-$(CONFIG_SSIM_FILTER) += vf_ssim.o framesync.o
> > OBJS-$(CONFIG_STEREO3D_FILTER) += vf_stereo3d.o
> > OBJS-$(CONFIG_STREAMSELECT_FILTER) += f_streamselect.o framesync.o
> > OBJS-$(CONFIG_SUBTITLES_FILTER) += vf_subtitles.o
> > +OBJS-$(CONFIG_HELLOSUBS_FILTER) += vf_hellosubs.o
> > OBJS-$(CONFIG_SUPER2XSAI_FILTER) += vf_super2xsai.o
> > OBJS-$(CONFIG_SWAPRECT_FILTER) += vf_swaprect.o
> > OBJS-$(CONFIG_SWAPUV_FILTER) += vf_swapuv.o
> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > index 6eac828..a008908 100644
> > --- a/libavfilter/allfilters.c
> > +++ b/libavfilter/allfilters.c
> > @@ -322,6 +322,7 @@ extern AVFilter ff_vf_ssim;
> > extern AVFilter ff_vf_stereo3d;
> > extern AVFilter ff_vf_streamselect;
> > extern AVFilter ff_vf_subtitles;
> > +extern AVFilter ff_vf_hellosubs;
> > extern AVFilter ff_vf_super2xsai;
> > extern AVFilter ff_vf_swaprect;
> > extern AVFilter ff_vf_swapuv;
> > diff --git a/libavfilter/vf_hellosubs.c b/libavfilter/vf_hellosubs.c
> > new file mode 100644
> > index 0000000..b994050
> > --- /dev/null
> > +++ b/libavfilter/vf_hellosubs.c
> > @@ -0,0 +1,513 @@
> > +/*
> > + * Copyright (c) 2011 Baptiste Coudurier
> > + * Copyright (c) 2011 Stefano Sabatini
> > + * Copyright (c) 2012 Clément Bœsch
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +/**
> > + * @file
> > + * Libass hellosubs burning filter.
> > + *
> > +
> > + */
> > +
> > +#include <ass/ass.h>
> > +
> > +#include "config.h"
> > +#if CONFIG_SUBTITLES_FILTER
> > +# include "libavcodec/avcodec.h"
> > +# include "libavformat/avformat.h"
> > +#endif
> > +#include "libavutil/avstring.h"
> > +#include "libavutil/imgutils.h"
> > +#include "libavutil/opt.h"
> > +#include "libavutil/parseutils.h"
> > +#include "drawutils.h"
> > +#include "avfilter.h"
> > +#include "internal.h"
> > +#include "formats.h"
> > +#include "video.h"
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +
> > +typedef struct AssContext {
> > + const AVClass *class;
> > + ASS_Library *library;
> > + ASS_Renderer *renderer;
> > + ASS_Track *track;
> > + char *filename;
> > + char *fontsdir;
> > + char *charenc;
> > + char *force_style;
> > + int stream_index;
> > + int alpha;
> > + uint8_t rgba_map[4];
> > + int pix_step[4]; ///< steps per pixel for each plane of the main output
> > + int original_w, original_h;
> > + int shaping;
> > + FFDrawContext draw;
> > +} AssContext;
> > +
> > +#define OFFSET(x) offsetof(AssContext, x)
> > +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
> > +
> > +#define COMMON_OPTIONS \
> > + {"filename", "set the filename of file to read", OFFSET(filename), AV_OPT_TYPE_STRING, {.str = NULL}, CHAR_MIN, CHAR_MAX, FLAGS }, \
> > + {"f", "set the filename of file to read", OFFSET(filename), AV_OPT_TYPE_STRING, {.str = NULL}, CHAR_MIN, CHAR_MAX, FLAGS }, \
> > + {"original_size", "set the size of the original video (used to scale fonts)", OFFSET(original_w), AV_OPT_TYPE_IMAGE_SIZE, {.str = NULL}, CHAR_MIN, CHAR_MAX, FLAGS }, \
> > + {"fontsdir", "set the directory containing the fonts to read", OFFSET(fontsdir), AV_OPT_TYPE_STRING, {.str = NULL}, CHAR_MIN, CHAR_MAX, FLAGS }, \
> > + {"alpha", "enable processing of alpha channel", OFFSET(alpha), AV_OPT_TYPE_BOOL, {.i64 = 0 }, 0, 1, FLAGS }, \
> > +
> > +/* libass supports a log level ranging from 0 to 7 */
> > +static const int ass_libavfilter_log_level_map[] = {
> > + [0] = AV_LOG_FATAL, /* MSGL_FATAL */
> > + [1] = AV_LOG_ERROR, /* MSGL_ERR */
> > + [2] = AV_LOG_WARNING, /* MSGL_WARN */
> > + [3] = AV_LOG_WARNING, /* <undefined> */
> > + [4] = AV_LOG_INFO, /* MSGL_INFO */
> > + [5] = AV_LOG_INFO, /* <undefined> */
> > + [6] = AV_LOG_VERBOSE, /* MSGL_V */
> > + [7] = AV_LOG_DEBUG, /* MSGL_DBG2 */
> > +};
> > +
> > +static void ass_log(int ass_level, const char *fmt, va_list args, void *ctx)
> > +{
> > + const int ass_level_clip = av_clip(ass_level, 0,
> > + FF_ARRAY_ELEMS(ass_libavfilter_log_level_map) - 1);
> > + const int level = ass_libavfilter_log_level_map[ass_level_clip];
> > +
> > + av_vlog(ctx, level, fmt, args);
> > + av_log(ctx, level, "\n");
> > +}
> > +
> > +static av_cold int init(AVFilterContext *ctx)
> > +{
> > + AssContext *ass = ctx->priv;
> > +
> > + if (!ass->filename) {
> > + av_log(ctx, AV_LOG_ERROR, "No filename provided!\n");
> > + return AVERROR(EINVAL);
> > + }
> > +
> > + ass->library = ass_library_init();
> > + if (!ass->library) {
> > + av_log(ctx, AV_LOG_ERROR, "Could not initialize libass.\n");
> > + return AVERROR(EINVAL);
> > + }
> > + ass_set_message_cb(ass->library, ass_log, ctx);
> > +
> > + ass_set_fonts_dir(ass->library, ass->fontsdir);
> > +
> > + ass->renderer = ass_renderer_init(ass->library);
> > + if (!ass->renderer) {
> > + av_log(ctx, AV_LOG_ERROR, "Could not initialize libass renderer.\n");
> > + return AVERROR(EINVAL);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static av_cold void uninit(AVFilterContext *ctx)
> > +{
> > + AssContext *ass = ctx->priv;
> > +
> > + if (ass->track)
> > + ass_free_track(ass->track);
> > + if (ass->renderer)
> > + ass_renderer_done(ass->renderer);
> > + if (ass->library)
> > + ass_library_done(ass->library);
> > +}
> > +
> > +static int query_formats(AVFilterContext *ctx)
> > +{
> > + return ff_set_common_formats(ctx, ff_draw_supported_pixel_formats(0));
> > +}
> > +
> > +static int config_input(AVFilterLink *inlink)
> > +{
> > + AssContext *ass = inlink->dst->priv;
> > +
> > + ff_draw_init(&ass->draw, inlink->format, ass->alpha ? FF_DRAW_PROCESS_ALPHA : 0);
> > +
> > + ass_set_frame_size (ass->renderer, inlink->w, inlink->h);
> > + if (ass->original_w && ass->original_h)
> > + ass_set_aspect_ratio(ass->renderer, (double)inlink->w / inlink->h,
> > + (double)ass->original_w / ass->original_h);
> > + if (ass->shaping != -1)
> > + ass_set_shaper(ass->renderer, ass->shaping);
> > +
> > + return 0;
> > +}
> > +
> > +/* libass stores an RGBA color in the format RRGGBBTT, where TT is the transparency level */
> > +#define AR(c) ( (c)>>24)
> > +#define AG(c) (((c)>>16)&0xFF)
> > +#define AB(c) (((c)>>8) &0xFF)
> > +#define AA(c) ((0xFF-(c)) &0xFF)
> > +
> > +static void overlay_ass_image(AssContext *ass, AVFrame *picref,
> > + const ASS_Image *image)
> > +{
> > + for (; image; image = image->next) {
> > + uint8_t rgba_color[] = {AR(image->color), AG(image->color), AB(image->color), AA(image->color)};
> > + FFDrawColor color;
> > + ff_draw_color(&ass->draw, &color, rgba_color);
> > + ff_blend_mask(&ass->draw, &color,
> > + picref->data, picref->linesize,
> > + picref->width, picref->height,
> > + image->bitmap, image->stride, image->w, image->h,
> > + 3, 0, image->dst_x, image->dst_y);
> > + }
> > +}
> > +
> > +static int filter_frame(AVFilterLink *inlink, AVFrame *picref)
> > +{
> > + AVFilterContext *ctx = inlink->dst;
> > + AVFilterLink *outlink = ctx->outputs[0];
> > + AssContext *ass = ctx->priv;
> > + int detect_change = 0;
> > + double time_ms = picref->pts * av_q2d(inlink->time_base) * 1000;
> > + ASS_Image *image = ass_render_frame(ass->renderer, ass->track,
> > + time_ms, &detect_change);
> > +
> > + if (detect_change)
> > + av_log(ctx, AV_LOG_DEBUG, "Change happened at time ms:%f\n", time_ms);
> > +
> > + overlay_ass_image(ass, picref, image);
> > +
> > + return ff_filter_frame(outlink, picref);
> > +}
> > +
> > +static const AVFilterPad ass_inputs[] = {
> > + {
> > + .name = "default",
> > + .type = AVMEDIA_TYPE_VIDEO,
> > + .filter_frame = filter_frame,
> > + .config_props = config_input,
> > + .needs_writable = 1,
> > + },
> > + { NULL }
> > +};
> > +
> > +static const AVFilterPad ass_outputs[] = {
> > + {
> > + .name = "default",
> > + .type = AVMEDIA_TYPE_VIDEO,
> > + },
> > + { NULL }
> > +};
> > +
> > +
> > +
> > +
> > +
> > +static const AVOption hellosubs_options[] = {
> > + COMMON_OPTIONS
> > + {"charenc", "set input character encoding", OFFSET(charenc), AV_OPT_TYPE_STRING, {.str = NULL}, CHAR_MIN, CHAR_MAX, FLAGS},
> > + {"stream_index", "set stream index", OFFSET(stream_index), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, FLAGS},
> > + {"si", "set stream index", OFFSET(stream_index), AV_OPT_TYPE_INT, { .i64 = -1 }, -1, INT_MAX, FLAGS},
> > + {"force_style", "force subtitle style", OFFSET(force_style), AV_OPT_TYPE_STRING, {.str = NULL}, CHAR_MIN, CHAR_MAX, FLAGS},
> > + {NULL},
> > +};
> > +
> > +static const char * const font_mimetypes[] = {
> > + "application/x-truetype-font",
> > + "application/vnd.ms-opentype",
> > + "application/x-font-ttf",
> > + NULL
> > +};
> > +
> > +static int attachment_is_font(AVStream * st)
> > +{
> > + const AVDictionaryEntry *tag = NULL;
> > + int n;
> > +
> > + tag = av_dict_get(st->metadata, "mimetype", NULL, AV_DICT_MATCH_CASE);
> > +
> > + if (tag) {
> > + for (n = 0; font_mimetypes[n]; n++) {
> > + if (av_strcasecmp(font_mimetypes[n], tag->value) == 0)
> > + return 1;
> > + }
> > + }
> > + return 0;
> > +}
> > +
> > +AVFILTER_DEFINE_CLASS(hellosubs);
> > +
> > +static av_cold int init_hellosubs(AVFilterContext *ctx)
> > +{
> > + int j, ret, sid;long int z=0;int t1=0;
> > + int k = 0;
> > + AVDictionary *codec_opts = NULL;
> > + AVFormatContext *fmt = NULL;
> > + AVCodecContext *dec_ctx = NULL;
> > + AVCodec *dec = NULL;
> > + const AVCodecDescriptor *dec_desc;
> > + AVStream *st;
> > + AVPacket pkt;
> > + AssContext *ass = ctx->priv;
>
> > + FILE *file;
> > + if ((file = fopen("hello.srt", "r")))
>
> There is no need to access an external file for the task of
> drawing a line of text.
>
>
> > + {
> > + fclose(file);
> > +
> > + }
> > + else
> > + {
> > + FILE * fp;
> > + fp = fopen ("hello.srt","w");
>
> That's even more true for writing such a file.
> It also would not work predictably with multiple filters.
>
>
> > + fprintf (fp, "1\n");
> > + fprintf (fp, "00:00:05,615 --> 00:00:08,083\n");
> > + fprintf (fp, "%s",ass->filename);
> > + fclose (fp);
> > +
> > + char cmd[300];
> > + strcpy(cmd,"ffmpeg -i ");
> > + strcat(cmd,ass->filename);
> > + char fn[200];
> > + strcpy(fn,ass->filename);
> > + strcat(cmd," -vf hellosubs=hello.srt helloout");
> > + int m=0;
> > + for(int w=(strlen(fn)-1);w>=0;w--)
> > + {if (fn[w]=='.')
> > + {m=w;
> > + break;}}
> > + char join[5];
> > + for(int loc=m;loc<strlen(fn);loc++)
> > + join[loc-m]=fn[loc];
> > + char rem[100];
> > + char join1[100];
> > + strcpy(join1,join);
> > + strcpy(rem,"helloout");
> > + strcat(rem,join1);
> > + remove(rem);
> > +
> > + strcat(cmd,join);
> > + system(cmd);
> > + remove("hello.srt");
> > +
> > +exit(0);
>
> Also, a filter cannot call exit(); in fact, a library like libavfilter
> must not call exit().
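
To illustrate the point about exit(), here is a minimal sketch of an init function that reports failure through its return value and leaves the decision to the caller. It does not use the FFmpeg headers: SKETCH_ERROR, SketchContext and sketch_init are hypothetical names made up for this example, and SKETCH_ERROR only imitates the negative-errno convention of AVERROR().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for AVERROR(): failures are negative errno values. */
#define SKETCH_ERROR(e) (-(e))

/* Hypothetical private context; a real filter would use its own struct. */
typedef struct SketchContext {
    const char *text;   /* text the filter is asked to draw */
    int         ready;  /* set to 1 once initialization succeeded */
} SketchContext;

/*
 * Returns 0 on success or a negative error code on failure.
 * It never terminates the process; the caller decides what to do next.
 */
static int sketch_init(SketchContext *s)
{
    if (!s)
        return SKETCH_ERROR(EINVAL);
    if (!s->text || !s->text[0])
        return SKETCH_ERROR(EINVAL);   /* nothing to draw: report, do not exit */

    s->ready = 1;
    return 0;
}
```

A caller would check the result, for example `if ((ret = sketch_init(&s)) < 0) return ret;`, so the error propagates up to the application instead of killing it.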
>
>
> > +}
> > +
> > + /* Init libass */
> > + ret = init(ctx);
> > + if (ret < 0)
> > + return ret;
> > + ass->track = ass_new_track(ass->library);
> > + if (!ass->track) {
> > + av_log(ctx, AV_LOG_ERROR, "Could not create a libass track\n");
> > + return AVERROR(EINVAL);
> > + }
> > +
> > +
>
> > + ret = avformat_open_input(&fmt, ass->filename, NULL, NULL);
> > + if (ret < 0) {
> > + av_log(ctx, AV_LOG_ERROR, "Unable to open %s\n", ass->filename);
> > +
> > + }
>
> Also, no function from libavformat is needed; this filter should draw a
> line of text, not demux a file.
> You may have misinterpreted my previous review. All unneeded code, such as
> every bit of libavformat use, must be removed.
>
> You seem to be trying to work around what I suggest rather than actually
> solving the issues raised, such as writing a file to get around the
> impossibility of accessing some input file directly. There really is no
> file, and none can be written.
>
> The goal of this filter was to create subtitle packets/frames and pass
> them on. As this turned out to be too hard in the time available, the
> simpler goal now is to draw that text on a video frame.
>
> The filter gets video frames on its input and passes them on to the
> output. In between, it should write that "Hello world" text with the
> advancing number onto them.
> For this there is no need to access any files or use any demuxers.
> You can use the libass code from the subtitles filter as you do, but that
> code uses an external subtitle file. You have to change this so it no
> longer uses an external file or demuxes it with libavformat. These steps
> are not needed and are incorrect for this task.
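
One possible way to drop the external file, sketched under assumptions: build the subtitle event as a string in memory. The helpers below (format_ass_time and format_ass_dialogue, both hypothetical names) only format an ASS "Dialogue:" line from start and end times in milliseconds. The "Default" style name is an assumption about the track, and handing the string to libass is left out of the sketch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Formats milliseconds as an ASS timestamp H:MM:SS.cc (cc = centiseconds).
 * Returns 0 on success, -1 on invalid input or if buf is too small. */
static int format_ass_time(char *buf, size_t size, int64_t ms)
{
    int64_t cs;
    int n;

    if (!buf || size == 0 || ms < 0)
        return -1;
    cs = ms / 10;  /* ASS timestamps have centisecond resolution */
    n = snprintf(buf, size, "%" PRId64 ":%02d:%02d.%02d",
                 cs / 360000,            /* hours */
                 (int)(cs / 6000 % 60),  /* minutes */
                 (int)(cs / 100 % 60),   /* seconds */
                 (int)(cs % 100));       /* centiseconds */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Builds one "Dialogue:" event line entirely in memory, no file involved.
 * Field order: Layer,Start,End,Style,Name,MarginL,MarginR,MarginV,Effect,Text.
 * Returns 0 on success, -1 on invalid input or if buf is too small. */
static int format_ass_dialogue(char *buf, size_t size,
                               int64_t start_ms, int64_t end_ms,
                               const char *text)
{
    char start[32], end[32];
    int n;

    if (!buf || size == 0 || !text || end_ms < start_ms)
        return -1;
    if (format_ass_time(start, sizeof(start), start_ms) < 0 ||
        format_ass_time(end, sizeof(end), end_ms) < 0)
        return -1;
    n = snprintf(buf, size, "Dialogue: 0,%s,%s,Default,,0,0,0,,%s",
                 start, end, text);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the times from the SRT block in the patch (5615 ms to 8083 ms) and the text "Hello world 1", this yields `Dialogue: 0,0:00:05.61,0:00:08.08,Default,,0,0,0,,Hello world 1`.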
>
> I suggest you remove the #include of "libavformat/*"; that way you will see
> exactly what must be removed, and this should make the code simpler. It just
> isn't needed to have this baggage between the avcodec/libass and what you
> want to draw.
>
> The libavformat code is there to read a subtitle file.
> There is no subtitle file. The filter should just draw a line saying
> "hello world" with a number.
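
As a rough sketch of the "hello world with a number" text itself, assuming the number is simply a counter the filter advances once per frame: format_hello_label is a hypothetical helper, and where the counter is stored is left open.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Writes "Hello world N" into buf, where N is the caller's counter.
 * Returns 0 on success, -1 if buf is NULL or too small for the text.
 */
static int format_hello_label(char *buf, size_t size, int64_t counter)
{
    int n;

    if (!buf || size == 0)
        return -1;
    n = snprintf(buf, size, "Hello world %" PRId64, counter);
    if (n < 0 || (size_t)n >= size)
        return -1;  /* output would have been truncated */
    return 0;
}
```

The filter would call this from its per-frame callback with a counter kept in its private context, so no file, demuxer or external command is involved in producing the text.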
>
>
> [...]
>
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Dictatorship: All citizens are under surveillance, all their steps and
> actions recorded, for the politicians to enforce control.
> Democracy: All politicians are under surveillance, all their steps and
> actions recorded, for the citizens to enforce control.
>