[FFmpeg-devel] Status and Plans for Subtitle Filters

Sat Feb 22 15:01:11 EET 2020

On Sat, Feb 22, 2020 at 12:51:13PM +0000, Soft Works wrote:
[...]
> > > Would there be any filters at all that would operate on subtitles?
> > >  (other than rendering to a video surface)
> > 
> > Sure. A few ideas that come to my mind:
> > 
> > - rasterization (text subtitles to bitmap subtitles)
> > - ocr (bitmap subtitles to text)
> > - all kinds of text processing (possibly piped to some external tools)
> > - censoring bad words
> > - inserting "watermark" text
> > - timing processing: trimming, shift, scaling of time
> > - lorem ipsum or similar "source" filter (equivalent to our video test
> >   patterns) for testing purposes
> > - audio to text for auto captioning
> > - text to audio for audio synthesis
> > - concat multiple subtitle files (think of multiple episodes merged into
> >   one, and you want to do the same for subtitles)
> > - merge/overlap multiple subtitle tracks (think of multi-language
> >   subtitles)
> 
> I knew there would be reasonable ones. Maybe except the text-to-speech
> idea. I suppose you need to be a masochist to watch a full movie hearing
> synthesized speech ;-)

As a creator, you may not want to use your voice (because of your
pronunciation, because you're mute, because you care about your anonymity,
etc), and thus you would write subtitles (for accessibility) and use a
synth for the audio track. We already have something similar btw, see the
flite filter.
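
For reference, a minimal sketch of driving that filter from a script. It
assumes an FFmpeg build with libflite enabled; the helper name and the file
names are made up for illustration, and the command is only built, not run:

```python
import shlex


def flite_command(textfile, output, voice=None):
    """Build an ffmpeg argument list that synthesizes `textfile` with flite."""
    # Assumption: `textfile` contains no characters that need filtergraph
    # escaping (such as ':' or quotes), so it is used as an option value as-is.
    source = f"flite=textfile={textfile}"
    if voice is not None:
        source += f":voice={voice}"
    # "-f lavfi -i <graph>" makes ffmpeg read its input from a lavfi source.
    return ["ffmpeg", "-f", "lavfi", "-i", source, output]


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection.
    print(shlex.join(flite_command("speech.txt", "speech.wav", voice="slt")))
```

The text file here would hold the subtitle text to be spoken; turning timed
subtitle events into aligned audio is a separate problem this does not cover.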

> 
> > [...]
> > > But when the primary purpose of having subtitles in filtergraphs would
> > > be to have them eventually converted to bitmaps, and given that it's
> > > really so extremely difficult and controversial to implement this,
> > > plus that there seems to be only moderate support for this from other
> > > developers, could it possibly be an easier and more pragmatic solution
> > > to simply convert the subtitles to images before they enter the
> > > filtergraph?
> > 
> > That means it's likely to be only available within the command line tool and
> > not the API. Unless you design a separate "libavsubtitle" (discussed in the
> > past several times), but you'll need at some point many interfaces with the
> > usual demuxing-decoding-encoding-muxing pipeline.
> 
> You're right, I was focused on the CLI, and above all on the huge discrepancy
> in the required amount of work.
> 
> While the predominant model of ffmpeg development (patch-trial-and-error
> until it gets accepted) seems to have proven quite successful, I'm
> wondering whether in this case it wouldn't be a better strategy to
> agree on a plan before anybody spends more time on this?

Yes, that was my point earlier.

-- 
Clément B.