[FFmpeg-devel] Status and Plans for Subtitle Filters

Michael Niedermayer michaelni at gmx.at
Sun Feb 23 22:59:59 EET 2020


On Sat, Feb 22, 2020 at 09:47:20AM +0100, Clément Bœsch wrote:
> On Fri, Feb 14, 2020 at 03:26:30AM +0000, Soft Works wrote:
> > Hi,
> > 
> 
> Hi,
> 
> > I am looking for some guidance regarding future plans for processing subtitle streams in filter graphs.
> > 
> > Please correct me where I'm wrong - this is the situation as I've understood it so far:
> [...]
> 
> Your analysis was pretty much on point. I've been away from FFmpeg development
> since around the time of that patchset. While I can't recommend a course of
> action, I can elaborate on what was blocking and missing. Beware that this is
> reconstructed from my unreliable memory and I may forget important points.
> 
> Last state can be found at https://github.com/ubitux/FFmpeg/tree/subtitles-new-api
> 
> The last WIP commit includes a TODO.txt which I'm sharing here for the
> record:
> 
> > TODO:
> > - heartbeat mechanism
> > - drop sub2video (needs heartbeat)
> > - properly deal with -ss and -t (need strim filter?)
> > - sub_start_display/sub_end_display needs to be honored
> > - find a test case for dvbsub as it's likely broken (ffmpeg.c hack is
> >   removed and should be replaced by a EAGAIN logic in lavc/utils.c)
> > - make it pass FATE:
> >   * fix cc/subcc
> >   * broke various other stuff
> > - Changelog/APIchanges
> > - proper API doxy
> > - update lavfi/subtitles?
> > - merge [avs]null filters
> > - filters doc
> > - avcodec_default_get_buffer2?
> > - how to transfer subtitle header down to libavfilter?
> 
> The biggest TODO entry right now is the heartbeat mechanism, which is required
> to be able to drop the sub2video hack. You've seen that discussed in the
> thread.
> 
> Thing is, that branch is already relatively invasive and may include
> controversial API changes. Notably, the way I decided to handle subtitle
> text/rectangle allocation within AVSubtitle is "different", but I couldn't
> come up with a better solution. Basically, we have to fit them in AVFrame for
> a clean integration within the FFmpeg ecosystem, but subtitles are not simple
> buffers like audio and video are: they have to be backed by more complex
> dynamic structures.
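
(To make the allocation question concrete, here is a minimal sketch of one
way the dynamic structures could hang off an AVFrame through a refcounted
buffer, so av_frame_ref()/av_frame_unref() keep working unchanged. The
AVSubtitleArea/AVSubtitlePayload structs and subtitle_frame_alloc() are
hypothetical, not any existing or proposed API; only the AVFrame/AVBufferRef
plumbing is real.)

    #include <libavutil/buffer.h>
    #include <libavutil/frame.h>
    #include <libavutil/mem.h>

    typedef struct AVSubtitleArea {    /* hypothetical */
        int x, y, w, h;
        char *text;                    /* e.g. ASS markup, owned */
    } AVSubtitleArea;

    typedef struct AVSubtitlePayload { /* hypothetical */
        unsigned nb_areas;
        AVSubtitleArea *areas;
    } AVSubtitlePayload;

    static void payload_free(void *opaque, uint8_t *data)
    {
        AVSubtitlePayload *p = (AVSubtitlePayload *)data;
        for (unsigned i = 0; i < p->nb_areas; i++)
            av_free(p->areas[i].text);
        av_free(p->areas);
        av_free(p);
    }

    /* wrap the payload in a refcounted buffer owned by the frame, so the
     * frame's ownership machinery never has to know what it contains */
    static AVFrame *subtitle_frame_alloc(AVSubtitlePayload *p)
    {
        AVFrame *f = av_frame_alloc();
        if (!f)
            return NULL;
        f->buf[0] = av_buffer_create((uint8_t *)p, sizeof(*p),
                                     payload_free, NULL, 0);
        if (!f->buf[0])
            av_frame_free(&f);
        return f;
    }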
> 
> Also unfortunately, addressing the problem through an iterative process is
> extremely difficult in the current situation due to historical technical debt.
> You may have noticed that the subtitle decode and encode APIs are a few
> generations behind the audio and video ones. The reason they weren't
> modernized earlier is that they were already a pain to work with back then.
> 

> The subtitles refactor requires seeing the big picture and all the problems at
> once. 

Really?
Just hypothetically, and playing the devil's advocate here:
what would happen if one problem or set of problems were solved at a time?

Maybe the thinking should not be "what are all the things that might need
to be considered",
but rather "what is the minimum set of things that need to be considered"
to make the first step towards a better API / first git push.



> Since the core change (subtitles in AVFrame) requires the introduction of
> a new subtitles structure and API, it also involves addressing the shortcomings
> of the original API (or maybe we could tolerate a new API that actually looks
> like the old one?). So even if we ignore the subtitle-in-avframe thing, we don't
> have a clear answer for a sane API that handles everything. Here is a
> non-exhaustive list of stuff that we have to take into account while thinking
> about it:
> 
> - text subtitles with and without markup

> - sparsity, overlapping

Heartbeat frames would eliminate sparsity.
What happens if you forbid overlapping?
I mean, if I just imagine for a moment that a video stream carries some data,
say a 256-color palette in 4 parts, and these parts get updated in a way that
overlaps in time like you describe for subtitles:
this isn't a problem for video, we just have the whole palette anywhere it is
needed.
And similarly, a B frame updates parts of the pixels of the previous and next
frames, yet our AVFrames contain whole bitmaps.

At the stage of encoding such a subtitle AVFrame back to "binary" data, the
encoder would have to merge identical subtitle parts, if that is supported.
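
(For the record, a minimal sketch of what such a heartbeat could look like:
once per video frame, some scheduling code re-sends a reference to the last
complete subtitle frame, so downstream always sees the full current state and
never has to resolve sparsity or overlap itself. heartbeat() and the output()
callback are made up; only av_frame_clone() is real API.)

    #include <libavutil/error.h>
    #include <libavutil/frame.h>

    /* hypothetical: called once per video frame */
    static int heartbeat(const AVFrame *last_sub, const AVFrame *video,
                         int (*output)(AVFrame *))
    {
        AVFrame *copy;

        if (!last_sub)                   /* nothing on screen right now */
            return 0;
        copy = av_frame_clone(last_sub); /* new ref, cheap: buffers are shared */
        if (!copy)
            return AVERROR(ENOMEM);
        copy->pts = video->pts;          /* re-stamp on the video timeline */
        return output(copy);
    }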


> - different semantics for duration (duration available, no known duration,
>   event-based clearing, ...)

This one is annoying (though similar to video, where it's just not so much an
issue, as video is generally regularly spaced).
But does this actually impact the API in any way?
decoder -> AVFrame -> encoder
(If some information is missing, some lookahead/buffer/filter/converter/
whatever may be needed, but the API wouldn't change, I think, and that should
work with any API.)
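
(To illustrate that such a converter can live outside the API proper, a
one-frame-lookahead sketch that fills in a missing duration once the next
event arrives. DurationFixup, push_event() and the output() callback are
hypothetical, and I'm assuming the duration travels in AVFrame.pkt_duration,
which a new API may well change.)

    #include <libavutil/frame.h>

    typedef struct DurationFixup {
        AVFrame *pending; /* last event whose end time is still unknown */
    } DurationFixup;

    static int push_event(DurationFixup *s, AVFrame *cur,
                          int (*output)(AVFrame *))
    {
        int ret = 0;

        if (s->pending) {
            /* the next event tells us when the previous one ends;
             * assumes both frames share a time base */
            s->pending->pkt_duration = cur->pts - s->pending->pts;
            ret = output(s->pending); /* output() takes ownership */
            s->pending = NULL;
            if (ret < 0)
                return ret;
        }
        if (cur->pkt_duration > 0)    /* duration already known */
            return output(cur);
        s->pending = cur;             /* hold until the next event */
        return 0;
    }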


> - closed captions / teletext

What happens if you ignore these at this stage?


> - bitmap subtitles and their potential colorspaces (each rectangle as an
>   AVFrame is way overkill but technically that's exactly what it is)

Then an AVFrame needs to represent a collection of rectangles.
It's either 1 or N for the design, I think.
Our current subtitle structures already have a similar design, so this
wouldn't really be different.
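
(Reusing the hypothetical AVSubtitlePayload from the sketch further up:
consumers would always loop over the rectangles, so a single rectangle is
just the N == 1 case; dump_subtitle_frame() is of course made up.)

    #include <stdio.h>
    #include <libavutil/frame.h>

    static void dump_subtitle_frame(const AVFrame *f)
    {
        const AVSubtitlePayload *p =
            (const AVSubtitlePayload *)f->buf[0]->data;

        for (unsigned i = 0; i < p->nb_areas; i++)
            printf("area %u: %dx%d at (%d,%d)\n", i,
                   p->areas[i].w, p->areas[i].h,
                   p->areas[i].x, p->areas[i].y);
    }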

Thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data