[FFmpeg-devel] Status and Plans for Subtitle Filters

Clément Bœsch u at pkh.me
Tue Feb 25 19:40:13 EET 2020


On Sun, Feb 23, 2020 at 09:59:59PM +0100, Michael Niedermayer wrote:
[...]
> > The subtitles refactor requires to see the big picture and all the problems at
> > once. 
> 
> really ?
> just hypothetically, and playing the devils advocat here.
> what would happen if one problem or set of problems is solved at a time ?

The first requirement for everything that follows is to define a new
structure/API for holding the subtitles within the AVFrame (which has to
live in lavu, not in lavc like the current API). So you have to address
all the current limitations in that new API first, unless you're ready to
change that new API 10 times in the near future. And even if you keep most
of the current design, you still have to at least come up with ways to
remove all the current hacks that should go away when moving to the new
design.
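
To give a rough idea of what "subtitles within the AVFrame" means on the
frame-level side, here is a minimal illustrative sketch; every name in it
is hypothetical, nothing here is a final API:

/* Illustrative only: a lavu-side, frame-level subtitle payload.
 * All names are hypothetical. */
typedef struct HypotheticalFrameSubtitle {
    int is_text;                            /* text (ASS) vs bitmap rectangles */
    int64_t start_display_time;             /* in a per-frame time base, TBD */
    int64_t end_display_time;               /* may be unset: duration unknown */
    unsigned nb_rects;                      /* number of rectangles below */
    struct AVFrameSubtitleRectangle *rects; /* see the prototype further down */
} HypotheticalFrameSubtitle;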

> 
> Maybe the thinking should not be "what are all the things that might need
> to be considered"
> but rather "what is the minimum set of things that need to be considered"
> to make the first step towards a better API/first git push
> 
> 
> 
> > Since the core change (subtitles in AVFrame) requires the introduction of
> > a new subtitles structure and API, it also involve addressing the shortcomings
> > of the original API (or maybe we could tolerate a new API that actually looks
> > like the old?). So even if we ignore the subtitle-in-avframe thing, we don't
> > have a clear answer for a sane API that handles everything. Here is a
> > non-exhaustive list of stuff that we have to take into account while thinking
> > about that:
> > 
> > - text subtitles with and without markup
> 
> > - sparsity, overlapping
> 
> heartbeat frames would eliminate sparsity

Yes, and like many aspects of this refactor: we need to come up with and
formalize a convention. Of course I can make a suggestion, but there are
many other cases and exceptions.
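
To make it concrete, here is a minimal sketch of what such a heartbeat
convention could look like on the lavfi side; SubContext,
push_subtitle_frame() and the context fields are hypothetical, only
av_frame_alloc() is an existing call:

/* Sketch: when no subtitle event is active, keep emitting an empty
 * subtitle frame at a fixed interval so downstream filters never starve. */
static int emit_heartbeat_if_idle(SubContext *s, int64_t now)
{
    if (now - s->last_event_pts < s->heartbeat_interval)
        return 0;                        /* something was sent recently enough */

    AVFrame *hb = av_frame_alloc();
    if (!hb)
        return AVERROR(ENOMEM);
    hb->pts = now;
    /* by convention, nb_rects == 0 would mean "nothing to display" */
    s->last_event_pts = now;
    return push_subtitle_frame(s, hb);   /* hypothetical helper */
}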

> what happens if you forbid overlapping ?

You can't, it's too common. The classic "Hello, hello" case was already
mentioned, but I could also mention subtitles used to label the
environment (you know, signposts and the like) in addition to the
dialogue.

> > - different semantics for duration (duration available, no known duration,
> >   event-based clearing, ...)
> 
> This one is annoying (though similar to video where its just not so much an
> issue as video is generally regularly spaced)
> But does this actually impact the API in any way ?
> decoder -> avframe -> encoder

AVFrames always go through lavfi. I don't remember the details (it's been
about 2 years now), but the lack of duration semantics was causing some
issues within lavfi.
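
One hypothetical way to make those semantics explicit in the API would be
an enum along these lines (none of these names exist today):

enum HypotheticalSubDurationSemantics {
    SUB_DURATION_KNOWN,        /* event carries a reliable duration (SRT, ASS, ...) */
    SUB_DURATION_UNKNOWN,      /* duration not known at decode time */
    SUB_DURATION_EVENT_BASED,  /* cleared by a later "erase" event (DVB, CC, ...) */
};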

> (if some information is missing some look 
> ahead/buffer/filter/converter/whatever may be needed but the API wouldnt 
> change i think and that should work with any API)
> 
> 
> > - closed captions / teletext
> 
> What happens if you ignore these at this stage?

I can't ignore them: the way we change the subtitle interface must address
their special behaviours. But I'd say my main issue with closed captions /
teletext is the same as with DVB subtitles: we don't have many tests.

Typically, the DVB subtitles hack we have had in ffmpeg.c basically
forever: I'm dropping it, but I can't test that properly because DVBsub
coverage is almost non-existent: http://coverage.ffmpeg.org/ (look for
dvbsub and dvbsubdec)

If someone improves subtitle test coverage for the formats I'm not
comfortable with (specifically CC and DVB), that would actually help A
LOT. At least I wouldn't have to speculate on how they should/could/would
behave.

BTW, if someone is available to explain DVB subtitles to me, I'm all
ears. I understand that they have no duration, but random (partial?)
subtitle resets?

> > - bitmap subtitles and their potential colorspaces (each rectangle as an
> >   AVFrame is way overkill but technically that's exactly what it is)
> 
> then a AVFrame needs to represent a collection of rectangles.
> Its either 1 or N for the design i think.
> Our current subtitle structures already have a similar design so this
> wouldnt be really different.

Yeah, the new API prototype ended up being:

+#define AV_NUM_DATA_POINTERS 8
+
+/**
+ * This structure describes a decoded subtitle rectangle
+ */
+typedef struct AVFrameSubtitleRectangle {
+    int x, y;
+    int w, h;
+
+    /* image data for bitmap subtitles, in AVFrame.format (AVPixelFormat) */
+    uint8_t *data[AV_NUM_DATA_POINTERS];
+    int linesize[AV_NUM_DATA_POINTERS];
+
+    /* decoded text for text subtitles, in ASS */
+    char *text;
+
+    int flags;
+} AVFrameSubtitleRectangle;
+

But then, do we use a fixed pixel format for all codecs? Is this really
enough when some subtitles are actually a bunch of image files inside a
"modern standard container"? (before you ask, yeah I saw that a few years
back in some broadcasting garbage thing).

What about PAL8 subtitles? We currently have to convert them in the
codecs, then re-analyze them during encoding to reconstitute the palette,
which creates unnecessary complexity.
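
For reference, this is roughly the kind of expansion the bitmap subtitle
code ends up doing today, and that an encoder then has to undo by
rebuilding a palette (plain illustrative helper, not existing code; 'pal'
is 256 packed 32-bit ARGB entries as in AV_PIX_FMT_PAL8 frames):

static void pal8_to_rgb32(uint8_t *dst, int dst_linesize,
                          const uint8_t *src, int src_linesize,
                          const uint32_t *pal, int w, int h)
{
    for (int y = 0; y < h; y++) {
        uint32_t *out = (uint32_t *)(dst + y * dst_linesize);
        const uint8_t *in = src + y * src_linesize;
        for (int x = 0; x < w; x++)
            out[x] = pal[in[x]];
    }
}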

Also, how do you deal with the allocation of such a thing when you don't
actually know the number of rectangles in advance? (I addressed that, but
it's not "pretty").
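
One possible way (not pretty either) would be to grow the array as
rectangles get decoded, something along these lines (append_rect() is
hypothetical; av_realloc_array() is the existing lavu helper):

static int append_rect(AVFrameSubtitleRectangle **rects, unsigned *nb_rects,
                       const AVFrameSubtitleRectangle *r)
{
    AVFrameSubtitleRectangle *tmp =
        av_realloc_array(*rects, *nb_rects + 1, sizeof(**rects));
    if (!tmp)
        return AVERROR(ENOMEM);
    tmp[(*nb_rects)++] = *r;
    *rects = tmp;
    return 0;
}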

See also the start/end display time, which needs an arbitrary timebase.
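
With a per-frame time base, the display window at least becomes trivial to
rescale with the existing lavu helpers; for instance (the 1 ms source time
base is only an assumption for the example):

#include <libavutil/mathematics.h>   /* av_rescale_q() */

static void rescale_display_window(int64_t start, int64_t end)
{
    AVRational sub_tb = (AVRational){1, 1000};    /* assumed per-frame time base */
    AVRational dst_tb = (AVRational){1, 90000};   /* e.g. MPEG-TS */
    int64_t start_dst = av_rescale_q(start, sub_tb, dst_tb);
    int64_t end_dst   = av_rescale_q(end,   sub_tb, dst_tb);
    (void)start_dst;
    (void)end_dst;
}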

[...]

Anyway, I could arbitrate all these decisions myself, but it's more than
exhausting, especially knowing that, since there was no real prior
discussion/consensus on these topics, the patchset would trigger a lot of
design disagreement during review. Problem is: most people don't care
until they see the patchset, or don't see the big picture clearly enough
to be willing to discuss a solution at length.

I'm glad people are finally getting interested in subtitles. Better late
than never. So far, my suggestion to address the issue would be:

1. $SOMEONE adds some tests for annoying subtitle formats (I'd say
   dvbsub dec + enc, libzvbi-teletext) with decent coverage
2. I rebase my old branch
3. I come up with a more specific list of technical topics to discuss on
   how to address the current limitations and problems with the future API
4. $SOMEONE organizes a meeting so we can take decisions on all these
   topics
5. I can start working on this again (maybe with some new folks, but I
   don't have much hope)

Regards,

-- 
Clément B.

