[FFmpeg-devel] Status and Plans for Subtitle Filters

Fri Feb 28 10:12:51 EET 2020

On Fri, Feb 28, 2020 at 05:55:19AM +0100, Anton Khirnov wrote:
> Quoting Clément Bœsch (2020-02-27 19:36:24)
> > On Thu, Feb 27, 2020 at 12:35:03PM +0100, Anton Khirnov wrote:
> > [...]
> > > AFAIU one of the still-open questions for the subtitle redesign is what
> > > does it mean to decode or encode a subtitle.
> > 
> > There are multiple markups available for text subtitles, and there are
> > multiple ways of representing graphic rectangles for bitmap subtitles.
> > 
> > So for text subtitles, decoding and encoding respectively means
> > transforming them in a common markup to represent them all (currently ASS
> > is our "raw" representation) and back into their markup specifications. We
> > have a bunch of those already (subrip, microdvd, subviewer, ...).
> 
> Is it still true that ASS is a superset of everything? Is that likely to
> remain the case for the foreseeable future?
> 

Nah, it isn't, and actually never really was. It was just the best we had
at that time, and I believe it's still the best. The libass implementation
plays a huge role in making ASS the de facto "standard" for subtitle
markup. Representing the raw form as an AST, to allow custom renderers,
was discussed in the past. We could consider such a thing in the future,
but I'd bet that most users would convert that AST back into ASS to send
it to the libass renderer and not bother with it.

I'm definitely not opposed to considering alternate representations for
raw text markup, but currently I wouldn't consider that a real limitation.
And for once, this is not something I'd consider blocking in this refactor.
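
To illustrate what "decoding" a text subtitle into the ASS raw
representation means in practice, here is a minimal sketch. It is not
FFmpeg code: srt_markup_to_ass() is a hypothetical helper, and it only
covers the SubRip-style <i> and <b> tags, mapping them to the ASS override
tags {\i1}/{\i0} and {\b1}/{\b0}. A real decoder has to deal with far more
(fonts, colors, line breaks, malformed tags).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an FFmpeg function: rewrite the SubRip-style
 * tags <i>, </i>, <b>, </b> into ASS override tags. Every other byte is
 * copied through unchanged.
 * Returns 0 on success, -1 if dst is too small (dst is then left
 * without a terminating NUL and must not be used). */
static int srt_markup_to_ass(const char *src, char *dst, size_t dst_size)
{
    static const struct { const char *from, *to; } map[] = {
        { "<i>",  "{\\i1}" }, { "</i>", "{\\i0}" },
        { "<b>",  "{\\b1}" }, { "</b>", "{\\b0}" },
    };
    size_t out = 0;

    assert(src && dst && dst_size > 0);

    while (*src) {
        const char *rep = NULL;
        size_t in_len = 1, rep_len = 1;

        /* Look for a known tag at the current position. */
        for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
            size_t n = strlen(map[i].from);
            if (!strncmp(src, map[i].from, n)) {
                rep     = map[i].to;
                in_len  = n;
                rep_len = strlen(rep);
                break;
            }
        }

        /* Keep one byte free for the terminating NUL. */
        if (out + rep_len >= dst_size)
            return -1;

        /* Either the replacement tag or a single source byte. */
        memcpy(dst + out, rep ? rep : src, rep_len);
        out += rep_len;
        src += in_len;
    }
    dst[out] = '\0';
    return 0;
}
```

With this sketch, the input "<i>Hello</i> <b>world</b>" becomes
"{\i1}Hello{\i0} {\b1}world{\b0}"; "encoding" is the same mapping applied
in the other direction.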

I'm not an ASS expert, but the two limitations I'm aware of are the timing
precision (but that's at format level, not markup), and the lack of
built-in furigana support: WebVTT typically has it (look for <ruby>).
There might be others.
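
To make the timing point concrete: ASS event timestamps are written as
H:MM:SS.cc, so they only carry centiseconds. The sketch below is not
FFmpeg code; ass_format_time() is a made-up name, and truncating (rather
than rounding) the millisecond value is an assumption chosen for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a non-negative time in milliseconds as an
 * ASS event timestamp (H:MM:SS.cc). The last millisecond digit is
 * dropped, which is where the precision loss happens.
 * Returns the snprintf() result. */
static int ass_format_time(int64_t ms, char *buf, size_t size)
{
    assert(ms >= 0 && buf && size > 0);

    int64_t cs = ms / 10;              /* truncate to centiseconds */
    int h = (int)(cs / 360000);        /* 100 * 60 * 60 cs per hour */
    int m = (int)(cs / 6000 % 60);
    int s = (int)(cs / 100 % 60);
    int c = (int)(cs % 100);

    return snprintf(buf, size, "%d:%02d:%02d.%02d", h, m, s, c);
}
```

Two events starting at 3723456 ms and 3723459 ms both come out as
1:02:03.45, so a format with millisecond timestamps cannot be
round-tripped exactly.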

> > For bitmap subtitles, decoding and encoding respectively means
> > transforming the bitstream into rectangle structures with RAW images
> > inside and back into the codec-specific bitstream.
> > 
> > > And one of the options is putting the AVPacket->"decoded subtitle"
> > > (whatever that is) and "decoded subtitle"->AVPacket conversions into a
> > > separate library.
> > 
> > And then you can't have them in libavfilter, so you can't have a sane
> > harmony with medias including subtitle streams. It's problematic with many
> > basic use cases. One random example: if you're transcoding an audio/video
> > and somehow altering the timings within lavfi, you have to give the
> > subtitles.
> 
> I don't see why that necessarily follows from not using AVFrame.
> avfilter does not have to be tied to only using AVFrame forever for all
> eternity. It could have a different path for subtitles. Their handling
> is going to be pretty different in any case.

The whole filter API and all the built-in filters are designed around
AVFrame; I'm pretty sure a separate path would cause a huge mess of
duplication and be a nuisance for the users and developers wanting to
abstract away that complexity.

> Note that I'm not saying it SHOULD be done this way. I'm saying that it
> seems like an option that should not be disregarded without
> consideration.

Of course; and it won't surprise you if I say it was already considered
and discussed in the past (OK, it was about 10 years ago now).

-- 
Clément B.