[FFmpeg-devel] [PATCH v5 00/12] Subtitle Filtering

Soft Works softworkz at hotmail.com
Thu Sep 16 20:46:01 EEST 2021



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Nicolas George
> Sent: Thursday, 16 September 2021 12:20
> To: FFmpeg development discussions and patches <ffmpeg-
> devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v5 00/12] Subtitle Filtering
> 

> There is another point to consider when designing subtitles in
> AVFrame:
> since we intend it not only as a general API cleanup but as a prelude
> to
> extending the API with filtering and such, we must not only think
> about
> what is needed now, we must think about what may be needed later.
> 
> The few examples I have, not excluding questions I have not thought
> of:
> 
> - Right now, we have text subtitles and bitmap subtitles. But do we
> want
>   to be able to handle mixed text and bitmap subtitles?
> 
>   To this, I would say: probably no. But we should give it a thought.

Each rect has a field of type enum AVSubtitleType, so this remains
possible, but I agree that it's unlikely to be required.

> - Number of rectangles. Currently, the decoders usually output a very
>   limited number of rectangles. OTOH, libass may output three alpha
> maps
>   per glyph, that makes potentially hundreds of rectangles.
> 
>   Will the data structure be able to handle this efficiently?

You are right about the number of temporary images that libass
creates (it can go up to 100k for complex subs with animation).
But I don't think there's any case where the data would
leave a filter that way.

I already have:

- overlay_graphicsubs: the bitmaps are blended onto a video
  (2 inputs: video, subs[format=bitmap]; 1 output: video)
- graphicsub2video: the bitmaps are blended onto a new
  empty transparent frame
  (1 input: subs[format=bitmap]; 1 output: video[with alpha])

What might also make sense is a conversion from text subs to
graphic subs, but this can't be done by keeping a separate bitmap
for each glyph, which would most likely be off-spec for any
graphic subtitle format.
So, even in that case, the individual libass rects won't leave
the filter as subtitle rects.

>   Consider also the issue of rectangles overlapping.
> 
> - Speaking of overlapping, we need a system to signal whether a new
>   subtitle frame should replace the current one (like dvdsub, srt,
> etc.)
>   or overlap with it (like ASS).

I don't think anything new is required, as the existing logic is
preserved no matter whether the subtitle frames travel through
a filter chain or are forwarded to encoding directly.


> - Colorspace. All current bitmap subtitles formats are paletted. But
>   palette pixel formats are bad for many treatments. Inside a filter
>   chain, it would probably make sense to have them in a true color
>   format.

There are cases where this makes sense, for example to feed them
into a hardware filtering chain for overlay (via hwupload).

For these cases, I have graphicsub2video and textsub2video which
output transparent RGBA images.


For local (non-hardware) overlay it doesn't make sense to convert
the palette bitmaps.
For example, in overlay_graphicsubs, when overlaying onto a yuv
format, I just convert the palette to yuv for blending.

> - Global styles. ASS subtitles, in particular, contain reference to
>   globally-defined styles. How do we handle that in libavfilter? I
> have
>   no idea.

There's a shared reference pointing at the global ass header (a
string) that each subtitle AVFrame carries.
If that is not available, ass_get_default_header (iirc) can be called.


> - Sparseness. Subtitles streams have gaps, and synchronization with
>   other streams requires a next frame, that can be minutes away or
> never
>   come. This needs to be solved in a way compatible with processing.

I have kept the heartbeat logic from your sub2video implementation.
It makes sense, is required, and I can't think of any better way to
handle this. It has just been renamed to subtitle_heartbeat and is
used for all subtitle formats.


> > - Part3, avfilter support for subtitles in AVFrames. At this point
> we
> > have a defined structure to store subtitles in AVFrames, and actual
> > code that can generate or consume them. When approaching this, the
> > same rules apply as before, existing subtitle functionality, as
> crude
> > as it may be, has to remain functional as exposed to the user.

Check.

> We need to decide which aspects of the subtitles formats are
> negotiated.
> 
> At least, obviously, the text or bitmap aspect will be, with a
> conversion filter inserted automatically where needed.

I'm inserting the graphicsub2video filter to keep compatibility
with sub2video command lines, but I'm not a fan of any other
automatic filter insertion.
Let's discuss this in a separate conversation.


> But depending on the answers to the questions in part 1, we may need
> to
> negotiate the pixel format and colorspace too.

Currently, bitmap subtitles are always PAL8, and that should
remain the meaning of SUBTITLE_BITMAP.

It would be easy to add additional subtitle formats that
use a different kind of bitmap format.


> Unfortunately, the current negotiation code is messy and fragile. We
> cannot afford to pile new code on top of it. However good the new
> code
> may be, adding it on top of messy code would only make it harder to
> clean up and maintain later. I absolutely oppose that.

The situation may be messy for audio and video, but for subtitles,
format negotiation is really simple (SUBTITLE_BITMAP, SUBTITLE_ASS
or SUBTITLE_TEXT).

The improvements you are planning can easily be done afterwards,
as subtitle format negotiation really doesn't add any significant
technical debt to the situation.


Kind regards,
softworkz
