[FFmpeg-devel] [RFC] Subtitle Filtering Ramp-Up
softworkz .
softworkz at hotmail.com
Wed Jun 4 04:16:52 EEST 2025
Hi,
for the sake of also having given a technical answer:
> You say that, but I don't see that at all. In 3 of your 4 cases, the
> two sets of fields seem to be close to identical with no good reason
> to be separate
Let's go through it once again:
1. AV_SUBTITLE_FLOW_NORMAL
"close to identical" is not the same as identical. A deeper explanation
for the first case is given here:
https://github.com/softworkz/SubtitleFilteringDemos/issues/1
For the duration: There is quite a range of functionality built around
AVFrame's duration field, and that code assumes that once the duration
has elapsed, the next frame will arrive.
This is true for video and audio, but not for subtitles.
In the case of subtitles, the duration might be 3s (display time), yet
no further subtitle frame may follow for an hour or more.
The duration of an audio or video frame and the (presentation) duration
of a subtitle have fundamentally different semantics, and there is
existing functionality that treats the duration field as an audio/video
duration - that's why we cannot put our subtitle duration into
that field.
The frame duration must not be set, because we don't know when the
next frame will arrive. But we do set the subtitle duration, because we
know how long the event should be displayed. Another frame can arrive
even before the display duration of the previous one has elapsed. Had
we set the subtitle duration as the frame duration, it would have been
wrong, because the subsequent frame arrived much earlier than the frame
duration indicated.
2. AV_SUBTITLE_FLOW_KILLPREVIOUS
In this case, subtitle start time and frame time are equal, but the
subtitle duration is infinite (unknown). Setting the duration to
infinite when it is unknown is subtitle semantics - but not AVFrame
semantics. If we set the frame duration to infinite, unexpected things
can happen, because that field is not meant to hold infinity.
Also, in this flow mode, an active subtitle event becomes hidden by a
subsequent "empty" event. Those "reset" events have a duration of 0.
Zero, in turn, is a value that often carries special meaning, so we
should not put it into the frame duration field either.
3. AV_SUBTITLE_FLOW_HEARTBEAT
It's not clear to me how you can conclude that
"In 3 of your 4 cases, the two sets of fields seem to be close to
identical with no good reason to be separate"
All 4 fields have very different values in this case.
>>>>>>>>>>> THIS IS IMPORTANT <<<<<<<<<<<
> In fact, this is the main problem that plagued this patchset from the
> start. The newly introduced public API is designed around
> ffmpeg.c/lavfi implementation details, rather than cleanly
> representing subtitle data and then adjusting the implementation to
> actually support it.
Okay, great. We are hitting an essential point here. Except that there
is no plague - this is fully intentional, and here's why:
If we only ever had subtitle filters handling subtitle frames in a
dedicated subtitle filter graph, then most of what you are suggesting
would indeed work out. In that case, it would also suffice to use
just the existing frame timing fields (well, in most cases at least).
But what kind of goal is that? I wouldn't have moved even a millimeter
for this. The primary point that makes this so attractive and interesting
is the ability to interoperate between different media types - subtitles
and video, subtitles and audio, splitting out closed captions from a
video stream, and many more possible features that don't exist yet.
Subtitles differ from video and audio in multiple respects - a primary
one is that subtitles are sparse while audio and video are not.
If you want to enable interaction between subtitles and audio/video,
it cannot work when each side plays by its own rules; nothing would
fit together in that case.
The only way this can work properly (and, even more important,
generically, without needing special handling everywhere) is to have
one side play by the rules of the other. Since video and audio filtering
have long been implemented and tested, it wouldn't make sense to change
video or audio filtering. Naturally, it has to go the other way round:
subtitles need to play by the rules of video and audio with regard to
filtering and everything around it. That's the way to achieve almost
everything you can think of. And exactly that is what I've done.
And that's also what FFmpeg users want. Not a single one of them will
ever care whether there are those two extra fields.
And if they knew what range of features those two fields enable, and
that two or three developers are making a big drama out of whether
these should exist or not, they would be throwing tomatoes right now.
The result might not be a spotless beauty, but I've also seen far worse
things, and when you consider everything this little spot of
imperfection enables, it's still a very small sacrifice.
Best regards
sw