[FFmpeg-devel] Status and Plans for Subtitle Filters

Michael Niedermayer michaelni at gmx.at
Wed Feb 26 01:21:56 EET 2020


On Mon, Feb 24, 2020 at 08:48:23PM +0100, Nicolas George wrote:
> Michael Niedermayer (12020-02-24):
> > > No, they can't: being the same subtitle or not is part of the semantic.
> 
> > Does anyone else share this opinion?
> > 
> > I am asking because we need to resolve such differences of opinion to
> > move forward.
> > There's no way to design an API if such relatively fundamental things
> > have disagreements on them
> 
> It's not a matter of opinion, it is actually quite obvious:
> 
> # 1
> # 00:00:10,000 --> 00:00:11,000
> # Hello.
> # 
> # 2
> # 00:00:11,000 --> 00:00:12,000
> # Hello.
> 
> … means that two people said Hello in quick succession while:
> 
> # 1
> # 00:00:10,000 --> 00:00:12,000
> # Hello.
> 
> … means that Hello was said only once, slowly.

Yes,
but the overlap neither solves that nor is it sufficient,
nor does this work very well.

It doesn't work very well because when someone speaks really fast,
you display the text only for a short time and no one can read it.
That fails to achieve the main goal of a subtitle, which is to allow
someone to read it.
One could go on to list cases where this is ambiguous or not
enough.

But I think a better summary is that there are 2 really separate things:
1. The actual content
2. The way it is presented (loud, fast, fearful, whatever)

I think we should not, in our internal representation, use the duration
of display for the duration of sound.
Especially formats with strict random access points will always start
all subtitles anew at that point; otherwise one could not seek to
that point. That will produce subtitles for which interpreting the
display duration as the sound duration would not work well.
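
A minimal sketch of what keeping the two durations apart could look like.
This is Python purely for illustration, not a proposed FFmpeg API; the names
Interval, SubtitleEvent, speech and split_at_seek_point are all hypothetical.
It assumes the spoken interval, where known at all, is carried as an optional
hint next to the display interval, so that restarting a subtitle at a random
access point changes only the display interval.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Interval:
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass(frozen=True)
class SubtitleEvent:
    text: str                          # 1. the actual content
    display: Interval                  # how long the text stays on screen
    speech: Optional[Interval] = None  # 2. presentation hint (hypothetical field)


def split_at_seek_point(ev: SubtitleEvent, seek_ms: int) -> List[SubtitleEvent]:
    """Restart an event at a random access point.

    Only the display interval is cut; the speech hint is copied unchanged,
    so a consumer such as a voice synthesis filter is not misled by the cut.
    """
    if not (ev.display.start_ms < seek_ms < ev.display.end_ms):
        return [ev]  # the seek point does not fall inside this event
    first = replace(ev, display=Interval(ev.display.start_ms, seek_ms))
    second = replace(ev, display=Interval(seek_ms, ev.display.end_ms))
    return [first, second]
```

With this layout, a quickly spoken line can stay on screen long enough to be
read, and a line restarted at a seek point still describes a single utterance.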


> 
> And it has practical consequences: Clément suggested a voice synthesis
> filter, that would change its output.
> 
> Some subtitles have overlap all over the place. I am thinking in
> particular of some animé fansub, with on-screen signs and onomatopoeia
> translated and cultural notes, all along with dialogue. De-overlapping
> would increase their size considerably, and cause actual dialogue to be
> split, which results in the problems I have explained above.

I think you are mixing things up.

Subtitle size matters in the muxed format; this is about the
representation in AVFrames. It would make no difference to what is
stored. In fact, an encoder that searches for things it can merge,
instead of not doing so, could produce smaller files.

Also, for the subtitle rectangles we could even use reference counting
and reuse them as long as they do not change.
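
A rough sketch of that idea, again in Python for illustration only and not
as actual FFmpeg code. Rect, Frame, deoverlap and merge are hypothetical
names, and the event_id field is an assumption added so that the merge step
can tell a continued subtitle from a repeated one. Overlapping events are cut
into non-overlapping frames that share the same rectangle objects, and an
encoder-side pass merges contiguous frames back into the original events.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rect:
    event_id: int  # assumed unique per subtitle event (hypothetical field)
    text: str


@dataclass(frozen=True)
class Frame:
    start_ms: int
    end_ms: int
    rects: Tuple[Rect, ...]  # references to shared Rect objects, not copies


Event = Tuple[int, int, Rect]  # (start_ms, end_ms, rect)


def deoverlap(events: List[Event]) -> List[Frame]:
    """Cut overlapping events into consecutive, non-overlapping frames."""
    cuts = sorted({t for s, e, _ in events for t in (s, e)})
    frames = []
    for a, b in zip(cuts, cuts[1:]):
        active = tuple(r for s, e, r in events if s < b and e > a)
        if active:
            frames.append(Frame(a, b, active))
    return frames


def merge(frames: List[Frame]) -> List[Event]:
    """Encoder side: rebuild one event per id from contiguous frames."""
    spans: Dict[int, list] = {}  # event_id -> [start_ms, end_ms, rect]
    for f in frames:
        for r in f.rects:
            span = spans.get(r.event_id)
            if span is not None and span[1] == f.start_ms:
                span[1] = f.end_ms  # same event continues into this frame
            else:
                spans[r.event_id] = [f.start_ms, f.end_ms, r]
    return [(s, e, r) for s, e, r in spans.values()]
```

In this sketch the frames hold references to the same Rect objects, so
cutting an event into several frames does not duplicate its payload, and
because merging goes by id rather than by text, two separate "Hello."
events are not fused into one.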



> 
> But I don't know why you are so focussed on this. Overlapping is not a

It's not a focus at all, it was just something I noticed when reading this,
which IMHO can be avoided to maybe make the API simpler.

It's a suggestion, nothing else.


> problem, it's just something to keep in mind while designing the API,
> like the fact that bitmap subtitles have several rectangles. It's
> actually quite easy to handle.

I am not sure arbitrarily overlapping AVFrames will not cause problems;
it's very different from existing semantics.

Thanks

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data