[FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling
Daniel Cantarín
canta at canta.com.ar
Thu Dec 9 23:33:19 EET 2021
Hi there.
This is my first message to this list, so please excuse me if I
unintentionally break some rule.
I've read the debate between Soft Works and others, and would like to
add something to it.
I don't have as deep a knowledge of the libs as other people here show.
My knowledge comes from working with live streams for some years now.
And I do understand the issue with modifying a public API for some use
case under debate: I believe it's a legitimate line of questioning of
Soft Works's patches. However, I also feel that we live streaming people
are often set aside as a "border case" when it comes to ffmpeg/libav
usage, and this bias is present in many subtitles/captions debates.
I work with Digital TV signals as input, and several different target
outputs more related to live streaming (mobiles, PCs, and so on). The
target location is Latin America, so I need subtitles/captions whenever
we use English spoken audio (we speak mostly Spanish in LATAM). TV
people send you TV subtitle formats: SCTE-27, DVB subtitles, and so on.
And live streaming people use other subtitle formats, mostly VTT and
TTML. I've found that CEA-608 captions are the most compatible caption
format, as they're understood natively by smart TVs and other devices,
as well as non-natively by any other device using popular player-side
libraries. So I've made my own filter for generating CEA-608 captions
for live streams, using ffmpeg with the previously available OCR filter.
I tried VTT first, but it was problematic for live-streaming packaging,
and with CEA-608 I could just skip that part of the process.
While building that filter, besides the whole deal of implementing the
conversion from text to CEA-608, I struggled with things like these:
- the sparseness of input subtitles, leading to OOM in servers and
stalled players;
- the "libavfilter doesn't take subtitle frames" and "it's all ASS
internally" issues;
- the "caption timings vs video frame timings vs audio timings"
problems (people talk a lot about syncing subs with video frames, but
rarely against actual dialogue audio);
- other (meta)data problems, like screen positioning or text encoding.
These are all problems Soft Works seems to have faced as well.
But of all the problems with live streaming subtitles in ffmpeg (and
there are LOTS of them), the most annoying has always been this: almost
every time someone talked about implementing subtitles in filters (on
mailing lists, in tickets, in other places like Stack Overflow,
etcetera), they assumed input files. When people specifically talked
about live streams, their peers always reasoned with a file mindset and
dismissed live streaming subtitles/captions as a "border case".
Let me be clear: these are not "border case" issues; they actually
appear in the most common live streaming transcoding use cases. They
all show up *immediately* when you try to use subtitles/captions in
live streams.
I got here (I mean to this thread) while looking for ways to fix some
issues in my setup. I was reconsidering VTT/TTML generation instead of
CEA-608 (as rendering behaves significantly differently from device to
device), and thus I was about to generate subtitle-type output from some
filter, was about to create my own standalone "heartbeat" filter to
normalize the sparseness, and so on and so on: again, all things Soft
Works seems to be handling as well. So I was quite happy to find someone
working on this again; the last time I saw it on ffmpeg's
mailing list/patchwork
(https://patchwork.ffmpeg.org/project/ffmpeg/patch/20161102220934.26010-1-u@pkh.me)
the code there seemed to die, and I was already too late to say anything
about it. However, reading the other devs' reaction to Soft Works's work
was worrying, as it felt like history wanted to repeat itself (take a
look at the discussions back then).
This situation has persisted for years now. This time I wanted to put
it on record while this conversation is still warm, in order to help
Soft Works's code survive. So, dear devs: I love and respect your work,
and your opinion is very important to me. I do not claim to know
ffmpeg's code better than you do. I do not claim to know better what to
do with libavfilter's API. Please understand: I'm not here to be right,
but to note my point of view. I'm not better than you; quite the
contrary, most likely. But I also need to solve some very real
problems, and I can't wait until everything else is in wonderful shape
to do it. Nor can I accept lots of preconditions just to fix the most
immediate issues; that's the case with sparseness and heartbeat frames,
which was a heated debate years ago and seems to still be one, while I
find it to be the most obvious, common-sense, backwards-compatible
solution. Things like "clean" or "well designed" can't be more
important than actually working use cases that don't break previously
implemented ones: it's far easier to fix little blocks of "bad" code
than to design something everybody's happy with (and the history of the
project seems to be quite eloquent about that, especially when it comes
to these particular use cases).

Also, I have my own patches (which I would like to upstream some day),
and I can tell the API does change quite regularly: I understand that
should be a curated process, but adding a single property for
live-streaming subtitles isn't the end of the world either, and thus it
shouldn't be the kind of issue that blocks big and important code
contributions like the ones Soft Works is working on. I just don't have
the time to do all that work he/she is doing myself, and it could be
another bunch of years until someone else has it. I can't tell whether
Soft Works's code is good enough for you, or whether the ideas behind
it are the best there are, but I can tell you the implementations are
on the right track: as a live streaming worker, I know the problems
he/she mentions in the exchanges with you all, and I can tell you
they're all blocking issues when dealing with live streaming. Soft
Works is not "forcing it" into the API, and these are not "border
cases" but normal and frequent live streaming issues.

So, please, if you don't have the time Soft Works has, or the will to
tackle the issues he/she is tackling, I beg you: at least don't kill
the code this time if it doesn't break working use cases.
Thanks,
Daniel.