[FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling

Soft Works softworkz at hotmail.com
Sat Dec 11 20:03:39 EET 2021



> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Michael Niedermayer
> Sent: Saturday, December 11, 2021 6:21 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling
> 
> On Fri, Dec 10, 2021 at 03:02:32PM +0000, Soft Works wrote:
> >
> >
> > > -----Original Message-----
> > > From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Daniel Cantarín
> > > Sent: Thursday, December 9, 2021 10:33 PM
> > > To: ffmpeg-devel at ffmpeg.org
> > > Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling
> > >
> > > Hi there.
> > > This is my first message to this list, so please excuse me if I
> > > unintentionally break some rule.
> > >
> > > I've read the debate between Soft Works and others, and would like to
> > > add something to it.
> > > I don't have as deep a knowledge of the libs as other people here show.
> > > My knowledge comes from working with live streams for some years now.
> > > And I do understand the concern about modifying a public API for a use
> > > case under debate: I believe it's a legitimate line of questioning for
> > > Soft Works' patches. However, I also feel that we live streaming people
> > > are often set aside as a "border case" when it comes to ffmpeg/libav
> > > usage, and this bias is present in many subtitles/captions debates.
> > >
> > > I work with Digital TV signals as input, and several different target
> > > outputs more related to live streaming (mobiles, PCs, and so on). The
> > > target location is Latin America, and thus I need subtitles/captions
> > > when we use English-language audio (we speak mostly Spanish in LATAM).
> > > TV people send you TV subtitle formats: SCTE-27, DVB subs, and so on.
> > > And live streaming people use other subtitle formats, mostly VTT and
> > > TTML. I've found that CEA-608 captions are the most compatible caption
> > > format, as it's understood natively by smart TVs and other devices, as
> > > well as non-natively by any other device using popular player-side
> > > libraries. So, I've made my own filter for generating CEA-608 captions
> > > for live streams, using ffmpeg with the previously available OCR filter.
> > > I tried VTT first, but it was problematic for live-streaming packaging,
> > > and with CEA-608 I could just skip that part of the process.
> > >
> > > While writing those filters, besides the whole deal of implementing the
> > > conversion from text to CEA-608, I struggled with things like these:
> > > - the sparseness of input subtitles, leading to OOM on servers and
> > > stalled players.
> > > - the "libavfilter doesn't take subtitle frames" and "it's all ASS
> > > internally" issues.
> > > - the "caption timing vs video frame timing vs audio timing"
> > > problems (people talk a lot about syncing subs with video frames, but
> > > rarely with the actual dialogue audio).
> > > - other (meta)data problems, like screen positioning or text encoding.
> > >
> > > These are all problems Soft Works seems to have faced as well.
> > >
> > > But of all the problems regarding live streaming subtitles with ffmpeg
> > > (and there are LOTS of them), the most annoying one is always this:
> > > almost every time someone talked about implementing subtitles in filters
> > > (on mailing lists, in tickets, in other places like Stack Overflow,
> > > etcetera), they always assumed input files. When people specifically
> > > talked about live streams, their peers always reasoned with a files
> > > mindset, and labeled live streaming subtitles/captions a "border case".
> > >
> > > Let me be clear: these are not "border case" issues; they actually
> > > appear in the most common use cases of live streaming transcoding. They
> > > all appear *immediately* when you try to use subtitles/captions in live
> > > streams.
> > >
> > > I got here (I mean this thread) while looking for ways to fix some
> > > issues in my setup. I was reconsidering VTT/TTML generation instead of
> > > CEA-608 (as rendering behaves significantly differently from device to
> > > device), so I was about to generate subtitle-type output from some
> > > filter, about to create my own standalone "heartbeat" filter to
> > > normalize the sparseness, and so on and so on: again, all stuff Soft
> > > Works seems to be handling as well. So I was quite happy to find someone
> > > working on this again; the last time I saw it on ffmpeg's mailing
> > > list/patchwork
> > > (https://patchwork.ffmpeg.org/project/ffmpeg/patch/20161102220934.26010-1-u@pkh.me)
> > > the code there seemed to die, and I was already too late to say
> > > anything about it. However, reading the other devs' reactions to Soft
> > > Works' work was worrying, as it felt like history wanted to repeat
> > > itself (take a look at the discussions back then).
> > >
> > > This situation has lasted for years now. This time I wanted to speak up
> > > while this conversation is still warm, in order to help Soft Works'
> > > code survive. So, dear devs: I love and respect your work, and your
> > > opinion is very important to me. I do not claim to know ffmpeg's code
> > > better than you do. I do not claim to know better what to do with
> > > libavfilter's API. Please understand: I'm not here to be right, but to
> > > share my point of view. I'm not better than you; quite the contrary,
> > > most likely. But I also need to solve some very real problems, and can't
> > > wait until everything else is in wonderful shape to do it. Nor can I
> > > accept lots of conditions just to fix the most immediate issues, as is
> > > the case with sparseness and heartbeat frames, which was a heated debate
> > > years ago and seems to still be one, while I find heartbeat frames to be
> > > the most obvious, common-sense, backwards-compatible solution. Stuff
> > > like "clean" or "well designed" can't be more important than use cases
> > > that actually work without breaking previously implemented ones, because
> > > it's far easier to fix little blocks of "bad" code than to design
> > > something everybody's happy with (and the history of the project seems
> > > to be quite eloquent about that, especially when it comes to these
> > > particular use cases). Also, I have my own patches (which I would like
> > > to upstream some day), and I can tell that the API does change quite
> > > routinely: I understand that should be a curated process, but adding a
> > > single property for live-streaming subtitles won't kill anybody either,
> > > so that shouldn't be the kind of issue that blocks big and important
> > > code contributions like the ones Soft Works is working on. I just don't
> > > have the time to do all the work he/she's doing myself, and it could be
> > > several more years until someone else does. I can't tell whether Soft
> > > Works' code is good enough for you, or whether the ideas behind it are
> > > the best there are, but I can tell you the implementations are on the
> > > right track: as someone who works on live streaming, I know the problems
> > > he/she mentions in the exchanges with you all, and I can tell you
> > > they're all blocking issues when dealing with live streaming. Soft Works
> > > is not "forcing it" into the API, and these are not "border cases" but
> > > normal and frequent live streaming issues. So, please, if you don't have
> > > the time Soft Works has, or the will to tackle the issues he/she's
> > > tackling, I beg you at least not to kill the code this time if it does
> > > not break working use cases.
> > >
> > >
> > > Thanks,
> > > Daniel.
> >
> > Hi Daniel,
> >
> > thanks a lot for your kind words. I'm a "He-Man", and if I could turn
> > back time, I would have used my real name. Yet I started off as softworkz
> > and I can't change it anymore without compromising the pseudonym.
> >
> > As you have realized, the ML can be a pool of sharks at times, with
> > everybody following different motivations, sometimes personal, sometimes
> > commercial; you'll hardly ever know. For my part, I have benefitted
> > a lot from ffmpeg and it has always been my plan to contribute something
> > in return, with the subtitles subject finally being chosen.
> > The result is that I have spent more time on ML interaction than
> > on the development itself, so it hasn't really been an economically
> > efficient workload.
> > Nonetheless, I have patiently applied all requested changes over
> > many iterations so far.
> > Of the remaining change requests, there's a major one that I'm refusing
> > to make (the duality of the frame.pts and frame.subtitle_pts fields), and
> > I don't know whether I haven't explained the need for having both fields
> > well enough, or whether nobody tried to understand it and it was just
> > blindly objected to as a "gray" spot in the frame API.
> > The duality doesn't just serve edge cases; it is an important element
> > of the heartbeat mechanisms for dealing with sparse subtitles, and it is
> > also important for retaining muxing offsets (subtitles are often muxed a
> > few seconds ahead of time).
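For illustration only, a minimal C sketch of why the two timestamps are kept apart. This is NOT code from the patchset: the struct SubFrameSketch and the function make_heartbeat() are hypothetical, both timestamps share one time base here for brevity, and it assumes that a frame's pts can carry the mux position while subtitle_pts carries the event's own start time. A heartbeat frame repeats the current subtitle at the position of the next video frame, so only pts advances while subtitle_pts is left untouched.

```c
#include <stdint.h>

/* Hypothetical, simplified subtitle frame (not the AVFrame layout from the
 * patchset). Both fields use the same time base here to keep it short. */
typedef struct SubFrameSketch {
    int64_t pts;          /* position of the frame in the filter graph */
    int64_t subtitle_pts; /* start time of the subtitle event itself */
    const char *text;     /* subtitle payload */
} SubFrameSketch;

/* Hypothetical heartbeat: repeat the last subtitle at the current video
 * position so that a consumer such as an overlay filter does not have to
 * buffer video while waiting on a sparse subtitle stream. Only pts moves;
 * subtitle_pts keeps the original event time. */
static SubFrameSketch make_heartbeat(const SubFrameSketch *last, int64_t video_pts)
{
    SubFrameSketch hb = *last;
    hb.pts = video_pts;
    return hb;
}
```

For example, an event starting at 10 s that was muxed at 8 s could be carried as { .pts = 8, .subtitle_pts = 10 }; a heartbeat at 12 s then yields pts 12 while subtitle_pts stays 10. With a single timestamp field, one of those two values would be lost.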
> 
> > The other point I'm refusing to change is the time bases of the
> > involved fields. I have mapped the existing subtitle functionality
> > onto the new API in a direct and transparent way, to achieve a high
> > level of compatibility and stability for the transition.
> > Being able to use the result as an instant replacement in production
> > scenarios is a top-level requirement on my side, and I cannot take
> > the risk of having to fix regressions all over the place, which
> > would be introduced by a change like making those fields adhere
> > to a non-constant time base.
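As a rough illustration of the constant-time-base argument: the existing AVSubtitle struct stores pts in AV_TIME_BASE units (microseconds) and start_display_time/end_display_time in milliseconds relative to pts, so converting an event into any stream time base is one fixed formula. The sketch below assumes exactly those units; the helper names are hypothetical, it truncates instead of rounding, and it does no overflow checking (real FFmpeg code would use av_rescale_q()).

```c
#include <stdint.h>

/* Same value as AV_TIME_BASE in libavutil (microseconds); redefined so the
 * sketch builds without FFmpeg headers. */
#define SKETCH_TIME_BASE 1000000

/* Hypothetical helper: convert v from units of 1/src_den s to units of
 * tb_num/tb_den s. Truncates toward zero; no overflow handling. */
static int64_t sketch_rescale(int64_t v, int64_t src_den, int64_t tb_num, int64_t tb_den)
{
    return v * tb_den / (src_den * tb_num);
}

/* Hypothetical: absolute start/end of an AVSubtitle-style event in the
 * target time base tb_num/tb_den. pts is in SKETCH_TIME_BASE units and the
 * display times are milliseconds relative to pts, as in AVSubtitle. */
static void sketch_subtitle_window(int64_t pts, uint32_t start_ms, uint32_t end_ms,
                                   int64_t tb_num, int64_t tb_den,
                                   int64_t *start, int64_t *end)
{
    int64_t start_us = pts + (int64_t)start_ms * 1000;
    int64_t end_us   = pts + (int64_t)end_ms * 1000;

    *start = sketch_rescale(start_us, SKETCH_TIME_BASE, tb_num, tb_den);
    *end   = sketch_rescale(end_us, SKETCH_TIME_BASE, tb_num, tb_den);
}
```

With a 1/90000 MPEG-TS time base, an event at pts = 10000000 (10 s) with start_display_time = 0 and end_display_time = 2500 maps to 900000 .. 1125000. If the fields instead followed a per-stream time base, every producer and consumer would have to agree on which time base applies to each frame.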
> 
> This sounds a bit like you expect the majority of cases not to
> change? I am asking because most cases I tried do change with the
> part of the patchset that applies cleanly. In fact, about half of the
> changes are the failure I already posted previously. I think you said
> it's an issue elsewhere. Still, that needs to be fixed before this
> patchset can be used as an "instant replacement in production scenarios".

You posted two cases that were failing.

1. > ./ffmpeg -i ~/tickets/1332/Starship_Troopers.vob -scodec xsub -qscale 2 -an file1332.avi

==> Fixed since V18


2. > This breaks:
> ./ffmpeg -i ~/tickets/153/bbc_small.ts -filter_complex '[0:v][0:s]overlay' -qscale 2 -t 3 -y file.avi

==> It wasn't actually a regression. It was a bug in dvbsubdec that had
previously been covered up by some sub2video hacks.

I have submitted a fix for this error:

https://patchwork.ffmpeg.org/project/ffmpeg/patch/DM8P223MB03655DEE6FF0228743117178BA6A9@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM/


> Also, if you want more test cases that fail the same way as the previously
> posted one, or test cases that produce different output, just say so.

Yes, please! Let's just avoid duplicates of the dvbsub bug (until
the fix is applied).

I'll reply to the bottom part separately.

Thanks,
softworkz

