[FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame for subtitle handling
Soft Works
softworkz at hotmail.com
Sun Dec 12 04:21:42 EET 2021
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Daniel
> Cantarín
> Sent: Sunday, December 12, 2021 12:39 AM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH v20 02/20] avutil/frame: Prepare AVFrame
> for subtitle handling
>
> > One of the important points to understand is that - in case of subtitles,
> > the AVFrame IS NOT the subtitle event. The subtitle event is actually
> > a different and separate entity. (...)
>
>
> Wouldn't it qualify then as a different abstraction?
>
> I mean: instead of avframe.subtitle_property, perhaps something along the
> lines of avframe.some_property_used_for_linked_abstractions, which in
> turn lets you access some proper Subtitle abstraction instance.
>
> That way, devs would not need to defend AVFrame, and Subtitle could
> have whatever properties it needs.
>
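For illustration, the two shapes being discussed could be sketched as below. All struct and field names here are hypothetical, chosen only to contrast the designs; they are not the patchset's actual API.

    #include <stdint.h>

    /* Patchset-style approach (sketch): subtitle properties are embedded
     * directly in the frame, next to the transport timestamp. */
    typedef struct FrameWithEmbeddedSubs {
        int64_t pts;          /* transport timestamp, in the link time_base */
        int64_t subtitle_pts; /* display start of the carried event, AV_TIME_BASE_Q */
    } FrameWithEmbeddedSubs;

    /* Suggested alternative (sketch): the frame only links a separate
     * subtitle object, which owns all subtitle-specific properties. */
    typedef struct SubtitleEvent {
        int64_t  pts;                 /* display start, AV_TIME_BASE_Q */
        uint32_t start_display_time;  /* offsets relative to pts, in ms */
        uint32_t end_display_time;
    } SubtitleEvent;

    typedef struct FrameWithLinkedSubs {
        int64_t        pts;    /* transport timestamp only */
        SubtitleEvent *event;  /* shared (possibly refcounted) subtitle object */
    } FrameWithLinkedSubs;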
> I see there's AVSubtitle, as you mention:
> https://ffmpeg.org/doxygen/trunk/structAVSubtitle.html
>
> Isn't it less socially problematic to just link an instance of AVSubtitle,
> instead of adding a subtitle timing property to AVFrame?
> IIUC, that AVSubtitle instance could live in filter context, and be linked
> by the filter doing the heartbeat frames.
>
> Please note I'm not saying the property is wrong, or even that I understand
> the best way to deal with it, but that I recognize some social problem here.
> Devs don't like that property, that's a fact. And, technical or not, it
> seems to be a problem.
>
> > (...)
> > The chairs are obviously AVFrames. They need to be numbered monotonically
> > increasing - that's the frame.pts. Without increasing numbering, the
> > transport would get stuck. We are filling the chairs with copies
> > of the most recent subtitle event, so an AVSubtitle could be repeated,
> > for example, 5 times. It's always the exact same AVSubtitle event
> > sitting in those 5 chairs. The subtitle event always has the same
> > start time (subtitle_pts), but each frame has a different pts.
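A minimal sketch of this "chairs" relationship (hypothetical struct and field names, not the patchset's actual API): the frame pts keeps increasing so the transport keeps moving, while the carried event's start time stays fixed.

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    typedef struct SubEvent { int64_t subtitle_pts; const char *text; } SubEvent;
    typedef struct Frame    { int64_t pts; const SubEvent *event; } Frame;

    int main(void) {
        /* One subtitle event, repeated across five heartbeat frames. */
        const SubEvent ev = { 9000000, "Hello" }; /* start time, AV_TIME_BASE_Q */
        for (int64_t pts = 100; pts < 105; pts++) {
            Frame f = { pts, &ev };  /* frame pts increases monotonically... */
            printf("frame pts=%"PRId64"  subtitle_pts=%"PRId64"  text=%s\n",
                   f.pts, f.event->subtitle_pts, f.event->text);
            /* ...while the event's start time stays the same */
        }
        return 0;
    }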
>
> I can see AVSubtitle has a "start_display_time" property, as well as a
> "pts" property "in AV_TIME_BASE":
>
> https://ffmpeg.org/doxygen/trunk/structAVSubtitle.html#af7cc390bba4f9d6c32e391ca59d117a2
>
> Is it too much trouble to reuse that while persisting an AVSubtitle instance
> in filter context? I guess it could even be used in decoder context.
>
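For reference, this is the timing AVSubtitle already carries, as defined in libavcodec/avcodec.h (comments abbreviated):

    typedef struct AVSubtitle {
        uint16_t format;             /* 0 = graphics */
        uint32_t start_display_time; /* relative to packet pts, in ms */
        uint32_t end_display_time;   /* relative to packet pts, in ms */
        unsigned num_rects;
        AVSubtitleRect **rects;
        int64_t pts;                 /* same as packet pts, in AV_TIME_BASE */
    } AVSubtitle;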
> I also see a quirky property in AVFrame: "best_effort_timestamp"
> https://ffmpeg.org/doxygen/trunk/structAVFrame.html#a0943e85eb624c2191490862ececd319d
> Perhaps the "various heuristics" it claims to apply could be extended,
> this time related to a linked AVSubtitle, so that an extra property is not
> needed?
>
>
> > (...)
> > Considering the relation between AVFrame and subtitle event as laid out
> > above, it should be apparent that there's no guarantee of any specific
> > relation between the subtitle_pts and the pts of the frame that
> > carries it. Such a relation _can_ exist, but doesn't have to.
> > It can easily happen that the frame pts is just increased by 1
> > on subsequent frames. The time_base may change from filter to filter
> > and may be oriented towards the transport of the subtitle events, which
> > might have nothing to do with the subtitle display time at all.
>
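A sketch of why the two clocks can drift apart, using the real av_rescale_q() helper; the pass-through function and the link time_bases are made-up examples. The transport pts follows whatever time_base each filter link negotiates, while a subtitle_pts kept in AV_TIME_BASE_Q never moves:

    #include <stdint.h>
    #include <libavutil/avutil.h>       /* AV_TIME_BASE_Q */
    #include <libavutil/mathematics.h>  /* av_rescale_q() */

    /* Hypothetical pass-through: the frame crosses a link whose time_base
     * differs, so its transport pts must be rescaled... */
    static int64_t cross_link(int64_t pts, AVRational in_tb, AVRational out_tb) {
        return av_rescale_q(pts, in_tb, out_tb);
    }
    /* ...whereas a subtitle_pts fixed in AV_TIME_BASE_Q is left untouched,
     * so the numeric distance between the two values means nothing. */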
> This confuses me.
> I understand the difference between filler frame pts and subtitle pts.
> That's ok.
> But if transport timebase changes, I understand subtitle pts also changes.
>
> I mean: "transport timebase" means "video timebase", and if subs are synced
> to video, then that sync needs to be maintained. If subs are synced, then
> their timing is never independent. And if they're not synced, then their
> AVFrame is independent from video frames, and thus does not need any extra
> prop.
>
> Here's what I do right now with the filler frames. I'm talking about current
> ffmpeg with no subs frames in lavfi, and real-time conversion from dvbsub
> to WebVTT using OCR. What I do is quite dirty:
> - Change FPS to a low value, let's say 1.
> - Apply OCR to the dvb sub, using vf_ocr.
> - Read the metadata downstream, writing VTT to a file or pipe output.
>
> As there's no sub frame capability in lavfi, I can't use the vtt encoder
> downstream. Therefore, the output is raw C string and file manipulation.
> And given that I first set the FPS to 1, I have 1 line per second, no
> matter the timestamp of the subs, the video, or the filler frame. The
> point then is to check for text diffs instead of pts to detect the frame
> nature. I can even naively just emit the frame's pts once per second with
> the same text, and with empty lines when there's no text, without caring
> about the frame nature (filler or not).
>
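The text-diff detection described above can be done against the real frame-metadata API: vf_ocr exports its result under the "lavfi.ocr.text" metadata key. A minimal sketch (the caller and buffer handling are simplified, and the helper itself is hypothetical):

    #include <stdio.h>
    #include <string.h>
    #include <libavutil/frame.h>
    #include <libavutil/dict.h>

    /* Returns 1 when the OCR'd text changed since the previous frame,
     * which is what signals a new subtitle line in this workflow. */
    static int subtitle_text_changed(const AVFrame *frame,
                                     char *prev, size_t prev_size) {
        AVDictionaryEntry *e =
            av_dict_get(frame->metadata, "lavfi.ocr.text", NULL, 0);
        const char *text = e ? e->value : "";
        if (!strcmp(text, prev))
            return 0;                          /* same event on a filler frame */
        snprintf(prev, prev_size, "%s", text); /* remember the new line */
        return 1;
    }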
> There's a similar behaviour when dealing with CEA-608: I need to check text
> differences instead of any pts, as the inner workings of these captions are
> more related to video than to subs. In my filters, I assume that the frame
> PTS is correct.
>
> I understand the idea behind PTS, I get that there's also DTS, and so I can
> see that there could be a use case where another timing is needed. But I
> still don't see the need for this particular extra timing, as the distance
> between subtitle_pts and filler.pts does not mean something downstream like
> "now clear the current subtitle line". What will happen if there's no
> subtitle_pts is that the same line will still be active, and will only
> change when there's an actual subtitle difference. So, I believe this
> value is more theoretically useful than factual.
>
> I understand that there are subs formats that need precise start and end
> timing, but I fail to see the case where that timing avoids the need for
> text-difference checking, be it in a filter or an encoder. And if filters
> or encoders naively use PTS, then the filler frames would not break
> anything: they will repeatedly show the same text line, at the current FPS
> speed. And if the sparseness problem is finally solved by your logic
> somehow, and there's no need for filler frames, then there's also no need
> for subtitle_pts, as pts would actually be fine.
>
> So, I'm confused, given that you state this property is very important.
> Would you please tell us some actual, non-theoretical use case for the prop?
>
>
> >
> > Also, subtitle events are sometimes duplicated. When we would convert
> > the subtitle_pts to the time_base that is negotiated between two filters,
> > then it could happen that multiple copies of a single subtitle event have
> > different subtitle_pts values.
> >
>
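The duplication issue in the quoted paragraph can be reproduced with av_rescale_q() directly: pushing the same AV_TIME_BASE_Q value through two differently-timed links (made-up time_bases and values below) yields copies that no longer agree once rounding strikes:

    #include <stdio.h>
    #include <inttypes.h>
    #include <libavutil/avutil.h>       /* AV_TIME_BASE_Q */
    #include <libavutil/mathematics.h>  /* av_rescale_q() */

    int main(void) {
        int64_t subtitle_pts = 1234567;      /* one event, in AV_TIME_BASE_Q */
        AVRational tb_a = { 1, 90000 };      /* e.g. an MPEG-TS-like link */
        AVRational tb_b = { 1, 1000 };       /* e.g. a millisecond link */
        /* Each copy converted into "its" link time_base and back: */
        int64_t a = av_rescale_q(av_rescale_q(subtitle_pts, AV_TIME_BASE_Q, tb_a),
                                 tb_a, AV_TIME_BASE_Q);
        int64_t b = av_rescale_q(av_rescale_q(subtitle_pts, AV_TIME_BASE_Q, tb_b),
                                 tb_b, AV_TIME_BASE_Q);
        printf("%"PRId64" vs %"PRId64"\n", a, b); /* 1234567 vs 1235000 */
        return 0;
    }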
> If it's repeated, doesn't it have a different pts?
> I get repeated lines from time to time, but they have slightly different
> PTS.
>
> "Repeated event" != "same event".
> If you check for repeated events, then you're doing some extra checking,
> as I noted with the "text difference checks" in previous paragraphs, and so
> PTS is not ruling all the logic. Otherwise, in the worst-case scenario you
> get the same PTS twice, which will cause some frame to be discarded. And in
> the most likely scenario, you get two identical frames with different PTS,
> which actually changes nothing in the viewer's experience.
>
> >
> > Besides that, there are practical considerations: The subtitle_pts
> > is almost nowhere needed in any other time_base than AV_TIME_BASE_Q.
> >
> > All decoders expect it to be like this, as do all encoders and filters.
> > Conversion would need to happen all over the place.
> > Every filter would need to take care of rescaling the subtitle_pts
> > value (when the time_base differs between in and out).
> >
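What "conversion all over the place" would look like in practice, roughly: if subtitle_pts lived in the link time_base, every filter's pass-through path would need an extra step like this sketch (the helper is hypothetical; in_tb/out_tb stand for the in- and out-link time_bases):

    #include <stdint.h>
    #include <libavutil/rational.h>     /* av_cmp_q() */
    #include <libavutil/mathematics.h>  /* av_rescale_q() */

    /* Hypothetical per-filter chore that keeping subtitle_pts in
     * AV_TIME_BASE_Q avoids: rescale on every link crossing. */
    static int64_t forward_subtitle_pts(int64_t subtitle_pts,
                                        AVRational in_tb, AVRational out_tb) {
        if (av_cmp_q(in_tb, out_tb))    /* time_bases differ */
            return av_rescale_q(subtitle_pts, in_tb, out_tb);
        return subtitle_pts;
    }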
>
> I'm not well versed enough in ffmpeg/libav to understand that.
> But I'll tell you what: do you think it is possible for you to do a
> practical test?
> I mean this:
> - Take some short video example with dvbsubs (or whatever graphical format).
> - Apply graphicsub2text, converting to webvtt, srt, or something.
> - Do the same, but with subtitle_pts taken away from AVFrame.
>
> Let's compare both text outputs.
> I propose text because it is easier to share. But if you can think of any
> other practical example like this, it's also welcome. The point is to
> understand the relevance of subtitle_pts by looking at the problem of not
> having it.
>
> If it's no big deal, then screw it: you take it away, devs get pleased,
> and everybody in the world gets the blessing of having subtitle frames in
> lavfi. If it is a big deal, then the devs should understand.
I'm afraid the only reply I have to this is:
- Take my patchset
- Remove subtitle_pts
- Get everything working
(all example command lines in filters.texi)
=> THEN start talking
The same goes for everybody else who keeps saying it can be
removed and that it's an unnecessary duplication.
The stage is yours...
Kind regards,
softworkz