[FFmpeg-devel] [PATCH v4 6/6] decklink_enc: add support for playout of 608 captions in MOV files

Devin Heitmueller devin.heitmueller at ltnglobal.com
Tue May 2 17:47:52 EEST 2023


Hi Lance,

On Sun, Apr 30, 2023 at 7:01 PM Lance Wang <lance.lmwang at gmail.com> wrote:
> This implementation is limited to decklink SDI output only.  If
> possible, could we implement the function at the demuxer layer and
> then pass it through as SEI side data?  That way we could also easily
> convert such streams to embedded CC in the video stream.

I did consider this approach, and it does raise the more fundamental
issue of trying to minimize the number of ways we have to process CC
data depending on whether it originated in SEI metadata or in
separate packets.  There are a number of problems with what you are
proposing, though:

1.  There can be multiple CC streams within an MOV file, but only a
single CC stream can be embedded into AVFrame side data.  Hence you
would have to add some sort of option to the demuxer to decide which
stream to embed, which makes it much more difficult to do things like
ingest a file with multiple CC streams and feed separate outputs with
different CC streams.  Performing the work on the output side allows
you to use the standard "-map" mechanism to dictate which CC streams
are routed to which outputs (see the first command sketch below).

2.  I have use cases in mind where the captions originate from sources
other than MOV files and the video framerate is not known (or there is
no video at all in the source).  For example, I want to be able to
consume video from a TS source while simultaneously demuxing an SCC or
MCC file and sending the combined result to the output (see the second
sketch below).  In such cases the correct rate control for the captions
can only be implemented on the output side: the SCC/MCC demuxer has no
access to the corresponding video stream, so it knows neither the video
framerate nor how to embed the captions into AVFrame side data.
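
To make the "-map" point concrete, here is a rough sketch of the kind
of invocation I have in mind.  The device names and stream specifiers
are illustrative (this assumes the MOV exposes its two caption tracks
as the first and second subtitle streams), and the usual decklink
raster/pixel format options are omitted for brevity:

  ffmpeg -i input.mov \
    -map 0:v:0 -map 0:s:0 -f decklink "DeckLink Duo (1)" \
    -map 0:v:0 -map 0:s:1 -f decklink "DeckLink Duo (2)"

i.e. one source with two caption tracks, where each output gets the
same video but a different caption stream.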
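
And a second sketch for the TS-plus-caption-file case.  Again the
names are illustrative, and it presumes SCC/MCC demuxing that exposes
the captions as the first stream of the second input:

  ffmpeg -i input.ts -i captions.scc \
    -map 0:v:0 -map 0:a:0 -map 1:0 -f decklink "DeckLink Duo (1)"

The caption file carries no video timing of its own; only the
decklink output knows the actual frame rate, which is why the rate
control has to live there.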

I can indeed imagine use cases where doing it further up the pipeline
would be useful.  For example, if you were taking in an MOV file and
wanted to produce a TS where the captions are embedded as SEI
metadata, you would need the e608 packets converted to AVFrame side
data before they reach the encoder.  However, I don't see this as a
substitute for being able to do it on the output side, which is the
most flexible approach for the other use cases described above.

Much of this comes down to fundamental limitations of the ffmpeg
framework around moving data back and forth between data packets and
side data.  You can't feed data packets into AVFilterGraphs.  You
can't easily combine data from data packets into AVFrames carrying
video (or extract side data from AVFrames to generate data packets).
You can't use bitstream filters (BSFs) to combine data from multiple
inputs, such as a compressed video stream and a data stream, after
encoding.  I've run across all of these limitations over the years,
and at this point I'm trying to take the least invasive approach
possible: one that doesn't require changes to the fundamental
frameworks for handling data packets.

It's worth noting that nothing you have suggested is an "either/or"
situation.  Because caption processing is inexpensive, there isn't any
significant overhead in having multiple AvCCFifo instances in the
pipeline.  In other words, if you added such a feature to the MOV
demuxer, it wouldn't prevent us from also running the packets through
an AvCCFifo instance on the output side.  The proposed patch doesn't
preclude adding that demuxer-side feature in the future.
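
To illustrate why the overhead is negligible, here is a toy,
self-contained sketch of the re-clocking idea.  To be clear, this is
not the AvCCFifo API from the patch set, just the concept:

  #include <stdint.h>
  #include <stddef.h>

  /* Toy caption FIFO, illustrative only (not the patch's AvCCFifo).
   * Caption byte pairs are queued as they arrive from the demuxer
   * and popped at the video frame rate on the output side. */
  #define CC_FIFO_SIZE 1024

  typedef struct {
      uint8_t buf[CC_FIFO_SIZE];
      size_t  rd, wr, count;
  } CCFifoSketch;

  /* Demux (or input) side: queue incoming 608 byte pairs. */
  static void cc_inject(CCFifoSketch *f, const uint8_t *data, size_t len)
  {
      for (size_t i = 0; i < len && f->count < CC_FIFO_SIZE; i++) {
          f->buf[f->wr] = data[i];
          f->wr = (f->wr + 1) % CC_FIFO_SIZE;
          f->count++;
      }
  }

  /* Output side: pop the bytes belonging to one video frame (for
   * 29.97 fps 608 that is one two-byte pair per frame).  The
   * per-frame work is a couple of byte copies, which is why running
   * several instances in one pipeline costs nothing measurable. */
  static size_t cc_extract(CCFifoSketch *f, uint8_t *out, size_t want)
  {
      size_t n = 0;
      while (n < want && f->count > 0) {
          out[n++] = f->buf[f->rd];
          f->rd = (f->rd + 1) % CC_FIFO_SIZE;
          f->count--;
      }
      return n;
  }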

Devin

-- 
Devin Heitmueller, Senior Software Engineer
LTN Global Communications
o: +1 (301) 363-1001
w: https://ltnglobal.com  e: devin.heitmueller at ltnglobal.com

