[FFmpeg-devel] [PATCH 6/6 v2] avformat/movenc: add support for Immersive Audio Model and Formats in ISOBMFF

Mon Feb 5 18:18:01 EET 2024

James Almer:
> On 2/5/2024 12:28 PM, Andreas Rheinhardt wrote:
>> James Almer:
>>> On 2/5/2024 12:12 PM, Andreas Rheinhardt wrote:
>>>> James Almer:
>>>>> On 2/3/2024 11:50 AM, Andreas Rheinhardt wrote:
>>>>>>> diff --git a/libavformat/movenc.h b/libavformat/movenc.h
>>>>>>> index 60363198c9..fee3e759e0 100644
>>>>>>> --- a/libavformat/movenc.h
>>>>>>> +++ b/libavformat/movenc.h
>>>>>>> @@ -25,7 +25,9 @@
>>>>>>>     #define AVFORMAT_MOVENC_H
>>>>>>>       #include "avformat.h"
>>>>>>> +#include "iamf.h"
>>>>>>>     #include "movenccenc.h"
>>>>>>> +#include "libavcodec/bsf.h"
>>>>>>
>>>>>> There is no need to include these here, as you don't need complete
>>>>>> types. This has the added benefit of forcing you to actually include
>>>>>> the
>>>>>> files where you are using them (namely in movenc.c, where you
>>>>>> forgot to
>>>>>> include bsf.h).
>>>>>
>>>>> Ok, fixed locally.
>>>>>
>>>>> Will push the set soon.
>>>>
>>>> It seems you have not noticed my objection to the first version of
>>>> your set.
>>>>
>>>> - Andreas
>>>
>>> Can you link to it?
>>
>> Sorry, it was v2:
>> https://ffmpeg.org/pipermail/ffmpeg-devel/2024-February/320722.html
>>
>> - Andreas
> 
> I removed the codec list from the split bsf like you asked, and
> explained what the bsfs do in the documentation.

For those few codecs where different framings are common and supported
by us, the muxers convert the given framing to the needs of the output
format; decoders also support the various framings. This of course only
works if they can decide which packetization the input uses; it is
possible for the cases we support.

If you allow that packets can contain OBU encapsulated data for
arbitrary codec ids (even if only intended for a few of them), then this
packetization would become officially allowed and we would have to adapt
our muxers and decoders accordingly. Which is just not possible, because
there is just no information available to base this decision on.

There is a second complication with iamf_frame_split_bsf: Up until now,
BSFs only passed the stream index value through. But with this BSF the
output may have multiple ids even when the input only had one. I am
pretty sure that this will surprise and break many users. I don't know
whether ffmpeg is among them (if a user inserts the BSF manually).

In fact, for this BSF the stream_index that the output packet gets is
determined by the offset as well as the packet data alone. The only way
for the demuxer to know these numbers is if it has already parsed the
packet data before and added streams according to this. Looking at 3/6
it seems that this is indeed what's happening (which is wasteful). But
only partially: The iamf_descriptors data is checked and streams are
created according to it, but the data read via av_append_packet() is not
checked at all. What guarantees that it can't contain
IAMF_OBU_IA_AUDIO_ELEMENT elements (which trigger adding new ids and
therefore lead to an increment in the potential output stream_index)?
Also notice that your 3/6 uses the pkt's stream_index coming out of the
BSF without any check.

- Andreas