[FFmpeg-devel] [PATCH v22 02/10] avcodec/evc_parser: Added parser implementation for EVC format

Fri May 19 18:23:54 EEST 2023

On 5/19/2023 7:31 AM, Dawid Kozinski/Multimedia (PLT) /SRPOL/Staff 
Engineer/Samsung Electronics wrote:
> 
> 
> 
>> -----Original Message-----
>> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of James
>> Almer
>> Sent: środa, 10 maja 2023 22:21
>> To: ffmpeg-devel at ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH v22 02/10] avcodec/evc_parser: Added
>> parser implementation for EVC format
>>
>> On 4/27/2023 9:02 AM, Dawid Kozinski wrote:
>>> - Added constants definitions for EVC parser
>>> - Provided NAL units parsing following ISO_IEC_23094-1
>>> - EVC parser registration
>>>
>>> Signed-off-by: Dawid Kozinski <d.kozinski at samsung.com>
>>> ---
>>>    libavcodec/Makefile     |    1 +
>>>    libavcodec/evc.h        |  155 ++++
>>>    libavcodec/evc_parser.c | 1497
>> +++++++++++++++++++++++++++++++++++++++
>>>    libavcodec/parsers.c    |    1 +
>>>    4 files changed, 1654 insertions(+)
>>>    create mode 100644 libavcodec/evc.h
>>>    create mode 100644 libavcodec/evc_parser.c
>>
>> There seems to have been a regression in this version compared to v20.
>> Try to compile with the libxevd decoder disabled and open a raw file (not
> mp4).
>>
>> Whereas with v20 i was getting
>>
>>> Input #0, evc, from 'akiyo_cif.evc':
>>>    Duration: N/A, bitrate: N/A
>>>    Stream #0:0: Video: evc (Baseline), none, 352x288, 25 fps, 25 tbr,
>>> 1200k tbn
>>
>> I'm now getting
>>
>>> Input #0, evc, from 'akiyo_cif.evc':
>>>    Duration: N/A, bitrate: N/A
>>>    Stream #0:0: Video: evc (Baseline), none, 555902582x0, 25 fps, 25
>>> tbr, 1200k tbn
>>
>> It seems that the problem showed up because you moved the parameter sets
> to
>> stack to skip allocating them, and you no longer check if they exist or
> were
>> parsed by looking at slice_pic_parameter_set_id and such.
>>
>> That aside, i looked at the EVC spec and noticed that the "raw" format in
> Annex-
>> B is unlike that of H.26{4,5,6}: There's no start code, and instead
> there's a NAL
>> size prefix, which sounds like the isobmff way of encapsulating NALUs.
>> I also noticed that the syntax for nal_unit() contains an
>> emulation_prevention_three_byte element, but there's no explanation for it
> in
>> the semantics section. Its existence in H.26* is to prevent parsers from
>> misinterpreting a sequence of two or more zeroes as a potential start
> code, but
>> that's clearly not the case here, so why is it defined and present at all?
>>
>> What this means is that the parser can't be made to assemble an AU. If you
> feed
>> it data starting from the middle of a NAL, it will not be able to sync to
> the start
>> of the next NAL because all it looks for is a four byte size value and
> will accept
>> the very first four bytes its fed as one.
>> Since i doubt the spec can be changed at this point to amend this
> oversight, the
>> AU assembling will have to occur in the evc demuxer, much like we do with
> AV1
>> (both raw obu and Annex-B formats as defined in the corresponding spec),
> and
>> the parser be limited to parse fully assembled NALs with
> parse_nal_units().
> 
> According to your last EVC review:
> We have re-implemented the EVC demuxer to assemble Access Units (AUs) and
> pass them further, while the EVC parser is limited to parsing complete NAL
> units provided in consecutive AUs.
> 
> However, before we create a new patchset, we would like to discuss some
> things because we believe this solution is not optimal.
> 
> According to the EVC documentation, "Each access unit contains a set of VCL
> NAL units that together compose a primary coded picture."
> This means that to find the boundaries of an AU, we need to identify all the
> NAL units that contain data for a single encoded picture.
> In our case, to determine whether the next VCL NAL unit contains data for
> the same picture as the previous VCL NAL unit or for a different encoded
> picture, we need information contained in the NAL units. We need to extract
> certain data from NAL units like SPS and PPS, as well as from the Slice
> Headers of VCL NAL units.
> In other words, this implies that parsing NAL units is required for this
> purpose.
> 
> It may not be as extensive parsing as in the EVC parser, but still, there is
> a significant amount of parsing involved, which adds some overhead.
> Another question is whether the demuxer is the appropriate place for parsing
> NAL units? I guess it's not.
> 
> Perhaps it would be better if the demuxer prepared complete NAL units
> instead of complete AUs. The EVC parser would then receive complete NAL
> units, and since it already parses them, we have the necessary information
> for preparing complete AUs at no extra cost.
> 
> Preparing complete NAL units is definitely simpler than preparing complete
> AUs. It only requires finding the length of the NAL unit and putting that
> amount of data from the input into the AVPacket.
> 
> Please let me know your thoughts on this.

Yes, i mostly agree with what you said, however the demuxer needs to 
propagate entire AUs, because otherwise in a codec copy scenario, a 
muxer would see packets containing a single NALu, potentially creating 
non compliant files.

The proper solution for this is to have the demuxer prepare complete 
NALUs, and pass them to a bitstream filter that will assemble an AU, 
with all this happening within the demuxer. The assembled AU is then 
output by the demuxer, and the parser and decoder will only see and 
handle said assembled AU.
The bitstream filter will reside in libavcodec, which means the 
sps/pps/slice parsing code can be shared between the parser and this 
bsf, so no code duplication in lavc and lavf. This is what we do with 
AV1 for its raw demuxer (See libavformat/av1dec.c and 
libavcodec/av1_frame_merge.c, the latter which uses the CBS framework, 
but in this case you can use the standalone parsing as you already wrote 
it).

I also want to apologize for not looking at the spec earlier, to notice 
the raw bitstream was not avparser friendly. It would have saved you a 
lot of work.