[FFmpeg-devel] [PATCH] avcodec/jpeg2000dec: support of 2 fields in 1 AVPacket

Jerome Martinez jerome at mediaarea.net
Thu Feb 15 17:02:27 EET 2024


On 05/02/2024 01:19, Tomas Härdin wrote:
> [...]
> Which entry in the table would the provided file correspond to? To me
> it seems none of them fit. There's two fields, meaning two j2k
> codestreams, in each corresponding essence element KLV packet (I think,
> unless CP packets get reassembled somewhere else). Entry I2 seems
> closest but it specifies FULL_FRAME. I1 is otherwise tempting, but
> there SampleRate should equal the field rate whereas the file has
> SampleRate = 30000/1001.

Other examples I have (not shareable) with 2 jp2k pictures per KLV have 
identification from an old version of AmberFin iCR. I have no file with 
I2 correctly signaled: my first example is I2 (2 fields per KLV) with 
I1 Header Metadata Property Values **but** with the I2 essence 
container label, whose content byte (byte 15 of the UL) is 0x04 = I2.
The AmberFin iCR files have the generic essence container label with a 
content byte of 0x01 = FU (Unspecified), so for my main use case we 
could activate the search for the 2nd jp2k codestream only when I2 is 
explicitly signaled by the essence container label, but that would 
prevent catching the 2nd field when this signaling is unspecified and 
the frame layout + sample rate + edit rate are buggy.
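
For illustration, a check on that content byte could look like the 
sketch below; the enum and function name are hypothetical, only the 
byte values 0x04 = I2 and 0x01 = FU come from the files described 
above:

#include <stdint.h>

/* Hypothetical helper, only a sketch: classify the JPEG 2000 essence
 * container label by its content byte (byte 15 of the 16-byte UL,
 * 0-based index 14). */
enum J2KWrapping { J2K_WRAP_UNSPECIFIED, J2K_WRAP_I2, J2K_WRAP_OTHER };

static enum J2KWrapping j2k_wrapping_from_label(const uint8_t ul[16])
{
    switch (ul[14]) {
    case 0x04: return J2K_WRAP_I2;          /* I2: 2 fields in 1 KLV */
    case 0x01: return J2K_WRAP_UNSPECIFIED; /* FU: unspecified       */
    default:   return J2K_WRAP_OTHER;
    }
}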

I agree that this is not fully what ST 422 defines, but note that frame 
layout and height are not required by ST 377 (only "best effort"), so 
IMO we should not rely much on them, and at the very least we should 
handle what is in the wild. Correct me if I am wrong, but handling 
non-conforming content seems to be an acceptable policy in FFmpeg (I am 
thinking of e.g. DPX and the non-conforming EOLs of some scanners, 
whose names are written directly in the FFmpeg source code).
Video Line Map is also best effort, but without it, it is not possible 
to know the field_order; I wonder what should be done in that case. 
Currently I rely on the existing FFmpeg implementation to derive 
field_order and do not try to find the 2nd field if field_order is 
AV_FIELD_UNKNOWN (not important for me, as all the files I have carry 
field_order-related metadata).
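
As a rough sketch of that guard (the decode_* helpers are hypothetical 
placeholders, only avctx->field_order and the AV_FIELD_* values are 
real FFmpeg API):

#include <libavcodec/avcodec.h>

/* Hypothetical helpers standing in for the existing single-codestream
 * path and the proposed two-field path; they are not FFmpeg API. */
int decode_single_codestream(AVCodecContext *avctx, AVFrame *frame,
                             const AVPacket *pkt);
int decode_both_fields(AVCodecContext *avctx, AVFrame *frame,
                       const AVPacket *pkt);

/* Only search for a 2nd codestream in the packet when the container
 * gave a usable field order. */
static int decode_packet(AVCodecContext *avctx, AVFrame *frame,
                         const AVPacket *pkt)
{
    if (avctx->field_order == AV_FIELD_UNKNOWN ||
        avctx->field_order == AV_FIELD_PROGRESSIVE)
        return decode_single_codestream(avctx, frame, pkt);
    return decode_both_fields(avctx, frame, pkt);
}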

Also, if I manually edit the MXF to have conforming I2 property values, 
FFmpeg behaves the same (still indicating half the height and still 
silently discarding the 2nd field), so in my opinion handling 2 jp2k 
pictures per KLV is still relevant for correctly handling I2-conforming 
files, and tolerating wrong property values may also be relevant 
(checking the essence container label only? to be discussed).

On 03/02/2024 20:58, Tomas Härdin wrote:
> The fastest way, in a player, is probably to do it with a shader. That
> should be the least amount of copies and the most cache coherent.

As far as I know, the player is not aware that the AVFrame actually 
contains a field, so it cannot apply a shader or anything else; which 
AVFrame field indicates that this is a field to be interleaved with 
the next AVFrame before display?
Currently, for I1 files, ffplay and VLC show double-rate half-height 
content, so it seems that they do not detect that the AVFrame contains 
a field.

On 03/02/2024 21:04, Tomas Härdin wrote:
> It should also be said that what this patch effectively does is
> silently convert SEPARATE_FIELDS to MIXED_FIELDS. What if I want to
> transcode J2K to lower bitrate but keep it SEPARATE_FIELDS?


I don't get what the expected behavior of FFmpeg is: what is the 
meaning of "243" in
"Stream #0:0: Video: jpeg2000, yuv422p10le(bottom coded first 
(swapped)), 720x243, lossless, SAR 9:20 DAR 4:3, 29.97 fps, 29.97 tbr, 
29.97 tbn"?

My understanding is that a frame height is expected, never a field 
height, and there is no indication in the current output that 243 is a 
field height (half of the 486-line frame) for both I1 & I2, so the 
"silent conversion" would be expected by the user in order to get a 
frame in the output and never a field. Am I wrong?

Also, it seems there is no way to signal that the output picture is a 
field and not a frame, and FFmpeg handles I1 (1 field per KLV) as if 
it ignores that this is a field and not a frame, so when an I1 file is 
converted to another format without an interleave filter added 
manually, the output is ugly flickering double-rate half-height 
content.
Silently converting a field to a frame seems to me a worse behavior 
than silently converting SEPARATE_FIELDS to MIXED_FIELDS, because the 
output is not what is expected either by the person who created the 
file or by the person watching the output of FFmpeg.


> What if I want to transcode J2K to lower bitrate but keep it SEPARATE_FIELDS?

Interlacing the lines and then encoding the fields separately? It is 
more a matter of the default behavior (deinterlace or not) and of who 
would need to apply a filter. My issue is that I see no way to signal 
in FFmpeg that "got_frame" means "got frame or field" and that the 
AVFrame contains a field, so I would prefer the default behavior to be 
handling frames in AVFrame rather than fields. Is that acceptable?

Additionally, the MXF container indicates (for conforming files) that 
the I2 edit rate is per frame even if there are 2 separate fields in 
the KLV. Do you expect FFmpeg to force separate fields into separate 
AVFrames, as for I1, even when the MXF muxer explicitly says that 
editing is expected to be per frame?

We have 3 cases here (separate fields in separate KLVs, 2 separate 
fields in 1 KLV, mixed fields in 1 KLV). My understanding of the 
current AVFrame is that only mixed fields (so 1 frame) in 1 AVFrame is 
supported, and both other MXF methods are silently converted to it 
(fields handled as frames for separate fields in separate KLVs, the 
2nd field discarded for 2 separate fields in 1 KLV).

My planned next step is to handle I1 files automatically with the 
right output (a frame rather than fields in the AVFrame and in the 
output) so users are not surprised by double-rate half-height content 
in MKV or MP4, with no signaling about it and badly handled by any 
player, including ffplay. Wouldn't that be acceptable? Or should I 
split the 2 I2 fields and provide 2 AVFrames?
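
For reference, weaving 2 decoded fields into 1 full-height AVFrame 
could be as simple as the sketch below (not the patch itself, just an 
illustration using av_image_copy_plane with a doubled destination 
stride; it assumes a planar YUV-style layout and that both fields have 
the same pixel format as the output frame):

#include <libavutil/frame.h>
#include <libavutil/imgutils.h>
#include <libavutil/macros.h>
#include <libavutil/pixdesc.h>

/* Minimal sketch: copy two half-height field pictures into one
 * full-height frame, 'top' on the even output lines and 'bottom' on
 * the odd ones. */
static void weave_fields(AVFrame *out, const AVFrame *top,
                         const AVFrame *bottom)
{
    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(out->format);

    for (int p = 0; p < AV_NUM_DATA_POINTERS && out->data[p]; p++) {
        int vsub         = (p == 1 || p == 2) ? desc->log2_chroma_h : 0;
        int field_height = AV_CEIL_RSHIFT(out->height, vsub) / 2;
        int bytewidth    = av_image_get_linesize(out->format, out->width, p);

        /* even output lines come from the top field */
        av_image_copy_plane(out->data[p], 2 * out->linesize[p],
                            top->data[p], top->linesize[p],
                            bytewidth, field_height);
        /* odd output lines come from the bottom field */
        av_image_copy_plane(out->data[p] + out->linesize[p],
                            2 * out->linesize[p],
                            bottom->data[p], bottom->linesize[p],
                            bytewidth, field_height);
    }
}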



In other words, I would like to know whether AVFrame is intended, in 
the long term, to also handle fields in addition to frames, and if so, 
whether there is a way to signal that the AVFrame structure actually 
contains a field so that players/transcoders (including FFmpeg) can do 
the interleave before showing something (ffplay) or converting to a 
format that does not support separate fields. If not, what is the 
preferred approach (my current proposal of decoding both fields and 
then interleaving, adding an interleave filter, splitting a KLV into 2 
AVPackets, something else)?
Jerome

