[FFmpeg-devel] [PATCH v2] avcodec/jpeg2000dec: support of 2 fields in 1 AVPacket

Tue Feb 20 17:07:43 EET 2024

Attached is an updated version of the patch proposal.

About the idea to keep separate fields in the output AVFrame, I note 
from the discussion that it is commonly accepted that up to now it is 
expected that the AVPacket contains what is in the MXF element and that 
the AVFrame contains a frame and never a field, and additionally I see 
in e.g. mpeg12dec.c that the decoder interleaves separate fields:
mb_y <<= field_pic;
if (s2->picture_structure == PICT_BOTTOM_FIELD)
     mb_y++;
And mpegvideo_parser.c creates a AVPacket with both fields in AVPacket 
even if they are separated, this patch keeps the AVPacket from e.g. MXF 
with both fields in it and does something similar to what do other 
decoders with separate fields in the output AVFRame.

About the detection of the 2 separated fields in 1 packet in the MXF 
file (I2 mode), it is doable in the example file provided in the first 
patch proposal to catch it by checking the essence label but other files 
I have don't have the specific essence label (they use the "undefined" 
essence label, legit) so IMO we should not rely on it and there is 
actually no practical advantage in catching that from the container.

In practice this new patch proposal is slightly more complex, with one 
recursive call to jpeg2000_decode_frame() when there are 2 codestreams 
in 1 AVPacket, but it has a negligible performance impact (few 
comparisons and bool checks) when there is only one codestream in the 
AVPacket (the currently only supported method, remind that currently 
FFmpeg completely discards the 2nd codestream present in the AVPacket) 
and it relies on the current jpeg2000_read_main_headers() function for 
catching the end of the first codestream (safer than the quick find of 
EOC/SOC in the first version).

It also changes the jpeg2000_decode_frame return value to the count of 
bytes parsed, it seems that it was what is expected but in practice it 
was not doing that, fixing the return value could be a separate patch if 
preferred.

Jérôme

On 02/02/2024 16:55, Jerome Martinez wrote:
> Before this patch, the FFmpeg MXF parser correctly detects content 
> with 2 fields in 1 AVPacket as e.g. interlaced 720x486 but the FFmpeg 
> JPEG 2000 decoder reads the JPEG 2000 SIZ header without understanding 
> that the indicated height is the height of 1 field only so overwrites 
> the frame size info with e.g. 720x243, and also completely discards 
> the second frame, which lead to the decoding of only half of the 
> stored content as "progressive" 720x243 flagged interlaced.
>
> Example file:
> https://www.digitizationguidelines.gov/guidelines/MXF_sampleFiles/RDD48-sample12-gf-jpeg2000-ntsc-4.2.zip 
>
>
> Before this patch:
> Stream #0:0: Video: jpeg2000, yuv422p10le(bottom coded first 
> (swapped)), 720x243, lossless, SAR 9:20 DAR 4:3, 29.97 tbr, 29.97 tbn, 
> 29.97 tbc
>
> After this patch:
> Stream #0:0: Video: jpeg2000, yuv422p10le(bottom coded first 
> (swapped)), 720x486, lossless, SAR 9:10 DAR 4:3, 29.97 fps, 29.97 tbr, 
> 29.97 tbn
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".

-------------- next part --------------
From 6d47d60178f50091afc3b7193b752186603efc6a Mon Sep 17 00:00:00 2001
From: Jerome Martinez <jerome at mediaarea.net>
Date: Tue, 20 Feb 2024 16:04:11 +0100
Subject: [PATCH] avcodec/jpeg2000dec: support of 2 fields in 1 AVPacket

---
 libavcodec/jpeg2000dec.c | 78 ++++++++++++++++++++++++++++++++++++++++++------
 libavcodec/jpeg2000dec.h |  5 ++++
 2 files changed, 74 insertions(+), 9 deletions(-)

diff --git a/libavcodec/jpeg2000dec.c b/libavcodec/jpeg2000dec.c
index 691cfbd891..28a3e11020 100644
--- a/libavcodec/jpeg2000dec.c
+++ b/libavcodec/jpeg2000dec.c
@@ -194,6 +194,8 @@ static int get_siz(Jpeg2000DecoderContext *s)
     int ret;
     int o_dimx, o_dimy; //original image dimensions.
     int dimx, dimy;
+    int previous_width = s->width;
+    int previous_height = s->height;
 
     if (bytestream2_get_bytes_left(&s->g) < 36) {
         av_log(s->avctx, AV_LOG_ERROR, "Insufficient space for SIZ\n");
@@ -211,7 +213,7 @@ static int get_siz(Jpeg2000DecoderContext *s)
     s->tile_offset_y  = bytestream2_get_be32u(&s->g); // YT0Siz
     ncomponents       = bytestream2_get_be16u(&s->g); // CSiz
 
-    if (av_image_check_size2(s->width, s->height, s->avctx->max_pixels, AV_PIX_FMT_NONE, 0, s->avctx)) {
+    if (av_image_check_size2(s->width, s->height << (s->has_2_fields && s->height >= 0), s->avctx->max_pixels, AV_PIX_FMT_NONE, 0, s->avctx)) {
         avpriv_request_sample(s->avctx, "Large Dimensions");
         return AVERROR_PATCHWELCOME;
     }
@@ -301,6 +303,20 @@ static int get_siz(Jpeg2000DecoderContext *s)
             return AVERROR(ENOMEM);
     }
 
+    /* management of frames having 2 separate codestreams */
+    if (s->has_2_fields) {
+        s->height <<= 1;
+        s->image_offset_y <<= 1;
+        s->tile_offset_y <<= 1;
+        if (s->is_second_field && (s->width != previous_width || s->height != previous_height)) {
+            avpriv_request_sample(s->avctx, "Support of 2 JPEG 2000 codestreams with different base characteristics");
+            return AVERROR_PATCHWELCOME;
+        }
+        if (s->image_offset_y || s->tile_offset_y || (s->tile_height << 1) != s->height) {
+            av_log(s->avctx, AV_LOG_WARNING, "Decoding of 2 fields having titles in 1 AVPacket was not tested\n");
+        }
+    }
+
     /* compute image size with reduction factor */
     o_dimx = ff_jpeg2000_ceildivpow2(s->width  - s->image_offset_x,
                                                s->reduction_factor);
@@ -2001,7 +2017,7 @@ static inline void tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile
                                                                                                   \
             y    = tile->comp[compno].coord[1][0] -                                               \
                    ff_jpeg2000_ceildiv(s->image_offset_y, s->cdy[compno]);                        \
-            line = (PIXEL *)picture->data[plane] + y * (picture->linesize[plane] / sizeof(PIXEL));\
+            line = (PIXEL *)picture->data[plane] + (y + (s->is_second_field ^ s->is_bottom_coded_first)) * (picture->linesize[plane] / sizeof(PIXEL));\
             for (; y < h; y++) {                                                                  \
                 PIXEL *dst;                                                                       \
                                                                                                   \
@@ -2028,7 +2044,7 @@ static inline void tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile
                         dst += pixelsize;                                                         \
                     }                                                                             \
                 }                                                                                 \
-                line += picture->linesize[plane] / sizeof(PIXEL);                                 \
+                line += (picture->linesize[plane] << s->has_2_fields) / sizeof(PIXEL);            \
             }                                                                                     \
         }                                                                                         \
                                                                                                   \
@@ -2450,6 +2466,7 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
 {
     Jpeg2000DecoderContext *s = avctx->priv_data;
     int ret;
+    int codestream_size;
 
     s->avctx     = avctx;
     bytestream2_init(&s->g, avpkt->data, avpkt->size);
@@ -2484,20 +2501,50 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
         ret = AVERROR_INVALIDDATA;
         goto end;
     }
+
+    /* management of frames having 2 separate codestreams */
+    if (s->has_2_fields && !s->is_second_field) {
+        switch (avctx->field_order) {
+        case AV_FIELD_BB:
+        case AV_FIELD_BT:
+            s->is_bottom_coded_first = 1;
+            break;
+        default:
+            s->is_bottom_coded_first = 0;
+        }
+    }
+
     if (ret = jpeg2000_read_main_headers(s))
         goto end;
+    codestream_size = avpkt->size - bytestream2_get_bytes_left(&s->g);
+
+    /* management of frames having 2 separate codestreams */
+    if (bytestream2_get_bytes_left(&s->g) > 1 && bytestream2_peek_be16(&s->g) == JPEG2000_SOC) {
+        if (!s->has_2_fields) {
+            /* 2 codestreams newly detected, adatping output frame structure for handling 2 codestreams and parsing again the headers (fast and done once for a stable stream) */
+            s->has_2_fields = 1;
+            jpeg2000_dec_cleanup(s);
+            return jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
+        }
+    } else if (s->has_2_fields && !s->is_second_field) {
+        /* 1 codestream newly detected, adatping output frame structure for handling 1 codestream and parsing again the headers (fast and never done for a stable stream) */
+        s->has_2_fields = 0;
+        s->is_bottom_coded_first = 0;
+        jpeg2000_dec_cleanup(s);
+        return jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
+    }
 
     if (s->sar.num && s->sar.den)
         avctx->sample_aspect_ratio = s->sar;
     s->sar.num = s->sar.den = 0;
 
     if (avctx->skip_frame >= AVDISCARD_ALL) {
-        jpeg2000_dec_cleanup(s);
-        return avpkt->size;
+        ret = codestream_size;
+        goto end;
     }
 
     /* get picture buffer */
-    if ((ret = ff_thread_get_buffer(avctx, picture, 0)) < 0)
+    if ((!s->has_2_fields || !s->is_second_field) && (ret = ff_thread_get_buffer(avctx, picture, 0)) < 0)
         goto end;
     picture->pict_type = AV_PICTURE_TYPE_I;
     picture->flags |= AV_FRAME_FLAG_KEY;
@@ -2518,17 +2565,30 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
 
     avctx->execute2(avctx, jpeg2000_decode_tile, picture, NULL, s->numXtiles * s->numYtiles);
 
-    jpeg2000_dec_cleanup(s);
-
     *got_frame = 1;
 
     if (s->avctx->pix_fmt == AV_PIX_FMT_PAL8)
         memcpy(picture->data[1], s->palette, 256 * sizeof(uint32_t));
 
-    return bytestream2_tell(&s->g);
+    ret = codestream_size;
 
 end:
     jpeg2000_dec_cleanup(s);
+
+    /* management of frames having 2 separate codestreams */
+    if (s->has_2_fields && !s->is_second_field && ret < avpkt->size && ret >= 0) {
+        /* only the 1st codestream was parsed, parsing now the 2nd codestream */
+        s->is_second_field = 1;
+        avpkt->data += ret;
+        avpkt->size -= ret;
+        ret = jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
+        avpkt->data -= ret;
+        avpkt->size += ret;
+        s->is_second_field = 0;
+        if (ret >= 0)
+            ret += codestream_size;
+    }
+
     return ret;
 }
 
diff --git a/libavcodec/jpeg2000dec.h b/libavcodec/jpeg2000dec.h
index d0ca6e7a79..ce42812c48 100644
--- a/libavcodec/jpeg2000dec.h
+++ b/libavcodec/jpeg2000dec.h
@@ -114,6 +114,11 @@ typedef struct Jpeg2000DecoderContext {
 
     /*options parameters*/
     int             reduction_factor;
+    
+    /* field info */
+    int8_t          has_2_fields;
+    int8_t          is_bottom_coded_first;
+    int8_t          is_second_field;
 } Jpeg2000DecoderContext;
 
 #endif //AVCODEC_JPEG2000DEC_H
-- 
2.13.3.windows.1