[FFmpeg-devel] [PATCH v3] avcodec/jpeg2000dec: support of 2 fields in 1 AVPacket

Jerome Martinez jerome at mediaarea.net
Sun Feb 25 06:14:25 EET 2024


On 24/02/2024 13:26, Tomas Härdin wrote:
> [...]
>>> It should be possible to have ffmpeg set up the necessary plumbing
>>> for this.
>> But is that how it works elsewhere in FFmpeg? Would such complex and
>> deep modifications be accepted by others?
> Good question. I would propose something like the following:
>
> 1) detect the use of SEPARATE_FIELDS and set a flag in AVStream


In practice, for that case (2 JPEG 2000 codestreams per AVPacket) the 
flag is only a hint for the jpeg2000 decoder (because we can autodetect 
it, and there are many buggy files in the wild), so I was planning to 
add it later in a separate patch, but attached is a version with the flag.

> 2) allocate AVFrame for the size of the resulting *frame*
So keeping what is already there.

> 3a) if the codec is inherently interlaced, call the decoder once
> 3b) if the codec is not inherently interlaced, call the decoder twice,
> with appropriate stride, and keep track of the number of bytes decoded
> so far so we know what offset to start the second decode from


The place I see for that is decode_simple_internal().
But it is a very hot path that I don't like to modify, and it adds 
extra code for the 99.9999% (or even more 9s) of files which don't need 
this feature, with more risk of forgetting this specific feature during 
future development, e.g. it is not obvious that ff_thread_decode_frame 
must also be changed when touching this part.
I also needed to add a dedicated AVCodecContext field to signal that 
the decoder can manage this functionality (and that it is needed there).
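
For reference, a minimal sketch (not what the attached patch does; it 
assumes decode.c itself handles the packet offset, reusing the new 
frame_wrapping field purely for illustration) of what such a generic 
second call might look like there:

    consumed = codec->cb.decode(avctx, frame, &got_frame, pkt);
    if (consumed >= 0 && avctx->frame_wrapping == AV_WRAPPING_FIELD_2) {
        /* hypothetical second call for the second field: resume after the
         * bytes consumed by the first call; the decoder would still need to
         * write that field with a doubled stride into the same AVFrame */
        AVPacket field2 = *pkt; /* shallow copy, for illustration only */
        field2.data += consumed;
        field2.size -= consumed;
        int consumed2 = codec->cb.decode(avctx, frame, &got_frame, &field2);
        consumed = consumed2 >= 0 ? consumed + consumed2 : consumed2;
    }

Any such bookkeeping in decode.c would have to be mirrored in 
ff_thread_decode_frame(), which is exactly the kind of duplication I 
would like to avoid.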

What is the added value of calling the decoder twice from decode.c 
rather than a recursive call (or, if preferred, a master function in 
the decoder calling the current function twice; a sketch of that 
alternative is below) inside the decoder only?
As far as I understand, it would not help other formats (only the 
signaling propagation in AVStream helps, and that is done by another 
AVStream field), and I personally much prefer such a feature to live as 
much as possible in a single place in each decoder rather than in 
pieces a bit everywhere; each decoder needs to be upgraded anyway.
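
For illustration, a minimal sketch of the master-function alternative 
(hypothetical wrapper name; the attached patch uses the recursive call 
instead, driven by the s->consumed and s->is_second_field state set at 
the end of each call):

    static int jpeg2000_decode_frame_master(AVCodecContext *avctx, AVFrame *picture,
                                            int *got_frame, AVPacket *avpkt)
    {
        Jpeg2000DecoderContext *s = avctx->priv_data;
        int ret = jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);

        /* if the first call detected a second codestream in the same packet,
         * decode it into the same AVFrame; offset and stride are handled
         * inside jpeg2000_decode_frame() itself */
        if (ret >= 0 && s->is_second_field) {
            int ret2 = jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
            if (ret2 < 0)
                return ret2;
            ret += ret2;
        }
        return ret;
    }

Either way the offset and stride handling stays in jpeg2000dec.c; only 
the place that triggers the second call changes.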

> The codecs for which 3b) applies include at least:
>
> * jpeg2000

Our use case.

> * ffv1

FFV1 has its own internal flags for interlaced content (interleaved 
method only) and I expect no work for separate fields. The MXF/FFV1 
spec does not plan separate fields for FFV1, and there is no byte in 
the essence label for that.


> * rawvideo
> * tiff

I didn't find a specification for the corresponding essence label UL 
and I have no file for that; as far as I understand it is highly 
theoretical, but if it appears it would only be a matter of mapping the 
MXF signaling to the new AVStream field and supporting the feature in 
the decoders (even if we implement the idea of calling the decoder 
twice, each decoder still needs to be expanded for this feature).
So IMO no development to do there either for the moment.

Jérôme

-------------- next part --------------
From f4311b718012a92590ce6168355ec118e02052a8 Mon Sep 17 00:00:00 2001
From: Jerome Martinez <jerome at mediaarea.net>
Date: Tue, 20 Feb 2024 16:04:11 +0100
Subject: [PATCH] avcodec/jpeg2000dec: support of 2 fields in 1 AVPacket

---
 libavcodec/avcodec.h       | 14 +++++++++
 libavcodec/codec_par.c     |  3 ++
 libavcodec/codec_par.h     |  5 ++++
 libavcodec/decode.c        |  3 ++
 libavcodec/defs.h          |  7 +++++
 libavcodec/jpeg2000dec.c   | 73 +++++++++++++++++++++++++++++++++++++++-------
 libavcodec/jpeg2000dec.h   |  6 ++++
 libavcodec/pthread_frame.c |  3 ++
 libavformat/mxfdec.c       | 14 +++++++++
 9 files changed, 118 insertions(+), 10 deletions(-)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 7fb44e28f4..38d63adc0f 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -2116,6 +2116,20 @@ typedef struct AVCodecContext {
      *   an error.
      */
     int64_t frame_num;
+
+    /**
+     * Video only. The way separate fields are wrapped in the container.
+     * - decoding: hint from the demuxer
+     * - encoding: not (yet) used
+     */
+    enum AVFrameWrapping               frame_wrapping;
+
+    /**
+     * Video only. Indicates whether running the decoder twice for a single AVFrame is supported.
+     * - decoding: set by the decoder
+     * - encoding: not used
+     */
+    int                                frame_wrapping_field_2_supported;
 } AVCodecContext;
 
 /**
diff --git a/libavcodec/codec_par.c b/libavcodec/codec_par.c
index abaac63841..3f26f9d4d6 100644
--- a/libavcodec/codec_par.c
+++ b/libavcodec/codec_par.c
@@ -51,6 +51,7 @@ static void codec_parameters_reset(AVCodecParameters *par)
     par->framerate           = (AVRational){ 0, 1 };
     par->profile             = AV_PROFILE_UNKNOWN;
     par->level               = AV_LEVEL_UNKNOWN;
+    par->frame_wrapping      = AV_WRAPPING_UNKNOWN;
 }
 
 AVCodecParameters *avcodec_parameters_alloc(void)
@@ -165,6 +166,7 @@ int avcodec_parameters_from_context(AVCodecParameters *par,
         par->sample_aspect_ratio = codec->sample_aspect_ratio;
         par->video_delay         = codec->has_b_frames;
         par->framerate           = codec->framerate;
+        par->frame_wrapping      = codec->frame_wrapping;
         break;
     case AVMEDIA_TYPE_AUDIO:
         par->format           = codec->sample_fmt;
@@ -252,6 +254,7 @@ int avcodec_parameters_to_context(AVCodecContext *codec,
         codec->sample_aspect_ratio    = par->sample_aspect_ratio;
         codec->has_b_frames           = par->video_delay;
         codec->framerate              = par->framerate;
+        codec->frame_wrapping         = par->frame_wrapping;
         break;
     case AVMEDIA_TYPE_AUDIO:
         codec->sample_fmt       = par->format;
diff --git a/libavcodec/codec_par.h b/libavcodec/codec_par.h
index f42dd3b1d5..1e53292553 100644
--- a/libavcodec/codec_par.h
+++ b/libavcodec/codec_par.h
@@ -136,6 +136,11 @@ typedef struct AVCodecParameters {
     enum AVFieldOrder                  field_order;
 
     /**
+     * Video only. The way separate fields are wrapped in the container.
+     */
+    enum AVFrameWrapping               frame_wrapping;
+
+    /**
      * Video only. Additional colorspace characteristics.
      */
     enum AVColorRange                  color_range;
diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index 2cfb3fcf97..979759d84a 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -428,6 +428,9 @@ static inline int decode_simple_internal(AVCodecContext *avctx, AVFrame *frame,
         consumed = ff_thread_decode_frame(avctx, frame, &got_frame, pkt);
     } else {
         consumed = codec->cb.decode(avctx, frame, &got_frame, pkt);
+        if (consumed >= 0 && avctx->frame_wrapping_field_2_supported) {
+            consumed = codec->cb.decode(avctx, frame, &got_frame, pkt);
+        }
 
         if (!(codec->caps_internal & FF_CODEC_CAP_SETS_PKT_DTS))
             frame->pkt_dts = pkt->dts;
diff --git a/libavcodec/defs.h b/libavcodec/defs.h
index 00d840ec19..8f7ecf81c5 100644
--- a/libavcodec/defs.h
+++ b/libavcodec/defs.h
@@ -204,6 +204,13 @@ enum AVFieldOrder {
     AV_FIELD_BT,          ///< Bottom coded first, top displayed first
 };
 
+enum AVFrameWrapping {
+    AV_WRAPPING_UNKNOWN,
+    AV_WRAPPING_FRAME,    ///< if interlaced content: lines are interleaved
+    AV_WRAPPING_FIELD_1,  ///< each field is an independent encoded item, 1 field per AVPacket
+    AV_WRAPPING_FIELD_2,  ///< each field is an independent encoded item, 2 fields per AVPacket
+};
+
 /**
  * @ingroup lavc_decoding
  */
diff --git a/libavcodec/jpeg2000dec.c b/libavcodec/jpeg2000dec.c
index 691cfbd891..c9b935d97b 100644
--- a/libavcodec/jpeg2000dec.c
+++ b/libavcodec/jpeg2000dec.c
@@ -194,6 +194,8 @@ static int get_siz(Jpeg2000DecoderContext *s)
     int ret;
     int o_dimx, o_dimy; //original image dimensions.
     int dimx, dimy;
+    int previous_width = s->width;
+    int previous_height = s->height;
 
     if (bytestream2_get_bytes_left(&s->g) < 36) {
         av_log(s->avctx, AV_LOG_ERROR, "Insufficient space for SIZ\n");
@@ -211,7 +213,7 @@ static int get_siz(Jpeg2000DecoderContext *s)
     s->tile_offset_y  = bytestream2_get_be32u(&s->g); // YT0Siz
     ncomponents       = bytestream2_get_be16u(&s->g); // CSiz
 
-    if (av_image_check_size2(s->width, s->height, s->avctx->max_pixels, AV_PIX_FMT_NONE, 0, s->avctx)) {
+    if (av_image_check_size2(s->width, s->height << (s->has_2_fields && s->height >= 0), s->avctx->max_pixels, AV_PIX_FMT_NONE, 0, s->avctx)) {
         avpriv_request_sample(s->avctx, "Large Dimensions");
         return AVERROR_PATCHWELCOME;
     }
@@ -301,6 +303,20 @@ static int get_siz(Jpeg2000DecoderContext *s)
             return AVERROR(ENOMEM);
     }
 
+    /* management of frames having 2 separate codestreams */
+    if (s->has_2_fields) {
+        s->height <<= 1;
+        s->image_offset_y <<= 1;
+        s->tile_offset_y <<= 1;
+        if (s->is_second_field && (s->width != previous_width || s->height != previous_height)) {
+            avpriv_request_sample(s->avctx, "Support of 2 JPEG 2000 codestreams with different base characteristics");
+            return AVERROR_PATCHWELCOME;
+        }
+        if (s->image_offset_y || s->tile_offset_y || (s->tile_height << 1) != s->height) {
+            av_log(s->avctx, AV_LOG_WARNING, "Decoding of 2 fields having tiles in 1 AVPacket was not tested\n");
+        }
+    }
+
     /* compute image size with reduction factor */
     o_dimx = ff_jpeg2000_ceildivpow2(s->width  - s->image_offset_x,
                                                s->reduction_factor);
@@ -2001,7 +2017,7 @@ static inline void tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile
                                                                                                   \
             y    = tile->comp[compno].coord[1][0] -                                               \
                    ff_jpeg2000_ceildiv(s->image_offset_y, s->cdy[compno]);                        \
-            line = (PIXEL *)picture->data[plane] + y * (picture->linesize[plane] / sizeof(PIXEL));\
+            line = (PIXEL *)picture->data[plane] + (y + (s->is_second_field ^ s->is_bottom_coded_first)) * (picture->linesize[plane] / sizeof(PIXEL));\
             for (; y < h; y++) {                                                                  \
                 PIXEL *dst;                                                                       \
                                                                                                   \
@@ -2028,7 +2044,7 @@ static inline void tile_codeblocks(const Jpeg2000DecoderContext *s, Jpeg2000Tile
                         dst += pixelsize;                                                         \
                     }                                                                             \
                 }                                                                                 \
-                line += picture->linesize[plane] / sizeof(PIXEL);                                 \
+                line += (picture->linesize[plane] << s->has_2_fields) / sizeof(PIXEL);            \
             }                                                                                     \
         }                                                                                         \
                                                                                                   \
@@ -2441,6 +2457,9 @@ static av_cold int jpeg2000_decode_init(AVCodecContext *avctx)
 
     ff_jpeg2000dsp_init(&s->dsp);
     ff_jpeg2000_init_tier1_luts();
+
+    s->has_2_fields = avctx->frame_wrapping == AV_WRAPPING_FIELD_2;
+    avctx->frame_wrapping_field_2_supported = s->has_2_fields;
 
     return 0;
 }
@@ -2450,9 +2469,10 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
 {
     Jpeg2000DecoderContext *s = avctx->priv_data;
     int ret;
+    int codestream_size;
 
     s->avctx     = avctx;
-    bytestream2_init(&s->g, avpkt->data, avpkt->size);
+    bytestream2_init(&s->g, avpkt->data + s->consumed, avpkt->size - s->consumed);
     s->curtileno = -1;
     memset(s->cdef, -1, sizeof(s->cdef));
 
@@ -2484,20 +2504,50 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
         ret = AVERROR_INVALIDDATA;
         goto end;
     }
+
+    /* management of frames having 2 separate codestreams */
+    if (s->has_2_fields && !s->is_second_field) {
+        switch (avctx->field_order) {
+        case AV_FIELD_BB:
+        case AV_FIELD_BT:
+            s->is_bottom_coded_first = 1;
+            break;
+        default:
+            s->is_bottom_coded_first = 0;
+        }
+    }
+
     if (ret = jpeg2000_read_main_headers(s))
         goto end;
+    codestream_size = avpkt->size - bytestream2_get_bytes_left(&s->g);
+
+    /* management of frames having 2 separate codestreams */
+    if (bytestream2_get_bytes_left(&s->g) > 1 && bytestream2_peek_be16(&s->g) == JPEG2000_SOC) {
+        if (!s->has_2_fields) {
+            /* 2 codestreams newly detected, adapting the output frame structure to handle 2 codestreams and parsing the headers again (fast, and never done if the wrapper has the right hint) */
+            s->has_2_fields = 1;
+            jpeg2000_dec_cleanup(s);
+            return jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
+        }
+    } else if (s->has_2_fields && !s->is_second_field) {
+        /* 1 codestream newly detected, adapting the output frame structure to handle 1 codestream and parsing the headers again (fast, and never done if the wrapper has the right hint) */
+        s->has_2_fields = 0;
+        s->is_bottom_coded_first = 0;
+        jpeg2000_dec_cleanup(s);
+        return jpeg2000_decode_frame(avctx, picture, got_frame, avpkt);
+    }
 
     if (s->sar.num && s->sar.den)
         avctx->sample_aspect_ratio = s->sar;
     s->sar.num = s->sar.den = 0;
 
     if (avctx->skip_frame >= AVDISCARD_ALL) {
-        jpeg2000_dec_cleanup(s);
-        return avpkt->size;
+        ret = codestream_size;
+        goto end;
     }
 
     /* get picture buffer */
-    if ((ret = ff_thread_get_buffer(avctx, picture, 0)) < 0)
+    if ((!s->has_2_fields || !s->is_second_field) && (ret = ff_thread_get_buffer(avctx, picture, 0)) < 0)
         goto end;
     picture->pict_type = AV_PICTURE_TYPE_I;
     picture->flags |= AV_FRAME_FLAG_KEY;
@@ -2518,17 +2568,20 @@ static int jpeg2000_decode_frame(AVCodecContext *avctx, AVFrame *picture,
 
     avctx->execute2(avctx, jpeg2000_decode_tile, picture, NULL, s->numXtiles * s->numYtiles);
 
-    jpeg2000_dec_cleanup(s);
-
     *got_frame = 1;
 
     if (s->avctx->pix_fmt == AV_PIX_FMT_PAL8)
         memcpy(picture->data[1], s->palette, 256 * sizeof(uint32_t));
 
-    return bytestream2_tell(&s->g);
+    ret = codestream_size;
 
 end:
     jpeg2000_dec_cleanup(s);
+
+    /* management of frames having 2 separate codestreams */
+    s->is_second_field = s->has_2_fields && !s->is_second_field && ret < avpkt->size && ret >= 0; /* next call will handle the second field */
+    s->consumed = s->is_second_field ? ret : 0;
+
     return ret;
 }
 
diff --git a/libavcodec/jpeg2000dec.h b/libavcodec/jpeg2000dec.h
index d0ca6e7a79..5ae94aafd8 100644
--- a/libavcodec/jpeg2000dec.h
+++ b/libavcodec/jpeg2000dec.h
@@ -114,6 +114,12 @@ typedef struct Jpeg2000DecoderContext {
 
     /*options parameters*/
     int             reduction_factor;
+
+    /* field info */
+    int8_t          has_2_fields;
+    int8_t          is_bottom_coded_first;
+    int8_t          is_second_field;
+    int             consumed;
 } Jpeg2000DecoderContext;
 
 #endif //AVCODEC_JPEG2000DEC_H
diff --git a/libavcodec/pthread_frame.c b/libavcodec/pthread_frame.c
index 71e99a5728..e3a8815653 100644
--- a/libavcodec/pthread_frame.c
+++ b/libavcodec/pthread_frame.c
@@ -221,6 +221,9 @@ static attribute_align_arg void *frame_worker_thread(void *arg)
         av_frame_unref(p->frame);
         p->got_frame = 0;
         p->result = codec->cb.decode(avctx, p->frame, &p->got_frame, p->avpkt);
+        if (p->result >= 0 && avctx->frame_wrapping_field_2_supported) {
+            p->result = codec->cb.decode(avctx, p->frame, &p->got_frame, p->avpkt);
+        }
 
         if ((p->result < 0 || !p->got_frame) && p->frame->buf[0])
             av_frame_unref(p->frame);
diff --git a/libavformat/mxfdec.c b/libavformat/mxfdec.c
index e42975e7fd..af33d8cad4 100644
--- a/libavformat/mxfdec.c
+++ b/libavformat/mxfdec.c
@@ -2948,6 +2948,20 @@ static int mxf_parse_structural_metadata(MXFContext *mxf)
                 default:
                     break;
                 }
+                switch ((*essence_container_ul)[14]) {
+                case 3: /* I1: Interlaced Frame, 1 field/KLV */
+                    st->codecpar->frame_wrapping = AV_WRAPPING_FIELD_1;
+                    break;
+                case 4: /* I2: Interlaced Frame, 2 fields/KLV */
+                    st->codecpar->frame_wrapping = AV_WRAPPING_FIELD_2;
+                    break;
+                case 2: /* Cn: Clip-wrapped Picture Element */
+                case 6: /* P1: Frame-wrapped Picture Element */
+                    st->codecpar->frame_wrapping = AV_WRAPPING_FRAME;
+                    break;
+                default:
+                    break;
+                }
             }
 
             if (st->codecpar->codec_id == AV_CODEC_ID_PRORES) {
-- 
2.13.3.windows.1


