[FFmpeg-devel] [PATCH v10 2/5] libavcodec/webp: add support for animated WebP

Andreas Rheinhardt andreas.rheinhardt at outlook.com
Mon Feb 19 18:50:57 EET 2024


Thilo Borgmann via ffmpeg-devel:
> From: Josef Zlomek <josef at pex.com>
> 
> Fixes: 4907
> 
> Adds support for decoding animated WebP.
> 
> The WebP decoder adds the animation-related features according to the spec:
> https://developers.google.com/speed/webp/docs/riff_container#animation
> The frames of the animation may be smaller than the image canvas.
> Therefore, each frame is decoded into a temporary frame, then blended
> into the canvas; the canvas is copied to the output frame, and finally
> the frame is disposed from the canvas (in preparation for the next
> frame), according to its disposal method.
> 
> Output to AV_PIX_FMT_YUVA420P/AV_PIX_FMT_YUV420P is still supported.
> The background color is specified only as BGRA in the WebP file, so it
> is converted to YUVA if a YUV format is output.
> 
> Signed-off-by: Josef Zlomek <josef at pex.com>
> ---
>  Changelog               |   1 +
>  libavcodec/codec_desc.c |   3 +-
>  libavcodec/version.h    |   2 +-
>  libavcodec/webp.c       | 704 +++++++++++++++++++++++++++++++++++++---
>  4 files changed, 654 insertions(+), 56 deletions(-)
> 
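
As an aside on the BGRA-to-YUVA conversion of the background color
mentioned above: a minimal sketch using the common BT.601 limited-range
integer approximation (an assumption on my part; the patch may use
different coefficients, and bgra_to_yuva() is a made-up name):

#include <stdint.h>

/* Convert one 8-bit BGRA quadruple to YUVA (BT.601, limited range). */
static void bgra_to_yuva(const uint8_t bgra[4], uint8_t yuva[4])
{
    int b = bgra[0], g = bgra[1], r = bgra[2];

    yuva[0] = (( 66 * r + 129 * g +  25 * b + 128) >> 8) +  16; /* Y */
    yuva[1] = ((-38 * r -  74 * g + 112 * b + 128) >> 8) + 128; /* U */
    yuva[2] = ((112 * r -  94 * g -  18 * b + 128) >> 8) + 128; /* V */
    yuva[3] = bgra[3];                                          /* A */
}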

> +static int webp_decode_frame(AVCodecContext *avctx, AVFrame *p,
> +                             int *got_frame, AVPacket *avpkt)
> +{
> +    WebPContext *s = avctx->priv_data;
> +    AVFrame *canvas = s->canvas_frame.f;
> +    int ret;
> +    int key_frame = avpkt->flags & AV_PKT_FLAG_KEY;
> +
> +    *got_frame   = 0;
> +
> +    if (key_frame) {
> +        // The canvas is passed from one thread to another in a sequence
> +        // starting with a key frame followed by non-key frames.
> +        // The key frame reports progress 1,
> +        // the N-th non-key frame awaits progress N = s->await_progress
> +        // and reports progress N + 1.
> +        s->await_progress = 0;
> +    }
> +
> +    // reset the frame params
> +    s->anmf_flags = 0;
> +    s->width      = 0;
> +    s->height     = 0;
> +    s->pos_x      = 0;
> +    s->pos_y      = 0;
> +    s->has_alpha  = 0;
> +
> +    ret = webp_decode_frame_common(avctx, avpkt->data, avpkt->size, got_frame, key_frame);
> +    if (ret < 0)
> +        goto end;
> +
> +    if (s->vp8x_flags & VP8X_FLAG_ANIMATION) {
> +        // VP8 decoder might have changed the width and height of the frame
> +        AVFrame *frame  = s->frame;
> +        ret = av_frame_copy_props(canvas, frame);
> +        if (ret < 0)
> +            goto end;
> +
> +        ret = ff_set_dimensions(s->avctx, canvas->width, canvas->height);
> +        if (ret < 0)
> +            goto end;
> +
> +        s->avctx->pix_fmt = canvas->format;
> +    }
> +
> +    ff_thread_finish_setup(s->avctx);

1. Up until now, when decoding a series of stand-alone WebP pictures,
the multiple decoder instances don't wait for each other (because the
WebP decoder had no update_thread_context callback). You added such a
callback, and ff_thread_finish_setup() is now called only after the
main picture has already been decoded, effectively serializing
everything. You can test this for yourself: Create lots of files via
ffmpeg -i <input> -c:v libwebp webp%d.webp and decode them (don't use
-stream_loop on a single input picture, as this will flush the decoder
after every single picture, so that everything is always serialized).
2. To fix this, ff_thread_finish_setup() needs to be called as soon as
possible. This means that you have to abandon the approach of letting
the inner VP8 decoder set the frame dimensions and then overwriting them
again in the WebP decoder.
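A rough sketch of what I mean (not a drop-in patch: setup_canvas_early()
is a name I made up, error handling is abridged, and it assumes the
canvas dimensions are read directly from the VP8X chunk header instead
of being taken from the inner VP8 decoder):

#include "bytestream.h"
#include "decode.h"
#include "thread.h"

static int setup_canvas_early(AVCodecContext *avctx,
                              const uint8_t *data, int size)
{
    GetByteContext gb;

    bytestream2_init(&gb, data, size);
    bytestream2_skip(&gb, 12);               /* "RIFF" + size + "WEBP" */
    if (bytestream2_get_bytes_left(&gb) >= 18 &&
        bytestream2_get_le32(&gb) == MKTAG('V', 'P', '8', 'X')) {
        int ret, w, h;

        bytestream2_skip(&gb, 8);             /* chunk size + flags + reserved */
        w   = bytestream2_get_le24(&gb) + 1;  /* stored as width  - 1 */
        h   = bytestream2_get_le24(&gb) + 1;  /* stored as height - 1 */
        ret = ff_set_dimensions(avctx, w, h);
        if (ret < 0)
            return ret;
    }
    /* pix_fmt would have to be derived from the VP8X flags as well;
     * after that, the next frame thread may start immediately. */
    ff_thread_finish_setup(avctx);
    return 0;
}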
3. Your WebP demuxer (from 4/5) splits an animation into a series of
packets. The problem is the design of the extended file format
(https://developers.google.com/speed/webp/docs/riff_container#extended_file_format):
certain metadata chunks (in particular EXIF) that pertain to the whole
animation are stored only after the image data. If you split the input
into packets, the decoder won't be able to attach this metadata to any
frame except the last one.
4. Due to this, splitting the input packet should be avoided. But this
causes great complications in the decoder: It now needs to output
multiple frames per packet (if said packet contains an animation). I
see three ways to do this:
a) Use the receive_frame callback for this decoder. This will
necessitate changes to pthread_frame.c (which currently can't handle
receive_frame decoders) and even then it will have the further downside
that decoding a single animation is single-threaded when using
frame threading (given that animations provide the opportunity to
decode parts of a packet in parallel, they work naturally with slice
threading, not frame threading). But non-animations should continue to
work as they do now.
b) Somehow implement this via AV_CODEC_CAP_OTHER_THREADS, potentially
via the AVExecutor API. This would have the upside that it could
dynamically switch between frame and slice threading (depending upon
what the input is).
c) Accept packets containing whole animations, but use a BSF to split
the data so that the metadata arrives at the decoder before/together
with the real frame data. I am not sure whether this would necessitate
changes to the decode API, too (basically: if there are threads that
are currently not busy and the BSF has not signalled EAGAIN yet, try to
extract another packet from the BSF).
Notice that the BSF I have in mind would not be a public BSF, but a
private one (given that the output of the BSF would be spec-incompliant
due to the changed ordering, it should not be public), i.e. one not
accessible via av_bsf_get_by_name() or av_bsf_iterate().
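To illustrate c), a skeleton of such a private BSF (webp_reorder_* and
extract_next_frame() are made-up names, and the actual chunk walking is
elided):

#include "bsf.h"
#include "bsf_internal.h"

typedef struct WebPReorderContext {
    AVPacket *in; /* the whole animation, kept across filter calls */
} WebPReorderContext;

/* Hypothetical helper (not shown): emits the next ANMF frame from in as
 * its own packet, with the trailing EXIF/XMP chunks moved in front of
 * the frame data; returns AVERROR_EOF once in is exhausted. */
static int extract_next_frame(AVPacket *in, AVPacket *out);

static int webp_reorder_filter(AVBSFContext *ctx, AVPacket *out)
{
    WebPReorderContext *s = ctx->priv_data;
    int ret;

    if (!s->in) {
        ret = ff_bsf_get_packet(ctx, &s->in);
        if (ret < 0) /* passes on AVERROR(EAGAIN) and AVERROR_EOF */
            return ret;
    }
    ret = extract_next_frame(s->in, out);
    if (ret == AVERROR_EOF) {      /* this animation is exhausted,   */
        av_packet_free(&s->in);    /* request the next input packet  */
        return AVERROR(EAGAIN);
    }
    return ret;
}

/* Deliberately not registered in bitstream_filters.c and therefore
 * invisible to av_bsf_get_by_name() and av_bsf_iterate(); the decoder
 * would instantiate it directly via av_bsf_alloc(&webp_reorder_bsf.p, ...). */
static const FFBitStreamFilter webp_reorder_bsf = {
    .p.name         = "webp_reorder",
    .p.codec_ids    = (const enum AVCodecID[]){ AV_CODEC_ID_WEBP, AV_CODEC_ID_NONE },
    .priv_data_size = sizeof(WebPReorderContext),
    .filter         = webp_reorder_filter,
};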

> +
> +    if (*got_frame) {
> +        if (!(s->vp8x_flags & VP8X_FLAG_ANIMATION)) {
> +            // no animation, output the decoded frame
> +            av_frame_move_ref(p, s->frame);
> +        } else {
> +            if (!key_frame) {
> +                ff_thread_await_progress(&s->canvas_frame, s->await_progress, 0);
> +
> +                ret = dispose_prev_frame_in_canvas(s);
> +                if (ret < 0)
> +                    goto end;
> +            }
> +
> +            ret = blend_frame_into_canvas(s);
> +            if (ret < 0)
> +                goto end;
> +
> +            ret = copy_canvas_to_frame(s, p, key_frame);
> +            if (ret < 0)
> +                goto end;
> +
> +            ff_thread_report_progress(&s->canvas_frame, s->await_progress + 1, 0);
> +        }
> +
> +        p->pts = avpkt->pts;
> +    }
> +
> +    ret = avpkt->size;
> +
> +end:
> +    av_frame_unref(s->frame);
> +    return ret;
>  }
>  
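An aside on blend_frame_into_canvas() above: for frames with blending
enabled, the container spec prescribes "source over" compositing on
non-premultiplied alpha. A per-pixel 8-bit sketch of that formula (not
the patch's code):

#include <stdint.h>

/* Composite one non-premultiplied 8-bit BGRA frame pixel over the
 * corresponding canvas pixel, as per the container spec. */
static void blend_pixel(uint8_t dst[4], const uint8_t src[4])
{
    int sa = src[3], da = dst[3];
    int ba = sa * 255 + da * (255 - sa); /* blended alpha, scaled by 255 */

    if (!ba) { /* fully transparent result */
        dst[0] = dst[1] = dst[2] = dst[3] = 0;
        return;
    }
    for (int i = 0; i < 3; i++)
        dst[i] = (src[i] * sa * 255 + dst[i] * da * (255 - sa) + ba / 2) / ba;
    dst[3] = (ba + 127) / 255;
}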
>  const FFCodec ff_webp_decoder = {
>      .p.name         = "webp",
>      CODEC_LONG_NAME("WebP image"),
>      .p.type         = AVMEDIA_TYPE_VIDEO,
>      .p.id           = AV_CODEC_ID_WEBP,
>      .priv_data_size = sizeof(WebPContext),
> +    UPDATE_THREAD_CONTEXT(webp_update_thread_context),
>      .init           = webp_decode_init,
>      FF_CODEC_DECODE_CB(webp_decode_frame),
>      .close          = webp_decode_close,
> +    .flush          = webp_decode_flush,
>      .p.capabilities = AV_CODEC_CAP_DR1 | AV_CODEC_CAP_FRAME_THREADS,
> -    .caps_internal  = FF_CODEC_CAP_ICC_PROFILES,
> +    .caps_internal  = FF_CODEC_CAP_ICC_PROFILES | FF_CODEC_CAP_ALLOCATE_PROGRESS,
>  };


