[FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small

Michael Niedermayer michael at niedermayer.cc
Fri Oct 27 21:38:14 EEST 2023


On Fri, Oct 27, 2023 at 05:10:32AM +0200, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
> > ---
> >  libavcodec/get_bits.h | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
> > index cfcf97c021c..86cea00494a 100644
> > --- a/libavcodec/get_bits.h
> > +++ b/libavcodec/get_bits.h
> > @@ -581,8 +581,12 @@ static inline const uint8_t *align_get_bits(GetBitContext *s)
> >          n     = table[index].len;                               \
> >                                                                  \
> >          if (max_depth > 1 && n < 0) {                           \
> > -            LAST_SKIP_BITS(name, gb, bits);                     \
> > -            UPDATE_CACHE(name, gb);                             \
> > +            if (av_builtin_constant_p(bits <= MIN_CACHE_BITS/2) && bits <= MIN_CACHE_BITS/2) { \
> > +                SKIP_BITS(name, gb, bits);                      \
> > +            } else {                                            \
> > +                LAST_SKIP_BITS(name, gb, bits);                 \
> > +                UPDATE_CACHE(name, gb);                         \
> > +            }                                                   \
> >                                                                  \
> >              nb_bits = -n;                                       \
> >                                                                  \
> 
> This is problematic: The GET_VLC macro does not presume that
> MIN_CACHE_BITS are available; there is code that directly uses GET_VLC
> instead of get_vlc2().
> 
> I had the same idea when I made my VLC patchset, yet I wanted to first
> apply it (which I forgot). While investigating the above issue, I found
> out that all users of GET_VLC always call UPDATE_CACHE immediately
> before GET_VLC, so UPDATE_CACHE should be moved into GET_VLC;
> furthermore, no user of GET_VLC relies on the reloads inside of GET_VLC.
> The patches for this are here:
> https://github.com/mkver/FFmpeg/commits/vlc Shall I send them?
> 
> Notice that making GET_VLC more standalone enables improvements over the
> current approach; yet it will not lead to optimal code: E.g. the VLCs in
> decode_alpha_block() in speedhqdec.c are so short that one could read
> both VLCs with only one UPDATE_CACHE(); another example is mjpegdec.c
> which currently does this:
> 
>         GET_VLC(code, re, &s->gb, s->vlcs[1][ac_index].table, 9, 2);
> 
>         i += ((unsigned)code) >> 4;
>             code &= 0xf;
>         if (code) {
>             if (code > MIN_CACHE_BITS - 16)
>                 UPDATE_CACHE(re, &s->gb);
> 
>             {
>                 int cache = GET_CACHE(re, &s->gb);
>                 int sign  = (~cache) >> 31;
>                 level     = (NEG_USR32(sign ^ cache,code) ^ sign) - sign;
>             }
> 
>             LAST_SKIP_BITS(re, &s->gb, code);
> 
> Because of the reloads in GET_VLC, there will always be at least
> MIN_CACHE_BITS - 9 (= 16) bits available after GET_VLC, so one can read
> code (<= 15) bits without updating the cache at all (16 in
> MIN_CACHE_BITS - 16 is the maximum length of a VLC code used here); this
> will no longer be possible with this optimization.
> Btw: I am surprised that there is a branch before UPDATE_CACHE instead
> of an unconditional UPDATE_CACHE. I also do not really see why this uses
> these macros directly and not the functions.
> 
> Given my objection to your patch #1, magicyuv will not benefit from
> this; a different approach (see
> https://github.com/mkver/FFmpeg/commit/9b5a977957968c0718dea55a5b15f060ef6201dc)
> is to add a get_vlc() that uses the nb of bits used to create the VLC
> and a compile-time upper bound for the maximum length of a VLC code as
> parameters instead of the maximum depth of the VLC.
> 

> Reading VLCs for the cached bitstream reader can btw also be improved:
> https://github.com/mkver/FFmpeg/commit/fba57506a9cf6be2f4aa5eeee7b10d54729fd92a

speaking of that, i was wondering if the alternatives we had in get_bits.h
A32_BITSTREAM_READER wouldnt be worth reinvestigating
especially when extended to 64bit some of these readers might perform better
There are then just more bits available and fewer reads and fewer mispredicted
branches for cached ones

It would be somewhat nice if we could avoid having 2 different APIs as we have
now with the cached and normal reader.
Also the normal one with 64bit would be interresting, more available bits
so fewer reads

also i was wondering about a vlc reader thats entirely free of conditional
branches. Just a loop that in each iteration would step by 0-n symbols
forward and update a pointer to which table to use next
but i dont think i will have the time to try to implement this.

I have alot more ideas than i have time to try sadly, if you wan/can/or did
try any of above that would be interresting

thx

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

"You are 36 times more likely to die in a bathtub than at the hands of a
terrorist. Also, you are 2.5 times more likely to become a president and
2 times more likely to become an astronaut, than to die in a terrorist
attack." -- Thoughty2

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: not available
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20231027/b8aeaa5e/attachment.sig>


More information about the ffmpeg-devel mailing list