[FFmpeg-devel] [PATCH 2/4] avcodec/get_bits: Avoid 2nd bitstream read in GET_VLC() if bits are known at build and small
Andreas Rheinhardt
andreas.rheinhardt at outlook.com
Fri Oct 27 06:10:32 EEST 2023
Michael Niedermayer:
> Signed-off-by: Michael Niedermayer <michael at niedermayer.cc>
> ---
> libavcodec/get_bits.h | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/get_bits.h b/libavcodec/get_bits.h
> index cfcf97c021c..86cea00494a 100644
> --- a/libavcodec/get_bits.h
> +++ b/libavcodec/get_bits.h
> @@ -581,8 +581,12 @@ static inline const uint8_t *align_get_bits(GetBitContext *s)
>          n = table[index].len; \
> \
>          if (max_depth > 1 && n < 0) { \
> -            LAST_SKIP_BITS(name, gb, bits); \
> -            UPDATE_CACHE(name, gb); \
> +            if (av_builtin_constant_p(bits <= MIN_CACHE_BITS/2) && bits <= MIN_CACHE_BITS/2) { \
> +                SKIP_BITS(name, gb, bits); \
> +            } else { \
> +                LAST_SKIP_BITS(name, gb, bits); \
> +                UPDATE_CACHE(name, gb); \
> +            } \
> \
>              nb_bits = -n; \
> \
This is problematic: The GET_VLC macro does not presume that
MIN_CACHE_BITS bits are available in the cache on entry; there is code
that uses GET_VLC directly instead of get_vlc2().
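
To illustrate the concern, here is a minimal sketch of a simplified 32-bit MSB-first reader. All toy_* names are hypothetical and are not FFmpeg API; the input buffer is assumed to be padded so that four bytes can always be loaded. It only shows that skipping without a reload is safe solely when enough bits were in the cache beforehand:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy reader; valid bits are left-aligned in cache. */
typedef struct ToyReader {
    const uint8_t *buf;   /* assumed padded by at least 4 readable bytes */
    size_t         index; /* current position in bits */
    uint32_t       cache;
} ToyReader;

/* Reload: afterwards at least 32 - 7 = 25 valid bits are in the cache. */
static void toy_update_cache(ToyReader *r)
{
    const uint8_t *p = r->buf + (r->index >> 3);
    uint32_t v = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                 (uint32_t)p[2] <<  8 | (uint32_t)p[3];
    r->cache = v << (r->index & 7);
}

/* Peek at the next n bits (1 <= n <= 25); only valid if n bits are cached. */
static uint32_t toy_show_ubits(const ToyReader *r, int n)
{
    return r->cache >> (32 - n);
}

/* Skip n bits and keep the cache usable (zeros are shifted in). */
static void toy_skip_bits(ToyReader *r, int n)
{
    r->cache <<= n;
    r->index  += n;
}

/* Skip n bits without touching the cache; a reload must follow. */
static void toy_last_skip_bits(ToyReader *r, int n)
{
    r->index += n;
}

/* Consume 9 + 9 bits, then peek 16 bits WITHOUT reloading:
 * only 32 - 18 = 14 bits are left, so the low 2 bits are stale zeros. */
static uint32_t toy_read_without_reload(const uint8_t *buf)
{
    ToyReader r = { buf, 0, 0 };
    toy_update_cache(&r);
    toy_skip_bits(&r, 9);
    toy_skip_bits(&r, 9);
    return toy_show_ubits(&r, 16);
}

/* Same reads, but with a reload before the 16-bit peek. */
static uint32_t toy_read_with_reload(const uint8_t *buf)
{
    ToyReader r = { buf, 0, 0 };
    toy_update_cache(&r);
    toy_skip_bits(&r, 9);
    toy_last_skip_bits(&r, 9);
    toy_update_cache(&r);
    return toy_show_ubits(&r, 16);
}
```

On an all-ones buffer the first function returns a value with zeros in its low bits while the second returns sixteen ones, which is the kind of silent misread a caller risks if it enters a reload-free path with fewer cached bits than that path assumes.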
I had the same idea when I made my VLC patchset, but I wanted to get
that patchset applied first (which I then forgot about). While
investigating the above issue, I found out that all users of GET_VLC
always call UPDATE_CACHE immediately before GET_VLC, so UPDATE_CACHE
should be moved into GET_VLC; furthermore, no user of GET_VLC relies on
the reloads inside of GET_VLC.
The patches for this are here:
https://github.com/mkver/FFmpeg/commits/vlc
Shall I send them?
Note that making GET_VLC more standalone enables improvements over the
current approach, yet it will not lead to optimal code. E.g. the VLCs in
decode_alpha_block() in speedhqdec.c are so short that one could read
both VLCs with only one UPDATE_CACHE(); another example is mjpegdec.c,
which currently does this:
    GET_VLC(code, re, &s->gb, s->vlcs[1][ac_index].table, 9, 2);

    i += ((unsigned)code) >> 4;
    code &= 0xf;
    if (code) {
        if (code > MIN_CACHE_BITS - 16)
            UPDATE_CACHE(re, &s->gb);

        {
            int cache = GET_CACHE(re, &s->gb);
            int sign  = (~cache) >> 31;
            level     = (NEG_USR32(sign ^ cache,code) ^ sign) - sign;
        }

        LAST_SKIP_BITS(re, &s->gb, code);

Because of the reloads in GET_VLC, at least MIN_CACHE_BITS - 9 (= 16)
bits are always available after GET_VLC, so one can read code (<= 15)
bits without updating the cache at all (the 16 in MIN_CACHE_BITS - 16 is
the maximum length of a VLC code used here). This will no longer be
possible with this optimization.
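
The bit accounting behind this can be sketched as follows. This is only an illustration under stated assumptions: MIN_CACHE_BITS is taken to be 25 (the value implied by "MIN_CACHE_BITS - 9 (= 16)"), the VLC has at most two levels, and all toy_* names are hypothetical helpers rather than FFmpeg functions:

```c
/* Assumed value, implied by "MIN_CACHE_BITS - 9 (= 16)" in the text. */
#define TOY_MIN_CACHE_BITS 25

/* Guaranteed cached bits after a two-level VLC read that reloads the
 * cache between the levels (the current GET_VLC behaviour):
 *  - code resolved by the first lookup: at most table_bits consumed;
 *  - code needing a second lookup: reload after skipping table_bits,
 *    then at most max_code_len - table_bits further bits consumed. */
static int toy_bits_left_with_reload(int table_bits, int max_code_len)
{
    int one_level = TOY_MIN_CACHE_BITS - table_bits;
    int two_level = TOY_MIN_CACHE_BITS - (max_code_len - table_bits);
    return one_level < two_level ? one_level : two_level;
}

/* Guaranteed cached bits if the reload between the levels is dropped:
 * in the worst case the whole code comes out of a single cache fill. */
static int toy_bits_left_without_reload(int max_code_len)
{
    return TOY_MIN_CACHE_BITS - max_code_len;
}

/* Nonzero if n more bits can be read without another cache update. */
static int toy_can_read_without_update(int bits_left, int n)
{
    return bits_left >= n;
}
```

For the mjpegdec.c parameters (9 table bits, codes of at most 16 bits), the first helper yields 16 guaranteed bits and the second only 9, so reading up to 15 bits without an update is covered in the first case but not in the second.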
Btw: I am surprised that there is a branch before UPDATE_CACHE instead
of an unconditional UPDATE_CACHE. I also do not really see why this uses
these macros directly and not the functions.
Given my objection to your patch #1, magicyuv will not benefit from
this; a different approach (see
https://github.com/mkver/FFmpeg/commit/9b5a977957968c0718dea55a5b15f060ef6201dc)
is to add a get_vlc() that uses the number of bits used to create the VLC
and a compile-time upper bound for the maximum length of a VLC code as
parameters instead of the maximum depth of the VLC.
Reading VLCs for the cached bitstream reader can btw also be improved:
https://github.com/mkver/FFmpeg/commit/fba57506a9cf6be2f4aa5eeee7b10d54729fd92a
- Andreas