[FFmpeg-devel] [PATCH 2/2] swscale/input: Avoid calls to av_pix_fmt_desc_get()
Michael Niedermayer
michael at niedermayer.cc
Thu Sep 8 23:25:28 EEST 2022
On Thu, Sep 08, 2022 at 09:38:51PM +0200, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > Hi
> >
> > On Thu, Sep 08, 2022 at 04:38:11AM +0200, Andreas Rheinhardt wrote:
> >> Up until now, libswscale/input.c used a macro to read
> >> an input pixel which involved a call to av_pix_fmt_desc_get()
> >> to find out whether the input pixel format is BE or LE
> >> despite this being known at compile-time (there are templates
> >> per pixfmt). Even worse, these calls are made in a loop,
> >> so that e.g. there are six calls to av_pix_fmt_desc_get()
> >> for every pair of UV pixels processed in
> >> rgb64ToUV_half_c_template().
> >>
> >> This commit modifies these macros to ensure that isBE()
> >> is evaluated at compile-time. This saved 9743B of .text
> >> for me (GCC 11.2, -O3).
> >
> > hmm, all these functions were supposed to be optimized out,
> > why were they not?
> >
> > I am asking as the code is simpler before your patch if that
> > "optimization out" thing would work
> >
>
> Why should these functions be optimized out? What would enable the
> compiler to optimize them out?
Going back into the past, there was commit
6b0768e2021b90215a2ab55ed427bce91d148148.
Before this, the code certainly did get optimized out; it was just
#define isBE(x) ((x)&1)
That's simple and clean code, btw.
After this it became
#define isBE(x) \
    (av_pix_fmt_descriptors[x].flags & PIX_FMT_BE)
That's still really good and very readable. It's a const array, so
one would assume that a compiler can figure that out at compile time
(well, I try not to think of linking and separate objects here ;)
Next it got replaced by a function and a call that I suspect
people thought would be inlined.
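
As a rough sketch of the three stages described above (all names here
are hypothetical stand-ins, not the real libavutil/libswscale
definitions), the difference is what the compiler can see when the
argument is a constant:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the libavutil definitions. */
enum DemoPixFmt { DEMO_RGB48LE, DEMO_RGB48BE, DEMO_NB };
#define DEMO_FLAG_BE 1

typedef struct DemoDesc { uint8_t flags; } DemoDesc;

static const DemoDesc demo_descriptors[DEMO_NB] = {
    [DEMO_RGB48LE] = { 0 },
    [DEMO_RGB48BE] = { DEMO_FLAG_BE },
};

/* Stage 1: big-endian formats sit at odd enum values, so a constant
 * argument folds to a constant with no help from the optimizer. */
#define DEMO_ISBE_BIT(x) ((x) & 1)

/* Stage 2: read the flag from a const table. This can fold only when
 * the table's initializer is visible in the same translation unit. */
#define DEMO_ISBE_TABLE(x) (demo_descriptors[x].flags & DEMO_FLAG_BE)

/* Stage 3: go through an accessor function. If such a function lives in
 * another object file, the caller sees only a prototype and (without
 * LTO) has to emit a real call. */
const DemoDesc *demo_desc_get(enum DemoPixFmt fmt)
{
    return (unsigned)fmt < DEMO_NB ? &demo_descriptors[fmt] : NULL;
}

static int demo_isBE_call(enum DemoPixFmt fmt)
{
    const DemoDesc *d = demo_desc_get(fmt);
    return d && (d->flags & DEMO_FLAG_BE);
}
```

All three give the same answer at run time; they differ only in how much
of the lookup a compiler is able to resolve at build time.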
> (And I really don't see why this patch would make the code more
> complicated.)
The code historically was capable of looking up any flag and detail
of a pixel format at compile time.
Now your code works around that not working, introducing a 2nd
system to do this in parallel. To me, if I look at the evolution
of isBE() / code checking BE-ness, it became more messy over time.
I think it would be interesting to think about whether we can make
av_pix_fmt_desc_get(compile time constant) work at compile time,
or if we maybe can return to a simpler implementation.
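
One possible shape for such a compile-time lookup, purely as a sketch
under assumptions: the flag table and a static inline accessor are both
visible to the caller (e.g. via a header), so an optimizing compiler is
free to fold the lookup when the format is a constant. The toy_* names
are invented for this example and nothing here is existing FFmpeg API;
whether the fold actually happens depends on the compiler and flags.

```c
#include <stdint.h>

/* Hypothetical names for illustration only. */
enum ToyPixFmt { TOY_RGB48LE, TOY_RGB48BE, TOY_NB };
#define TOY_FLAG_BE 1

/* Table and accessor are both visible to the caller. */
static const uint8_t toy_flags[TOY_NB] = {
    [TOY_RGB48LE] = 0,
    [TOY_RGB48BE] = TOY_FLAG_BE,
};

static inline int toy_is_be(enum ToyPixFmt fmt)
{
    return toy_flags[fmt] & TOY_FLAG_BE;
}

/* Template-style reader: 'fmt' is a constant at every call site below,
 * so after inlining only one branch needs to survive. */
static inline uint16_t toy_rd16(const uint8_t *p, enum ToyPixFmt fmt)
{
    return toy_is_be(fmt) ? (uint16_t)(p[0] << 8 | p[1])
                          : (uint16_t)(p[1] << 8 | p[0]);
}

/* One instantiation per pixel format. */
uint16_t toy_read_rgb48be(const uint8_t *p) { return toy_rd16(p, TOY_RGB48BE); }
uint16_t toy_read_rgb48le(const uint8_t *p) { return toy_rd16(p, TOY_RGB48LE); }
```

The trade-off is that the table has to be exposed to every user instead
of staying private behind a function.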
thx
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Answering whether a program halts or runs forever is
on a Turing machine, in general impossible (Turing's halting problem).
On any real computer, always possible as a real computer has a finite number
of states N, and will either halt in less than N cycles or never halt.