[FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions

Sat Jul 15 23:13:09 EEST 2023

On Saturday, 15 July 2023, 20.43.26 EEST Lynne wrote:
> Jul 15, 2023, 10:26 by remi at remlab.net:
> > On Saturday, 15 July 2023, 11.05.51 EEST Lynne wrote:
> >> Jul 14, 2023, 20:29 by remi at remlab.net:
> >> > This routes all calls to the bench start and stop functions through
> >> > function pointers. While the primary goal is to support run-time
> >> > selection of the performance measurement back-end in later commits,
> >> > this has the side benefit of moving platform dependencies into
> >> > checkasm.c and out of checkasm.h.
> >> > ---
> >> > 
> >> >  tests/checkasm/checkasm.c | 33 ++++++++++++++++++++++++++++-----
> >> >  tests/checkasm/checkasm.h | 31 ++++---------------------------
> >> >  2 files changed, 32 insertions(+), 32 deletions(-)
> >> 
> >> Not sure I agree with this commit; the overhead can be detectable,
> >> and we have a lot of small functions whose runtime is only a few
> >> times that of a null function call.
> > 
> > I don't think the function call is ever null. The pointers are left NULL
> > only if none of the backends initialise. But then, checkasm will bail out
> > and exit before we try to benchmark anything anyway.
> > 
> > As for the real functions, they always do *something*. None of them "just
> > return 0".
> 
> I meant a no-op function call to measure the overhead of function
> calls themselves, complete with all the ABI stuff.

> 
> >> Can you store the function pointers out of the loop to reduce
> >> the derefs needed?
> > 
> > Taking just the two loads out of the loop should be feasible, but it
> > seems rather vain. You will still have the overhead of the indirect
> > function call, the function, and most importantly in the case of Linux
> > perf and macOS kperf, the system calls.
> > 
> > The only ways to avoid the indirect function calls are to use IFUNC (tricky
> > and not portable), or to make horrible macros to spawn one bench loop for
> > each backend.
> > 
> > In the end, I think we should rather aim for as constant time as possible,
> > rather than as fast as possible, so that the nop loop can estimate the
> > benchmarking overhead as well as possible. In this respect, I think it is
> > actually marginally better *not* to cache the function pointers in local
> > variables, which could end up spilled on the stack, or not, depending on
> > local compiler optimisations for any given test case.
> 
> I disagree, uninlining the timer fetches adds another source of
> inconsistency.

Err, outlining the timer makes sure that it's always the exact same code 
that's run, and not differently optimised inlinings, at least if LTO is absent. 
(And even with LTO, it vastly reduces the compiler's ability to optimise and 
vary the compilation.) Again, given how the calculations are made at the 
moment, the stability of the overhead is important, so that we can *compare* 
measurements. The absolute value of the overhead, not so much.
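
As a rough illustration of that point, here is a minimal sketch. It is not
the actual checkasm code: clock_now, bench_init, bench_raw, bench_net, nop
and work are made-up names, and clock_gettime() merely stands in for
whichever back-end gets selected at run time. The start/stop calls go
through pointers, the nop loop estimates the overhead, and that estimate is
subtracted from the raw measurement:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical back-end: monotonic clock, in nanoseconds. */
static uint64_t clock_now(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Selected at run time; left NULL until a back-end initialises. */
static uint64_t (*bench_start)(void);
static uint64_t (*bench_stop)(void);

/* Assumption: a single back-end that always initialises successfully. */
static int bench_init(void)
{
    bench_start = clock_now;
    bench_stop = clock_now;
    return 0;
}

/* Empty function used to estimate the measurement overhead. */
static void nop(void)
{
}

/* Stand-in for a function under test. */
static volatile unsigned sink;

static void work(void)
{
    for (unsigned i = 0; i < 10000; i++)
        sink += i;
}

/* Average time per call, *including* the start/stop overhead. */
static uint64_t bench_raw(void (*func)(void), unsigned runs)
{
    uint64_t total = 0;

    assert(bench_start != NULL && bench_stop != NULL);
    for (unsigned i = 0; i < runs; i++) {
        uint64_t t = bench_start(); /* same out-of-line code every time */

        func();
        total += bench_stop() - t;
    }
    return total / runs;
}

/* Net time: raw measurement minus the overhead seen by the nop loop. */
static int64_t bench_net(void (*func)(void), unsigned runs)
{
    return (int64_t)bench_raw(func, runs) - (int64_t)bench_raw(nop, runs);
}
```

Calling bench_init() and then bench_net(work, 1000) gives the net figure.
The subtraction only cancels the overhead if it is the same in both loops,
which is why its stability matters more here than its absolute size.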

But I still argue that that is, either way, completely negligible compared to 
the *existing* overhead. Each loop is making 4 system calls, and each of those
system calls requires a direct call (to PLT) and an indirect branch (from GOT).
If you have a problem with the two additional function calls, then you can't 
be using Linux perf in the first place.

-- 
レミ・デニ-クールモン
http://www.remlab.net/