[FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions

Sun Jul 16 23:32:21 EEST 2023

Jul 15, 2023, 22:13 by remi at remlab.net:

> On Saturday, 15 July 2023, 20:43:26 EEST, Lynne wrote:
>
>> Jul 15, 2023, 10:26 by remi at remlab.net:
>> > On Saturday, 15 July 2023, 11:05:51 EEST, Lynne wrote:
>> >> Jul 14, 2023, 20:29 by remi at remlab.net:
>> >> > This routes all calls to the bench start and stop functions through
>> >> > function pointers. While the primary goal is to support run-time
>> >> > selection of the performance measurement back-end in later commits,
>> >> > this has the side benefit of confining platform dependencies to
>> >> > checkasm.c and keeping them out of checkasm.h.
>> >> > ---
>> >> > 
>> >> >  tests/checkasm/checkasm.c | 33 ++++++++++++++++++++++++++++-----
>> >> >  tests/checkasm/checkasm.h | 31 ++++---------------------------
>> >> >  2 files changed, 32 insertions(+), 32 deletions(-)
>> >> 
>> >> Not sure I agree with this commit. The overhead can be detectable,
>> >> and we have a lot of small functions whose runtime is only a few
>> >> times that of a null function call.
>> > 
>> > I don't think the function call is ever null. The pointers are left NULL
>> > only if none of the backends initialise. But then, checkasm will bail out
>> > and exit before we try to benchmark anything anyway.
>> > 
>> > As for the real functions, they always do *something*. None of them "just
>> > return 0".
>>
>> I meant a no-op function call to measure the overhead of function
>> calls themselves, complete with all the ABI stuff.
>>
>
> I
>
>>
>> >> Can you store the function pointers out of the loop to reduce
>> >> the derefs needed?
>> > 
>> > Taking just the two loads out of the loop should be feasible, but it
>> > seems rather vain. You will still have the overhead of the indirect
>> > function call, the function itself, and most importantly, in the case
>> > of Linux perf and macOS kperf, the system calls.
>> > 
>> > The only ways to avoid the indirect function calls are to use IFUNC (tricky
>> > and not portable), or to make horrible macros to spawn one bench loop for
>> > each backend.
>> > 
>> > In the end, I think we should rather aim for as constant time as possible,
>> > rather than as fast as possible, so that the nop loop can estimate the
>> > benchmarking overhead as well as possible. In this respect, I think it is
>> > actually marginally better *not* to cache the function pointers in local
>> > variables, which could end up spilled on the stack, or not, depending on
>> > local compiler optimisations for any given test case.
>>
>> I disagree. Uninlining the timer fetches adds another source of
>> inconsistency.
>>
>
> Err, outlining the timer makes sure that it's always the exact same code 
> that's run, and not differently optimised inlinings, at least if LTO is absent. 
> (And even with LTO, it vastly reduces the compiler's ability to optimise and 
> vary the compilation.) Again, given how the calculations are made at the 
> moment, the stability of the overhead is important, so that we can *compare* 
> measurements. The absolute value of the overhead, not so much.
>

Introducing additional overhead in the form of a dereference is a point
where instability can creep in. Can you guarantee that a context will
always remain in L1D cache, as opposed to just reading the raw CPU timer
directly where that's supported?
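
To make the three options in this thread concrete, here is a rough sketch,
not the checkasm code. Every name in it (BenchContext, bench_ctx,
timer_read and the bench_* loops) is made up for illustration, and
clock_gettime() only stands in for a raw CPU counter read so that the
example stays portable.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* All names here are hypothetical; this is not the checkasm code. */

/* Stand-in for a raw timer read; a real build would read a CPU counter. */
static inline uint64_t timer_read(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Outlined wrapper, reached only through the context below. */
static uint64_t timer_read_outlined(void)
{
    return timer_read();
}

typedef struct BenchContext {
    uint64_t (*start)(void);
    uint64_t (*stop)(void);
} BenchContext;

/* External linkage, so the compiler cannot easily assume fixed pointers. */
BenchContext bench_ctx = { timer_read_outlined, timer_read_outlined };

/* Variant 1: load both pointers from the context on every iteration. */
static uint64_t bench_indirect(void (*fn)(void), int runs)
{
    uint64_t total = 0;
    assert(bench_ctx.start && bench_ctx.stop);
    for (int i = 0; i < runs; i++) {
        uint64_t t = bench_ctx.start();
        fn();
        total += bench_ctx.stop() - t;
    }
    return total;
}

/* Variant 2: hoist the two loads out of the loop; indirect calls remain. */
static uint64_t bench_cached(void (*fn)(void), int runs)
{
    uint64_t (*start)(void) = bench_ctx.start;
    uint64_t (*stop)(void)  = bench_ctx.stop;
    uint64_t total = 0;
    assert(start && stop);
    for (int i = 0; i < runs; i++) {
        uint64_t t = start();
        fn();
        total += stop() - t;
    }
    return total;
}

/* Variant 3: direct, inlinable timer read with no context involved. */
static uint64_t bench_direct(void (*fn)(void), int runs)
{
    uint64_t total = 0;
    for (int i = 0; i < runs; i++) {
        uint64_t t = timer_read();
        fn();
        total += timer_read() - t;
    }
    return total;
}

/* Empty function, used to look at the measurement overhead alone. */
static void bench_nop(void)
{
}
```

Calling each of the three loops with bench_nop and the same run count
gives a rough idea of how much each variant adds by itself. The numbers
depend on the machine and compiler, so none are claimed here.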

> But I still argue that that is, either way, completely negligible compared to 
> the *existing* overhead. Each loop is making 4 system calls, and each of those 
> system call requires a direct call (to PLT) and an indirect branch (from GOT). 
> If you have a problem with the two additional function calls, then you can't 
> be using Linux perf in the first place.
>

You never want to use Linux perf in the first place; it's second class.
I don't think it's worth changing the direct inlining we had before. You're not
interested in whether or not the exact same code is run between platforms,
just that the code measuring the timing is as efficient and low-overhead
as possible.
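
For reference, the nop-loop overhead estimate mentioned earlier in the
thread can be sketched roughly as follows. This is only an illustration
under assumptions: now_ns, bench_raw, bench_net, nop and small_work are
hypothetical names, and clock_gettime() again stands in for the real
timer. The lower the timer overhead, the less the subtraction has to
hide for small functions.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Empty function: timing it measures only the benchmarking overhead. */
static void nop(void)
{
}

/* Average time per call of fn in ns, timer overhead included. */
static uint64_t bench_raw(void (*fn)(void), int runs)
{
    uint64_t total = 0;
    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t t = now_ns();
        fn();
        total += now_ns() - t;
    }
    return total / (uint64_t)runs;
}

/* Average time per call with the nop-loop estimate subtracted. */
static uint64_t bench_net(void (*fn)(void), int runs)
{
    uint64_t overhead = bench_raw(nop, runs);
    uint64_t raw      = bench_raw(fn, runs);
    /* Clamp at zero: noise can make the estimate exceed the raw value. */
    return raw > overhead ? raw - overhead : 0;
}

/* Small workload; the volatile sink keeps the loop from being removed. */
static volatile unsigned sink;

static void small_work(void)
{
    for (unsigned i = 0; i < 64; i++)
        sink += i;
}
```

Something like bench_net(small_work, 1000) then returns the net average
per call. The result is only meaningful if the overhead stays stable
between the nop loop and the measured loop, which is the point being
argued above.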