[FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions

Mon Jul 17 08:18:05 EEST 2023

On Sunday, 16 July 2023 at 23:32:21 EEST, Lynne wrote:
> Introducing additional overhead in the form of a dereference is a point
> where instability can creep in. Can you guarantee that a context will
> always remain in L1D cache,

L1D is not involved here. In version 2, the pointers are cached locally.
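
To illustrate what "cached locally" means, here is a minimal sketch. It is not
the actual checkasm code: TimerOps, bench() and the fake_* helpers are
hypothetical names, and the fake counter only stands in for a real timer. The
two function pointers are loaded into locals once, before the loop, so no
context is dereferenced inside the measured region.

```c
#include <stdint.h>

/* Hypothetical timer back-end; not the real checkasm structures. */
typedef struct TimerOps {
    uint64_t (*start)(void);
    uint64_t (*stop)(void);
} TimerOps;

/* Deterministic stand-in for a real counter, for illustration only. */
static uint64_t fake_counter;
static uint64_t fake_start(void) { return fake_counter += 3; }
static uint64_t fake_stop(void)  { return fake_counter += 7; }

const TimerOps fake_ops = { fake_start, fake_stop };

/* Function under test that does nothing. */
void nop(void) { }

uint64_t bench(const TimerOps *ops, void (*func)(void), int runs)
{
    /* Load the pointers once; the loop below never touches *ops again. */
    uint64_t (*const start)(void) = ops->start;
    uint64_t (*const stop)(void)  = ops->stop;
    uint64_t total = 0;

    for (int i = 0; i < runs; i++) {
        uint64_t t0 = start();

        func();
        total += stop() - t0;
    }
    return total;
}
```

With locals like these, a compiler will typically keep both pointers in
registers (or at worst on the stack) for the duration of the loop.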

> as opposed to just reading the raw CPU timing
> directly where that's supported.

Of course not. Raw CPU timing is subject to noise from interrupts (and 
whatever those interrupts trigger). And that's not just theoretical. I've 
experienced it and it sucks. Raw CPU timing is much noisier than Linux perf.

And because unprivileged access to the raw cycle counter has also been proven
vastly insecure, it has been disabled on Arm for a long time, and is now being
disabled on RISC-V too.

> > But I still argue that that is, either way, completely negligible compared
> > to the *existing* overhead. Each loop is making 4 system calls, and each
> > of those system call requires a direct call (to PLT) and an indirect
> > branch (from GOT). If you have a problem with the two additional function
> > calls, then you can't be using Linux perf in the first place.
> 
> You don't want to ever use linux perf in the first place, it's second class.

No, it isn't. The interface is more involved than just reading a CSR, and sure,
all other things being equal, I'd prefer the simple interface of RDCYCLE.
But other things are not equal. Linux perf is in fact *more* accurate by 
virtue of not *wrongly* counting other things. And it does not threaten the 
security of the entire system, so it will work inside a rented VM or an 
unprivileged process.
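
For reference, a rough sketch of such a measurement with the four system calls
per loop mentioned above (two ioctls to start, one ioctl and one read to stop).
The function names cycles_open(), cycles_measure() and busy_work() are made up
for this example, and this is not a copy of the checkasm macros. Setting
exclude_kernel and exclude_hv is what keeps kernel and hypervisor time out of
the count. Opening the counter can fail, for instance inside a VM without a
virtualised PMU or under a strict perf_event_paranoid setting, so the caller
has to check for -1.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a user-space-only CPU cycle counter for the calling thread.
 * Returns a file descriptor, or -1 if perf is unavailable or forbidden. */
int cycles_open(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;       /* start stopped, enable explicitly */
    attr.exclude_kernel = 1; /* do not count kernel mode */
    attr.exclude_hv = 1;     /* do not count hypervisor mode */
    /* pid = 0 (this thread), cpu = -1 (any), no group, no flags */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Measure one call of func. Returns 0 on success, -1 on failure. */
int cycles_measure(int fd, void (*func)(void), uint64_t *cycles)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) < 0 ||
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
        return -1;
    func();
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
        return -1;
    if (read(fd, cycles, sizeof(*cycles)) != (ssize_t)sizeof(*cycles))
        return -1;
    return 0;
}

/* Some work to measure; volatile keeps the loop from being optimised out. */
void busy_work(void)
{
    volatile unsigned sink = 0;

    for (unsigned i = 0; i < 1000; i++)
        sink += i;
}
```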

> I don't think it's worth changing the direct inlining we had before. You're
> not interested in whether or not the same exact code is ran between
> platforms,

Err, I am definitely interested in doing exactly that. I don't want to have to
reconfigure and recompile all of FFmpeg just to switch between Linux perf and
the raw cycle counter. On the contrary, I *do* want to compare performance
between vendors once the hardware is available.

> just that the code that's measuring timing is as efficient and
> low overhead as possible.

Of course not. Low overhead is irrelevant here. The measurement overhead is
known and is subtracted. What we need is stable/reproducible overhead, and
accurate measurements.
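
As a toy illustration of why the size of the overhead does not matter while
its stability does: time an empty start/stop pair, then subtract that from the
timed runs. The helper names below are hypothetical, and using the median to
reject outliers is an assumption made for this sketch, not necessarily what
checkasm does.

```c
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

    return (x > y) - (x < y);
}

/* Sorts v in place and returns its middle element (n must be > 0). */
uint64_t median_u64(uint64_t *v, size_t n)
{
    qsort(v, n, sizeof(*v), cmp_u64);
    return v[n / 2];
}

/* empty: samples of start/stop with nothing in between (pure overhead).
 * timed: samples of start/stop around the function under test.
 * Returns the estimated cost of the function alone. */
uint64_t net_cycles(uint64_t *empty, size_t n_empty,
                    uint64_t *timed, size_t n_timed)
{
    uint64_t overhead = median_u64(empty, n_empty);
    uint64_t gross = median_u64(timed, n_timed);

    return gross > overhead ? gross - overhead : 0;
}
```

With a stable overhead of 40 or of 1040 cycles, the net result comes out the
same. If the overhead itself jitters from run to run, no subtraction can
recover the true figure.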

And that's assuming the stuff works at all. You can argue all you want that we
should use the Arm PMU and RISC-V RDCYCLE, and that Linux perf sucks. Direct
PMU access will just raise SIGILL and end the checkasm process with zero
measurements. The rest of the industry wants to use system calls for informed
reasons. I don't think you, or even the whole FFmpeg project, can win that
argument against OS and CPU vendors.

-- 
Rémi Denis-Courmont
http://www.remlab.net/