[FFmpeg-devel] [PATCH 4/7] checkasm: use pointers for start/stop functions

Lynne dev at lynne.ee
Mon Jul 17 20:48:40 EEST 2023


Jul 17, 2023, 07:18 by remi at remlab.net:

> On Sunday, 16 July 2023, 23:32:21 EEST, Lynne wrote:
>
>> Introducing additional overhead in the form of a dereference is a point
>> where instability can creep in. Can you guarantee that a context will
>> always remain in L1D cache,
>>
>
> L1D is not involved here. In version 2, the pointers are cached locally.
>
>> as opposed to just reading the raw CPU timing
>> directly where that's supported.
>>
>
> Of course not. Raw CPU timing is subject to noise from interrupts (and 
> whatever those interrupts trigger). And that's not just theoretical. I've 
> experienced it and it sucks. Raw CPU timing is much noisier than Linux perf.
>
> And because it has also been proven vastly insecure, it's been disabled on Arm 
> for a long time, and is being disabled on RISC-V too now.
>
>> > But I still argue that that is, either way, completely negligible compared
>> > to the *existing* overhead. Each loop is making 4 system calls, and each
>> > of those system call requires a direct call (to PLT) and an indirect
>> > branch (from GOT). If you have a problem with the two additional function
>> > calls, then you can't be using Linux perf in the first place.
>>
>> You don't ever want to use Linux perf in the first place; it's second class.
>>
>
> No, it isn't. The interface is more involved than just reading a CSR, and sure,
> all other things being equal, I'd prefer the simple interface that RDCYCLE is.
> But other things are not equal. Linux perf is in fact *more* accurate by 
> virtue of not *wrongly* counting other things. And it does not threaten the 
> security of the entire system, so it will work inside a rented VM or an 
> unprivileged process.
>

Threaten? This is a development tool first and foremost.
If anyone doesn't want to use rdcycle, they can use Linux perf; it still works,
with or without the patch.
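
For illustration, choosing a timer backend at run time through start/stop
pointers could look roughly like the sketch below. This is not the code from
the patch: TimerBackend, select_backend, time_calls and both backends are
hypothetical names, and the "fake" backend exists only to make the behaviour
deterministic. The pointers are copied into locals before the loop, which is
one way to do the local caching mentioned for version 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical interface, for illustration only; not the checkasm API. */
typedef uint64_t (*read_fn)(void);

typedef struct TimerBackend {
    const char *name;
    read_fn start;
    read_fn stop;
} TimerBackend;

/* Portable stand-in for a real counter (C11 timespec_get, nanoseconds). */
static uint64_t wallclock_read(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Deterministic fake counter: every read advances it by 3 ticks. */
static uint64_t fake_ticks;
static uint64_t fake_read(void)
{
    fake_ticks += 3;
    return fake_ticks;
}

static const TimerBackend backends[] = {
    { "wallclock", wallclock_read, wallclock_read },
    { "fake",      fake_read,      fake_read      },
};

/* Pick a backend by name at run time; returns NULL if the name is unknown. */
static const TimerBackend *select_backend(const char *name)
{
    for (size_t i = 0; i < sizeof(backends) / sizeof(backends[0]); i++)
        if (!strcmp(backends[i].name, name))
            return &backends[i];
    return NULL;
}

/* Placeholder for the function under test. */
static void nop_kernel(void)
{
}

/* Average start/stop delta over `runs` calls of fn. */
static uint64_t time_calls(const TimerBackend *b, void (*fn)(void), int runs)
{
    /* Copy the pointers into locals so the loop body does not have to
     * reload them from the backend struct on every iteration. */
    const read_fn start = b->start;
    const read_fn stop  = b->stop;
    uint64_t total = 0;

    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = start();
        fn();
        total += stop() - t0;
    }
    return total / (uint64_t)runs;
}
```

A caller would do something like time_calls(select_backend("fake"),
nop_kernel, 10), so switching backends needs no reconfigure or rebuild. The
cost is one indirect call per start and per stop.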


>> I don't think it's worth changing the direct inlining we had before. You're
>> not interested in whether or not the same exact code is run between
>> platforms,
>>
>
> Err, I am definitely interested in doing exactly that. I don't want to have to 
> reconfigure and recompile the entire FFmpeg just to switch between Linux perf 
> and raw cycle counter. A contrario, I *do* want to compare performance between 
> vendors once the hardware is available.
>

That's a weak reason to compromise the accuracy of a development tool.


>> just that the code that's measuring timing is as efficient and
>> low overhead as possible.
>>
>
> Of course not. Low overhead is irrelevant here. The measurement overhead is 
> known and is subtracted. What we need is stable/reproducible overhead, and 
> accurate measurements.
>

Which is what TSC or the equivalent gets you. It's noisy, but that's because
it's better and more accurate than having to round-trip through the kernel.
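
To make the "overhead is known and is subtracted" point from the quoted text
concrete, here is a minimal sketch. It assumes a simulated counter with a
fixed cost of 5 ticks per read and a simulated kernel that costs 100 ticks;
sim_read, sim_kernel, measure_overhead and measure_kernel are all
hypothetical names. Taking the minimum over the runs is a choice made for
this example and is not necessarily how checkasm aggregates its samples.

```c
#include <stdint.h>

typedef uint64_t (*read_fn)(void);

/* Simulated cycle counter: each read costs a fixed 5 ticks. */
static uint64_t sim_cycles;
static uint64_t sim_read(void)
{
    sim_cycles += 5;
    return sim_cycles;
}

/* Simulated function under test: costs 100 ticks. */
static void sim_kernel(void)
{
    sim_cycles += 100;
}

/* Overhead estimate: smallest delta between two back-to-back reads. */
static uint64_t measure_overhead(read_fn read, int runs)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = read();
        uint64_t t1 = read();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Smallest delta around fn, with the measured overhead subtracted. */
static uint64_t measure_kernel(read_fn read, void (*fn)(void), int runs)
{
    uint64_t overhead = measure_overhead(read, runs);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = read();
        fn();
        uint64_t t1 = read();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    /* Clamp at zero in case noise makes the overhead estimate too large. */
    return best > overhead ? best - overhead : 0;
}
```

With a perfectly stable overhead, as in this simulation, the subtraction
recovers the kernel cost exactly regardless of how large the overhead is. With
a real counter, the result is only as good as the overhead is reproducible.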


> And that's assuming the stuff works at all. You can argue that we should use 
> Arm PMU and RISC-V RDCYCLE, and that Linux perf sucks, all you want. PMU 
> access will just throw a SIGILL and end the checkasm process with zero 
> measurements. The rest of the industry wants to use system calls for informed 
> reasons. I don't think you, or even the whole FFmpeg project, can win that 
> argument against OS and CPU vendors.
>

Either way, I don't agree with this patch and am not accepting it.

