[Ffmpeg-devel-irc] ffmpeg-devel.log.20190412

Sat Apr 13 03:05:04 EEST 2019

[01:32:20 CEST] <atomnuker> how many people are going to write ec code simd for dav1d before I get the clue giving it to someone else was a bad idea?
[01:33:26 CEST] <atomnuker> and the last 2 attempts are going the wrong way about it by assuming a specific cdf size
[01:34:18 CEST] <nevcairiel> optimizing the most common cases separately is the essence of making stuff fast =p
[01:34:31 CEST] <jamrial> you should join #dav1d and mention it there
[01:35:18 CEST] <atomnuker> not when there's no need for it, my code was at least generic and overwrote/read as it saw fit as it padded all CDFs and just needed a loop for when they were big
[01:36:21 CEST] <atomnuker> actually I don't think they got big enough to need a loop at all, IIRC they fit in a single 256bit reg
[01:37:57 CEST] <atomnuker> oh well, I'll rewrite it when we port the codebase to lavc, no way I'm giving my code away to big corps under MIT
[01:39:32 CEST] <atomnuker> well, I do if they pay
[01:40:02 CEST] <BBB> <3
[01:40:39 CEST] <BBB> atomnuker: in the end it's about speed
[01:40:56 CEST] <BBB> your (our? I wrote part of that also) code didn't speed it up half as much as this code appears to do
[01:41:14 CEST] <atomnuker> it was unfinished IIRC
[01:41:28 CEST] <BBB> another issue, yes :)
[01:42:17 CEST] <atomnuker> I'll admit for small CDF sizes it didn't speed things up, it turns out avx2 is kinda too heavy
[01:43:17 CEST] <atomnuker> I think that was what I had to finish, specialcase 128bit simd for small CDFs
[01:45:07 CEST] <atomnuker> hmm, maybe even have separate functions for avx2/sse (macro'd in at runtime based on cdf size) since you don't want another branch
[10:02:25 CEST] <kurosu> it makes sense to aggressively optimize the most costly case, and then handle the other ones more generically
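The padded-CDF approach atomnuker describes above (pad every CDF to one fixed width so the lookup can always read a full register, whatever the real size) might look roughly like the scalar C sketch below. This is not dav1d's or atomnuker's code: the 16-lane width, the threshold layout, the value range and the names pad_cdf and lookup_symbol are all assumptions made for illustration. The fixed-count, branch-free loop stands in for what a SIMD version would do with one compare and a population count.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed padded width: 16 x 16-bit entries, i.e. one 256-bit register. */
#define CDF_LANES 16

/* Hypothetical helper: copy n ascending cumulative thresholds into a
 * fixed-width buffer and fill the unused lanes with UINT16_MAX so they
 * never match a value below UINT16_MAX. */
static void pad_cdf(uint16_t dst[CDF_LANES], const uint16_t *src, int n)
{
    assert(n >= 0 && n <= CDF_LANES);
    for (int i = 0; i < CDF_LANES; i++)
        dst[i] = i < n ? src[i] : UINT16_MAX;
}

/* Hypothetical lookup: the symbol index is the number of thresholds that
 * are <= v. Every lane is always compared, so there is no branch on the
 * real CDF size. Assumes v < UINT16_MAX. */
static int lookup_symbol(const uint16_t cdf[CDF_LANES], uint16_t v)
{
    int symbol = 0;
    for (int i = 0; i < CDF_LANES; i++)
        symbol += cdf[i] <= v;
    return symbol;
}
```

The trade-off raised in the discussion still applies: comparing all 16 lanes is wasted work when a CDF has only a few entries, which is why a narrower variant for small CDFs was suggested.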
[16:58:30 CEST] <BtbN> Ghidra's decompiler has very weird ways to express things. "state_byte = (uint)CONCAT11(1,(char)uVar2);" This is "movzx eax,al; or ah,1", so essentially (uVar2 & 0xFF) | 0x100.
[16:59:26 CEST] <BtbN> There are zero results for CONCAT11 in its docs, as far as I can tell
[17:03:23 CEST] <BtbN> This code I'm analysing is also doing highly weird things. It has an elaborate function to get the next byte from a buffer, which returns data[pos] | (pos & ~0xff). But every call site of that function immediately applies & 0xFF to the result. Like... what are you doing there?
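A small C sketch of the byte-fetch pattern BtbN describes may make the redundancy easier to see. The function names, argument types and buffer layout here are hypothetical; only the expression data[pos] | (pos & ~0xff) and the & 0xFF at the call site come from the log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction of the helper: the low 8 bits hold the
 * byte, the upper bits carry the high part of the position. */
static uint32_t next_byte_raw(const uint8_t *data, uint32_t pos)
{
    return data[pos] | (pos & ~0xffu);
}

/* What every caller effectively computes: masking with 0xFF discards
 * the position bits again and leaves only data[pos]. */
static uint8_t next_byte_masked(const uint8_t *data, uint32_t pos)
{
    uint32_t raw = next_byte_raw(data, pos);
    assert((raw & ~0xffu) == (pos & ~0xffu)); /* upper bits are just pos */
    return (uint8_t)(raw & 0xFF);
}
```

Because pos & ~0xff has its low 8 bits cleared, the OR never disturbs the byte itself, so the masked result always equals data[pos].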
[17:15:02 CEST] <ubitux> the use of these CONCAT thing is often due to bad type set on the variables/arguments
[17:15:13 CEST] <ubitux> (or the use of SIMD)
[17:15:33 CEST] <ubitux> check again the type you set
[17:19:13 CEST] <BtbN> The types look proper to me
[17:19:24 CEST] <BtbN> Looks like a compiler optimization to me
[17:20:20 CEST] <BtbN> Which optimized EAX=(EAX & 0xFF) | 0x100 to MOVZX EAX,AL; OR AH,1
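The equivalence discussed here can be checked with a short C sketch. It assumes that CONCAT11(hi, lo) joins two 1-byte values into a 2-byte value with hi as the upper byte, which matches the disassembly quoted above; the function names are made up for the example and are not Ghidra API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed meaning of the decompiler's CONCAT11(hi, lo):
 * a 16-bit value with hi in bits 8..15 and lo in bits 0..7. */
static uint16_t concat11(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* The expression as the decompiler printed it. */
static uint32_t state_byte_decompiled(uint32_t uVar2)
{
    return (uint32_t)concat11(1, (uint8_t)uVar2);
}

/* The plain-C reading: keep the low byte, set bit 8. */
static uint32_t state_byte_plain(uint32_t uVar2)
{
    return (uVar2 & 0xFF) | 0x100;
}

/* Compare both forms for every possible low byte, with arbitrary
 * upper bits set to show that they are discarded. */
static void check_concat11_equivalence(void)
{
    for (uint32_t low = 0; low < 0x100; low++) {
        uint32_t v = 0xDEADBE00u | low;
        assert(state_byte_decompiled(v) == state_byte_plain(v));
    }
}
```

Under that assumption the two forms agree for all inputs, which is consistent with the "movzx eax,al; or ah,1" sequence: movzx clears everything above AL, and setting bit 0 of AH is the same as OR-ing in 0x100.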
[18:29:50 CEST] <philipl> BtbN: what do you think of Yogender's reply?
[18:30:39 CEST] <BtbN> I have no idea. Requiring that the input pointers stay "alive" for the whole runtime of the encoder is not _that_ unreasonable.
[18:35:03 CEST] <philipl> It seems like you'd have to go out of your way to write your custom pipeline where that's hard to achieve.
[00:00:00 CEST] --- Sat Apr 13 2019