[Ffmpeg-devel-irc] ffmpeg-devel.log.20161008

burek burek021 at gmail.com
Sun Oct 9 03:05:02 EEST 2016


[05:38:34 CEST] <philipl> BtbN: one awkwardness with the new code. I'm not using av_hwdevice_ctx_create because I want to use a different mechanism to get the cuda device (There's a GL interop method to get the device that matches the current GL context)
[05:38:54 CEST] <philipl> This means I can't populate the cuda functions.
[05:39:39 CEST] <philipl> I put in a quick hack to populate them in cuvid_decode_init in my tree.
[05:42:54 CEST] <philipl> https://github.com/philipl/FFmpeg/commit/94af983a53d5c68e8036193f1d47514c266d4f53
[11:39:27 CEST] <cone-793> ffmpeg Anton Khirnov master:398f015f077c: avconv: buffer the packets written while the muxer is not initialized
[11:39:27 CEST] <cone-793> ffmpeg Hendrik Leppkes master:3e5e5bdfef07: Merge commit '398f015f077c6a2406deffd9e37ff34b9c7bb3bc'
[11:39:57 CEST] <nevcairiel> the next commit starts the hard part, it appears :(
[13:23:01 CEST] <BtbN> philipl, wouldn't the much more sane solution for that be adding that check to av_hwdevice_ctx_init? You still have to be calling that.
[13:34:10 CEST] <BtbN> philipl, added that to my master on github.
[15:35:06 CEST] <atomnuker> what the honest fuck
[15:35:08 CEST] <atomnuker> cglobal aac_quantize_bands, 8, 8, 8, out, in, scaled, size, Q34, is_signed, maxval, rounding
[15:35:15 CEST] <atomnuker> Q34 is a random small float value
[15:35:34 CEST] <atomnuker> why in the living fuck am I never getting anything but 0 there
[15:35:49 CEST] <atomnuker> if I mov Q34m, 0x3ff00000 everything works correctly
[15:35:50 CEST] <nevcairiel> float isnt supported in params
[15:36:00 CEST] <atomnuker> FUCK
[15:36:15 CEST] <nevcairiel> because float passing is too crazy in various calling conventions
[15:38:15 CEST] <atomnuker> then what kind of a fucked up magic does the af_volume assembly do
[15:38:33 CEST] <atomnuker> it completely omits the volume argument in UNIX64
[15:38:52 CEST] <atomnuker> and somehow that value gets magically splat'd onto a register
[15:39:10 CEST] <nevcairiel> af_volume seems to use only ints from what i can tell
[15:40:15 CEST] <nevcairiel> and the assembly looks fine
[15:40:30 CEST] <nevcairiel> it loads 4 arguments into regs
[15:41:35 CEST] <nevcairiel> and uses pshuf(d|lw) to splat the volume value
[15:45:04 CEST] <nevcairiel> jamrial: you know these qsv things we skipped .. further avconv changes kinda make those required for the qsv hwaccel to keep working, because it gets rid of the manual hackery for transcoding-hwaccels and integrates them properly :(
[15:45:33 CEST] <nevcairiel> (on that note, ffmpeg_cuvid.c will also need updating for that)
[16:09:17 CEST] <BtbN> nevcairiel, what changed? The code left in there is fairly minimal by now.
[16:09:53 CEST] <nevcairiel> cuvid_transcode_init goes away and it gets all handled in cuvid_init, or so it looks from the qsv things libav changed
[16:09:58 CEST] <nevcairiel> so mostly just moving a bit
[16:15:23 CEST] <philipl> BtbN: yes. it was late when i did that. you put it in the sane place.
[17:01:17 CEST] <jamrial> nevcairiel: gets we should ping Ivan Uskov then
[17:03:54 CEST] <jamrial> s/gets/guess
[17:04:03 CEST] <jamrial> he's the qsv maintainer
[17:04:19 CEST] <jamrial> he probably missed my reply the other day
[17:12:24 CEST] <nevcairiel> yeah, although i dont expect any sort of swift action
[17:33:43 CEST] <atomnuker> rcombs: AAC encoder SIMD on the ML
[17:34:09 CEST] <rcombs> hype
[17:35:01 CEST] <rcombs> patch 1/2 missing?
[17:35:55 CEST] <atomnuker> ML being slow, it's a patch to use the decoder's lcg PRNG
[17:37:52 CEST] <nevcairiel> 12% overall is quite decent for two small things
[17:37:59 CEST] <rcombs> indeed, nice
[17:38:57 CEST] <nevcairiel> not sure all compilers we use like the inline arrays there though
[17:40:19 CEST] <nevcairiel> and with like two lines extra you could make quant 32-bit compatible :)
[17:44:14 CEST] <atomnuker> how, I have 8 arguments, I could probably merge some in the wrapper function but I'm not too keen on that
[17:44:23 CEST] <nevcairiel> nah
[17:44:27 CEST] <nevcairiel> i'll put it on the ML
[17:46:34 CEST] <nevcairiel> basically you can just declare one less argument to be loaded automatically, and handle it yourself, thus saving a reg
[17:46:51 CEST] <nevcairiel> but i'll put some example code on the ML in a few minutes
[17:49:06 CEST] <rcombs> lol32bit
[17:49:25 CEST] <rcombs> (I do care a little bit though)
[17:49:47 CEST] <rcombs> 😭
[18:15:13 CEST] <jamrial> atomnuker: you can pass floats to assembly functions
[18:15:45 CEST] <jamrial> it simply depends on the arch/os. x86_32, win64 and unix64 all do it differently
[18:16:11 CEST] <rcombs> doesn't even cost a gpr
[18:16:50 CEST] <jamrial> atomnuker: take a look at for example libavutil/x86/float_dsp.asm
[18:17:44 CEST] <jamrial> the scalar functions
[18:17:52 CEST] <atomnuker> yeah, I looked at ff_vector_fmul_scalar and I couldn't figure out what happens in the UNIX64 case
[18:18:33 CEST] <atomnuker> how does the mul get to m0 in that case?
[18:18:45 CEST] <Gramner> atomnuker: see my ML reply
[18:25:57 CEST] <jamrial> ml is being slow for real. Gramner's reply is still nowhere to be seen
[18:37:30 CEST] <nevcairiel> if you follow that path, it would also fix the 32-bit path as well
[18:37:45 CEST] <nevcairiel> since float stuff is loaded from stack then
[18:38:06 CEST] <nevcairiel> (i was eating in between, hence no mail yet)
[18:38:36 CEST] <ubitux> speaking of the ml; is there anyone moderating the queue of unregistered users?
[18:40:28 CEST] <nevcairiel> atomnuker: unix64 passes floats in xmm0-7, so it showing up in there is by design, fwiw
[18:42:34 CEST] <nevcairiel> win64 on the other hand always only passes 4 arguments in regs, and if those 4 are ints then even any following floats are passed through the stack, and not regs
[18:47:47 CEST] <atomnuker> so if I had 2 float args they would get passed in xmm0 and xmm1 in the unix64 case?
[18:47:54 CEST] <nevcairiel> yes
[18:47:55 CEST] <atomnuker> what about the unix32 case?
[18:48:11 CEST] <nevcairiel> you would want to move them last so they get passed through the stack
[18:49:23 CEST] <nevcairiel> cdecl probably passes all args on the stack
[18:49:51 CEST] <nevcairiel> so by having them last you can avoid x86inc messing with them and manually load them
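Editorial sketch of the calling-convention points above, in C. The function name `quantize_bands_sketch` is hypothetical, and the arithmetic in the loop (multiply by Q34, add rounding, clamp to maxval, truncate, negate) is an assumed scalar reading of the argument names; the log does not quote the real routine. The argument order follows the "floats last" suggestion, and the comment lists where each argument would travel under the three conventions discussed.

```c
#include <assert.h>

/*
 * Hypothetical scalar stand-in for the SIMD routine under discussion.
 *
 * Where each argument travels (per the respective ABIs):
 *   System V x86-64: out=rdi, in=rsi, scaled=rdx, size=ecx,
 *                    is_signed=r8d, maxval=r9d, Q34=xmm0, rounding=xmm1
 *   Win64:           out=rcx, in=rdx, scaled=r8, size=r9d,
 *                    the remaining four arguments on the stack
 *   32-bit cdecl:    every argument on the stack
 */
static void quantize_bands_sketch(int *out, const float *in, const float *scaled,
                                  int size, int is_signed, int maxval,
                                  float Q34, float rounding)
{
    assert(size >= 0);
    for (int i = 0; i < size; i++) {
        float qc = scaled[i] * Q34 + rounding;  /* assumed arithmetic */
        if (qc > (float)maxval)
            qc = (float)maxval;                 /* clamp to the largest code */
        int tmp = (int)qc;                      /* truncates toward zero */
        if (is_signed && in[i] < 0.0f)
            tmp = -tmp;                         /* restore the input's sign */
        out[i] = tmp;
    }
}
```

With the floats placed in the middle of the list instead (the order in the original cglobal line), the System V registers for the integer arguments stay the same, but a loader that assumes the fifth argument lives in the fifth integer register would read is_signed where Q34 was expected. That would be consistent with the zero atomnuker kept seeing, though the log does not confirm it.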
[19:12:30 CEST] <philipl> BtbN: so the init thing works fine
[19:27:10 CEST] <atomnuker> nevcairiel: I still can't use the last register in 32 bit windows, right?
[19:29:40 CEST] <nevcairiel> last register?
[19:29:51 CEST] <nevcairiel> 32-bit windows isnt that special, it should be the same as unix32
[19:29:58 CEST] <atomnuker> r7
[19:30:38 CEST] <atomnuker> so I can't just do "cglobal aac_quantize_bands, 8, 8, 7, out, in, scaled, size, is_signed, maxval, Q34, rounding"
[19:30:47 CEST] <atomnuker> in the case of !UNIX64
[19:31:02 CEST] <Gramner> you can't use r7 on 32-bit, no
[19:31:03 CEST] <nevcairiel> you probably shouldnt do that anyway
[19:31:16 CEST] <nevcairiel> but manually load the float from the stack directly into xmm regs
[19:32:19 CEST] <nevcairiel> so you dont need the last 3 regs
[19:32:45 CEST] <atomnuker> any example where that happens?
[19:33:02 CEST] <nevcairiel> Gramner's mail on the ML ? :)
[19:33:34 CEST] <nevcairiel> vector_fmul_scalar basically does that, but with more special cases for win64, which you wouldnt even need
[19:34:49 CEST] <nevcairiel> (because in that function the float is in the first 4 args, so it gets passed through regs on win64)
[19:35:25 CEST] <atomnuker> oh, ok
[19:37:52 CEST] <nevcairiel> i wonder how other projects deal with the different calling conventions, especially when mixing floats and ints,  must be annoying everywhere
[19:39:37 CEST] <Gramner> atomnuker: also in the quantize function, you can remove the float_sign_mask and do "and is_signedd, 0x80000000" instead.
[19:44:37 CEST] <atomnuker> after shifting by 31 bits?
[19:46:02 CEST] <Gramner> yes. in the loop you just and them together anyway, might as well do it outside the loop
[19:46:53 CEST] <Gramner> oh, and s/pand/andps/ since this is SSE
[19:47:16 CEST] <atomnuker> actually it's pointless anyway
[19:47:28 CEST] <atomnuker> that mask is only to get the sign of the in[] floats
[19:47:42 CEST] <atomnuker> which is going to be in the same place as the shifted is_signed
[19:48:10 CEST] <Gramner> oh, duh. Im dumb
[19:48:45 CEST] <Gramner> yes. just remove the float_sign_mask stuff
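Editorial sketch of why the separate sign mask is redundant, in C. It assumes is_signed is exactly 0 or 1 and that floats are IEEE 754 binary32; the helper names are hypothetical. Shifting is_signed left by 31 already yields 0x80000000 (or 0), which sits in the same bit position as a float's sign bit, so a single AND both gates on is_signed and extracts the sign of the input.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as an integer without aliasing problems. */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse of float_to_bits. */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/*
 * Hypothetical helper: copy the sign of src onto a non-negative magnitude,
 * but only when is_signed is 1. No separate 0x80000000 constant is needed
 * beyond the shifted flag itself.
 */
static float apply_sign_sketch(float magnitude, float src, int is_signed)
{
    uint32_t mask = (uint32_t)is_signed << 31;   /* 0x80000000 or 0 */
    uint32_t sign = float_to_bits(src) & mask;   /* sign of src, gated */
    return bits_to_float(float_to_bits(magnitude) | sign);
}
```

In the SIMD version the mask would be computed once outside the loop and broadcast, which is the point Gramner makes about doing the AND outside the loop.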
[19:50:38 CEST] <atomnuker> sent a v2 to the ML, works on 64 and 32 bit unix
[19:51:04 CEST] <atomnuker> (but before I removed the float_sign_mask :/)
[19:52:22 CEST] <Gramner> cvtsi2ss works on memory args as well, so no need to move maxval to a gpr first
[19:52:55 CEST] <Gramner> also I realized cvttps2dq is sse2, so you need to bump it up to that
[19:55:28 CEST] <Gramner> actually doesn't x86inc warn about using cpuflag_X in a cpuflag_X+N function?
[19:56:14 CEST] <Gramner> or well, the other way around
[19:57:23 CEST] <atomnuker> I don't have any warnings
[20:04:36 CEST] <Gramner> I get this: http://pastebin.com/cjABEA6e
[20:07:47 CEST] <Gramner> ! doesn't work for negation in yasm
[21:28:44 CEST] <atomnuker> figures, I'm using nasm
[00:00:00 CEST] --- Sun Oct  9 2016

