[Ffmpeg-devel-irc] ffmpeg-devel.log.20170818

Sat Aug 19 03:05:02 EEST 2017

[03:59:43 CEST] <cone-808> ffmpeg Zhao Zhili master:7fb4b0368de1: ffprobe: fix use of uninitialized variable
[03:59:43 CEST] <cone-808> ffmpeg Jacob Trimble master:f4544163b276: libavformat/mov: Fix inserting frames before current_frame.
[11:40:27 CEST] <nevcairiel> always funny when people screwed themself by hiding their super-secret x264 commandlines by stripping out the metadata and now having decoding broken because their magic 4:4:4 lossless encodes were not standards conform and no metadata to identify them as such =p
[11:41:00 CEST] <nevcairiel> apparently the recent cabac fixes made decoding fail entirely now, previously it only degraded the image a bit =p
[11:41:54 CEST] <JEEB> yayifications
[11:42:03 CEST] <JEEB> and yea, version SEI removal is just funny
[11:43:26 CEST] <RiCON> nevcairiel: they'll fix it by telling people to use an older lavfilters/mpc
[11:45:19 CEST] <nevcairiel> probably
[11:45:36 CEST] <nevcairiel> someone should make a bsf filter to re-add the metadata
[11:46:14 CEST] <nevcairiel> i was going to suggest to make a bsf to fix the encodes, but thats probably not possible
[11:46:36 CEST] <wbs> maybe using jkqxz's coded bitstream framework?
[11:52:34 CEST] <wm4> there was also the thing where not feeding the first packet to the decoder would break decoding of some 4:4:4 video (or something similar), which broke seeking for me
[12:41:50 CEST] <cone-227> ffmpeg Michael Niedermayer master:c359c51947c9: avcodec/rangecoder: Do not increase the pointer beyond the buffer
[12:41:50 CEST] <cone-227> ffmpeg Michael Niedermayer master:b9f92093a102: avcodec/ffv1dec: Check for bitstream end in decode_line()
[12:41:50 CEST] <cone-227> ffmpeg Michael Niedermayer master:cadab5a2a74d: avcodec/pixlet: fixes integer overflow in read_highpass()
[13:58:46 CEST] <wm4> wbs: not sure if you know about uwp/winrt, but pseudo relocs don't work there, and ffmpeg triggers pseudo relocs with some avpriv_ variables
[13:58:59 CEST] <wm4> do you think it would make sense to replace them all with getter functions?
[13:59:10 CEST] <wbs> wm4: what exactly are pseudo relocs here?
[13:59:47 CEST] <wbs> and is this avpriv_ entries that are data variables instead of functions?
[13:59:51 CEST] <wbs> in DLL interfaces?
[14:00:06 CEST] <wbs> s/in/across/
[14:01:20 CEST] <wm4> wbs: pseudo relocs are relocations emitted by gnu ld (or something) and performed by the mingw runtime when the DLL is initialized
[14:01:32 CEST] <wm4> they require making parts of the code writable temporarily
[14:01:42 CEST] <wm4> uwp prevents that
[14:01:45 CEST] <wbs> ah
[14:01:58 CEST] <wm4> (although there are weird exceptions intended for JITs, not sure if usable)
[14:02:28 CEST] <wm4> but I think ffmpeg had a strong opinion on not exporting variables via the API (wasn't the reason windows DLLs?)
[14:02:33 CEST] <wbs> I guess that happens when the variable actually is exported from a DLL, but the code using it doesn't really know (no __declspec(dllimport) etc)
[14:02:36 CEST] <wm4> just that this is ignored for avpriv symbols
[14:02:41 CEST] <wm4> yes
[14:02:44 CEST] <wbs> yes, exported data is a huge problem in DLLs
[14:03:01 CEST] <wbs> when making msvc dll support work, we got rid of as much of it as we needed in libav, to get them working
[14:03:15 CEST] <wm4> well, ffmpeg is back to 17 of such variables, lol
[14:03:29 CEST] <nevcairiel> there are no imported data variables
[14:03:36 CEST] <nevcairiel> that would just break shared builds entirely
[14:03:45 CEST] <wm4> I'm talking about stuff like avpriv_mjpeg_bits_ac_luminance
[14:04:08 CEST] <wm4> grep "extern.*avpriv_"
[14:04:24 CEST] <wbs> hm. libav has got that one as well, used from libavformat
[14:04:43 CEST] <wm4> it would work if the libs were all in one DLL
[14:05:13 CEST] <nevcairiel> it would probably also work if there was a smart macro that swaps around between dllexport and dllimport like some other projects have
[14:05:17 CEST] <nevcairiel> but alas we didnt bother with that
[14:05:29 CEST] <wbs> yeah, and it's pretty tricky when all libs are built at once
[14:05:35 CEST] <wbs> and what's internal for one is external for another
[14:06:05 CEST] Action: wbs thanks past self
[14:06:10 CEST] <wbs> d66c52c2b369401ba4face1c171ccb19130b7a31 has got a pretty good writeup
[14:06:14 CEST] <nevcairiel> hehe
[14:06:25 CEST] <wbs> in libav, there's one such variable in libavformat/rtpdec_jpeg.c
[14:06:38 CEST] <wbs> on mingw, such things are fixed up automatically. on msvc, they aren't
[14:06:41 CEST] <nevcairiel> in normal builds you just get a tiny bit of overhead from one extra deref of such not-really-imported data objects
[14:06:55 CEST] <wbs> so shared msvc builds would actually crash in libavformat/rtpdec_jpeg.c
[14:07:10 CEST] <wbs> except nobody using shared msvc builds probably have touched that piece of code, since those aren't covered by fate tests
[14:07:13 CEST] <wm4> <nevcairiel> it would probably also work if there was a smart macro that swaps around between dllexport and dllimport like some other projects have <- wouldn't be wrong for avpriv variables
[14:08:01 CEST] <nevcairiel> wbs: as long as those tables are marked av_export i would think it should work
[14:08:15 CEST] <wbs> nevcairiel: yeah, so it seems.. when I read further in my commit message essay
[14:08:41 CEST] <wbs> that's clearly one of the better commit messages I've written at least, since current self don't remember half of the tradeoffs and things we fought back then
[14:09:17 CEST] <wbs> wm4: are these cases that aren't marked with av_export?
[14:09:18 CEST] <nevcairiel> i use shared msvc builds for dev/debugging often enough, never ran into a serious issue like that
[14:09:27 CEST] <iive> am I correct to assume that these "variables" are shared constants?
[14:09:40 CEST] <wbs> nevcairiel: on the other hand, I'm pretty sure you never ran the rtpdec_jpeg code either :P
[14:09:49 CEST] <nevcairiel> nah, but its not the only one
[14:10:00 CEST] <wbs> yeah, and I think the av_export hack did work that way yeah
[14:10:08 CEST] <wm4> wbs: it's marked av_export there
[14:10:25 CEST] <nevcairiel> the problem is when building the library that actually contains the symbol
[14:10:32 CEST] <wm4> the specific one I tested (removing a use of such a symbol and then checking the size of the pseudo reloc table)
[14:10:35 CEST] <nevcairiel> because its marked as dllimport but isnt actually imported
[14:10:43 CEST] <wbs> yup
[14:10:49 CEST] <nevcairiel> it kind of works but emits a warning
[14:10:50 CEST] <wbs> and msvc handles it but moans loudly
[14:10:55 CEST] <wm4> <iive> am I correct to assume that these "variables" are shared constants? <- yes
[14:11:32 CEST] <wm4> how do you even switch between import/export
[14:11:38 CEST] <wm4> so inconvenient
[14:11:49 CEST] <wbs> you need to have something like -DBUILDING_LAVC
[14:11:55 CEST] <wbs> and a separate av_lavc_export define
[14:11:57 CEST] <nevcairiel> well most projects just have one library they build
[14:12:02 CEST] <nevcairiel> so they build that with dllexport
[14:12:11 CEST] <nevcairiel> and whoever uses the library then uses dllimport
[14:12:12 CEST] <wbs> which changes to __declspec(dllexport) for the object files that belong to lavc
[14:12:16 CEST] <nevcairiel> easy to provide with some macros
[14:12:30 CEST] <nevcairiel> more effort with 10 libraries =p
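A minimal sketch of the macro scheme wbs and nevcairiel describe above, not FFmpeg's actual code. `BUILDING_LAVC` and `av_lavc_export` are the names floated in the discussion; `demo_table`, `DEMO_TABLE_SIZE` and `demo_table_sum` are made up for illustration. The `#define BUILDING_LAVC` at the top only simulates the `-DBUILDING_LAVC` flag that the build system would pass when compiling libavcodec's own objects.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate compiling one of libavcodec's own object files. In a real build
 * this would come from the command line (-DBUILDING_LAVC), not the source. */
#define BUILDING_LAVC 1

/* Hypothetical per-library export macro: dllexport while building the
 * library that owns the symbol, dllimport for every other user. */
#if defined(_WIN32) && defined(BUILDING_LAVC)
#    define av_lavc_export __declspec(dllexport)
#elif defined(_WIN32)
#    define av_lavc_export __declspec(dllimport)
#else
#    define av_lavc_export /* no import/export switch outside Windows */
#endif

#define DEMO_TABLE_SIZE 4

/* What the shared header would declare (demo_table is a made-up name). */
extern av_lavc_export const unsigned char demo_table[DEMO_TABLE_SIZE];

/* What the owning library would define. */
const unsigned char demo_table[DEMO_TABLE_SIZE] = { 1, 2, 3, 4 };

/* Sum the first n entries; stands in for a consumer reading the table. */
unsigned demo_table_sum(size_t n)
{
    unsigned sum = 0;
    assert(n <= DEMO_TABLE_SIZE);
    for (size_t i = 0; i < n; i++)
        sum += demo_table[i];
    return sum;
}
```

With ten libraries, each one would need its own `BUILDING_*` define and export macro, which is the extra effort nevcairiel mentions.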
[14:12:48 CEST] <iive> there was a recent discussion into making a single ffmpeg library.
[14:13:09 CEST] <wm4> so why not just replace all of this with getters
[14:13:22 CEST] <wm4> or would such a patch be rejected because too ugly and PERFORMANCELOSS?
[14:15:04 CEST] <wbs> in most cases, for anything in e.g. libavformat, the performance requirements should be one order of magnitude less than in lavc
[14:15:23 CEST] <wbs> so for that pair, using an accessor should be fine I think. it's uglier and a lot of work though
[14:15:50 CEST] <nevcairiel> most of those tables are some sort of codec tables that the demuxers/muxers want access to
[14:16:08 CEST] <nevcairiel> iirc some of the aac tables are like that as well
[14:16:12 CEST] <nevcairiel> sample rate, channel config etc
[14:16:17 CEST] <wbs> yeah
[14:16:27 CEST] <wbs> absolutely not performance critical at least
[14:16:35 CEST] <wm4> so I'd churn out a gigantic patch that replaces all those av_export variables
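A rough sketch of what the getter approach wm4 proposes could look like, under the assumption that each exported table is replaced by a function returning a pointer to it. All names here (`demo_sample_rates`, `demo_get_sample_rates`, `demo_sample_rate`) are hypothetical and the table values are placeholders; this is not an existing FFmpeg API.

```c
#include <assert.h>
#include <stddef.h>

/* Library side: the table is static, so no data symbol crosses the DLL
 * boundary and no pseudo relocation is needed. Values are placeholders. */
static const int demo_sample_rates[] = { 96000, 88200, 64000, 48000 };

/* Hypothetical getter: returns the table and stores its length in *count. */
const int *demo_get_sample_rates(size_t *count)
{
    assert(count != NULL);
    *count = sizeof(demo_sample_rates) / sizeof(demo_sample_rates[0]);
    return demo_sample_rates;
}

/* Consumer side (e.g. a muxer): only a function call crosses the library
 * boundary. Returns -1 when idx is out of range. */
int demo_sample_rate(size_t idx)
{
    size_t count;
    const int *rates = demo_get_sample_rates(&count);
    return idx < count ? rates[idx] : -1;
}
```

The cost is one extra call per lookup, which matches wbs's point that these tables are not performance critical.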
[14:17:57 CEST] <wm4> it's also silly that ffmpeg needs to make code writeable just because it's built as DLL
[14:18:11 CEST] <wm4> I'm kind of surprised this didn't upset any security BS software yet
[14:24:13 CEST] <wm4> hm, might also be worth thinking about adding a single-lib build mode
[14:24:29 CEST] <wm4> but since that require dealing with all the make files, I'd rather not
[15:20:14 CEST] <RiCON> wm4: there's a thread talking about that already, iirc
[15:20:58 CEST] <RiCON> wm4: http://lists.ffmpeg.org/pipermail/ffmpeg-devel/2017-July/214213.html
[15:22:03 CEST] <wm4> oh nice
[15:23:00 CEST] <wm4> or not nice, but still
[16:34:52 CEST] <atomnuker> iive: what's left with your pvq search patches? does something more need doing?
[16:35:39 CEST] <iive> nothing
[16:35:42 CEST] <iive> afaik
[16:36:22 CEST] <iive> they've been waiting for about 10 days for somebody to approve them.
[16:40:33 CEST] <atomnuker> k, I'll push them tonight
[17:00:12 CEST] <iive> \o/
[18:12:00 CEST] <atomnuker> iive: I got v7 here, but I can't find the x86util/inc patches
[18:12:09 CEST] <atomnuker> (did you post them in another thread/do they exist?)
[18:12:21 CEST] <iive> yeh, another name
[18:12:48 CEST] <iive> https://ffmpeg.org/pipermail/ffmpeg-devel/2017-August/214630.html
[18:13:34 CEST] <iive> gmail web seems to break the threading if subject is changed
[18:19:12 CEST] <cone-880> ffmpeg Ivan Kalvachev master:30ae07d7ef35: Add macros to x86util.asm .
[18:19:12 CEST] <cone-880> ffmpeg Ivan Kalvachev master:7205513f8f4b: SIMD opus pvq_search implementation
[18:19:38 CEST] <atomnuker> iive: pushed, cheers
[18:20:07 CEST] <iive> now we wait, to see its FATE.
[18:20:20 CEST] Action: atomnuker starts burning that f5 key on fate.ffmpeg.org
[18:20:55 CEST] <wm4> that's... not even close to how we format commit message subject lines normally
[18:31:06 CEST] <cone-880> ffmpeg Rostislav Pehlivanov master:8e53cd1fab86: ops_pvq_search: remove dead macro
[18:31:07 CEST] <cone-880> ffmpeg Rostislav Pehlivanov master:f386dd70acdc: opus_pvq_search: only use rsqrtps approximation on CPUs with avx
[18:35:53 CEST] <atomnuker> iive: which sse4 instruction gets used currently in the sse4 version?
[18:45:02 CEST] <iive> atomnuker: blendv
[18:52:37 CEST] <atomnuker> cool, I'll leave things as they are
[18:58:45 CEST] <jamrial> why did you push it with all that dead code?
[19:00:36 CEST] <atomnuker> the presearch rounding? it was forgotten
[19:11:57 CEST] <Chloe> wm4: don't worry. We at FFmpeg are really good at consistency and quality control.
[19:14:05 CEST] <atomnuker> Chloe: why not put your self-defined expertise in consistency and quality control to work and do actual contributing discussions by reviewing patches on the ml?
[19:15:09 CEST] <Chloe> atomnuker: because we'd still have have no one agreeing on anything.
[19:16:24 CEST] <atomnuker> Chloe: you need only agree with yourself to make a remark
[19:17:38 CEST] <niklob> https://pastebin.com/6fL5KmEc "[dvbsub @ 00000000025eb520] [IMGUTILS @ 00000000079ef680] Picture size 0x0 is invalid" Wouldn't it be better to set the default to something bigger than 0x0?
[19:17:44 CEST] <wm4> and me
[19:21:57 CEST] <cone-880> ffmpeg Carl Eugen Hoyos master:285c015f1077: lavd/libdc1394: Do not crash if dc1394_camera_new() fails.
[19:22:00 CEST] <atomnuker> Chloe: I look down on those easily influenced by attempts of people trying to make the project look bad through echo-chambers
[19:23:18 CEST] <iive> atomnuker: please revert your second patch
[19:23:32 CEST] <atomnuker> its always the same, line after line: "ffmpeg never removes anything!", "ffmpeg does no efforts to improve their codebase!", "ffmpeg's standards are low"
[19:24:13 CEST] <atomnuker> iive: does a certain revision of phenoms from 2009 have bad divps performance?
[19:24:34 CEST] <iive> yes all of them
[19:24:44 CEST] <Chloe> Well last time I tried to remove something, there were people who supported it and then one person who didn't and then it wasn't removed.
[19:24:54 CEST] <Chloe> Like sure, some things work in FFmpeg, doesn't mean they should be in it
[19:25:17 CEST] <atomnuker> Chloe: when was that and what was it?
[19:26:08 CEST] <iive> atomnuker: There is much cleaner way to enable divps for avx code, and it is one liner
[19:26:17 CEST] <Chloe> opengl device last year
[19:26:31 CEST] <iive> atomnuker: your change obfuscates the code and makes it much harder to tweak.
[19:26:34 CEST] <Chloe> why do we need an OpenGL output device *in* ffmpeg
[19:26:39 CEST] <atomnuker> iive: its still faster than C and its an encoder
[19:26:56 CEST] <atomnuker> Chloe: because someone wanted it and contributed code for it
[19:27:47 CEST] <atomnuker> if you think its no longer useful then please state so, send another patch and carefully explain what would replace it
[19:28:04 CEST] <Chloe> It would be immediately rejected if I just said 'nothing' would replace it
[19:28:12 CEST] <JEEB> that's not true
[19:28:18 CEST] <JEEB> I mean, o9k things do opengl upload
[19:28:23 CEST] <iive> atomnuker: I'm ok with making the AVX variant use divps, however your commit message implies the opposite
[19:28:32 CEST] <wm4> ffmpeg still has the old vdpau decoding API
[19:28:46 CEST] <Chloe> JEEB: but why does that have to be *in* ffmpeg
[19:28:49 CEST] <wm4> it adds tons of special cases to all supported codecs
[19:28:59 CEST] <Chloe> the ffmpeg api is simple enough
[19:28:59 CEST] <wm4> and nothing really uses it (at least mplayer doesn't)
[19:29:00 CEST] <JEEB> yes, but what I mean is that the "what would replace it" doesn't have to be in FFmpeg
[19:29:03 CEST] <JEEB> that's my point :)
[19:29:21 CEST] <Chloe> well yes, people would write their own external code to replace it
[19:29:24 CEST] <JEEB> if it's beter to have something outside of FFmpeg, IMHO that point should be brought up
[19:29:29 CEST] <JEEB> *better
[19:29:45 CEST] <JEEB> wm4: yea that one would be a prime target if even mplayer doesn't habla it
[19:29:49 CEST] <Chloe> It's kind of silly to have opengl code like that in FFmpeg because then the user has no control over it
[19:30:24 CEST] <wm4> the worst about the opengl code is how it's hacked in into libavdevice
[19:30:36 CEST] <Gramner> atomnuker: ! for logical negation only works in nasm. use "notcpuflag(x)" instead of "!cpuflag(x)" if you want compatibility with yasm
[19:30:54 CEST] <atomnuker> iive: I'll rename the non-avx versions to _approx then
[19:30:55 CEST] <JEEB> it could be put under examples for example if someone really cares about having the code around "as an example for others"
[19:31:40 CEST] <JEEB> anyways, old VDPAU might actually be more important if it's got its tentacles around decoders
[19:31:45 CEST] <iive> atomnuker: revert this commit and send your changes to the maillist.
[19:34:32 CEST] <durandal_1707> iive: why?
[19:34:56 CEST] <iive> durandal_1707: i already explained why. i'll send a mail to document it.
[19:35:30 CEST] <Chloe> atomnuker: so you're saying times have changed and I should try again? 
[19:36:15 CEST] <JEEB> if it's been a while, it's worth it to bring out the discussion again and see how it goes
[19:39:47 CEST] <durandal_1707> no its useful like caca device
[19:40:46 CEST] <durandal_1707> ffserver is very useful
[19:41:08 CEST] <durandal_1707> same as dupe prores stuff
[19:41:19 CEST] <JEEB> what was the end result with prores encoders?
[19:41:27 CEST] <JEEB> I remember you added features to the kostya one
[19:41:32 CEST] <JEEB> recently
[19:41:35 CEST] <durandal_1707> nobody gives a shit
[19:41:43 CEST] <JEEB> well yes
[19:43:47 CEST] <atomnuker> iive: https://0x0.st/nW9.diff
[19:44:22 CEST] <atomnuker> splits the pvq function into non-exact and exact versions and only enables the exact on have_avx_fast (which excludes slow amd stuff)
[19:44:48 CEST] <atomnuker> good?
[19:47:00 CEST] <iive> no
[19:47:56 CEST] <iive> i've sent mail to -cvs-log, with the one liner i have in mind.
[19:51:15 CEST] <jamrial> there's no need to revert anything if it can be fixed with a one-liner patch...
[19:52:01 CEST] <jamrial> also, if you send something to cvs-log send it to -devel as well. not everyone follows that list
[19:52:09 CEST] <atomnuker> I don't
[19:52:20 CEST] <JEEB> reverting doesn't help anything if you already can just fix it (if you wouldn't be able to fix it quickly enough then that's a separate thing)
[19:52:47 CEST] <atomnuker> iive: what don't you like about my patch?
[19:53:58 CEST] <iive> 1. wrong commit message, it does the reverse
[19:54:23 CEST] <atomnuker> iive: I meant the diff
[19:54:52 CEST] <iive> as I said, the same thing could be done in 1 line.
[19:55:23 CEST] <iive> 2. it removes the define that explains what the different code paths does.
[19:55:31 CEST] <iive> it's still missing in your new patch.
[19:55:50 CEST] <iive> does/do.
[19:55:55 CEST] <atomnuker> fine, added it back to where PVQ_FAST_SEARCH gets substituted, good enough?
[19:56:25 CEST] <iive> atomnuker: revert your commit and apply the 1 liner.
[19:56:34 CEST] <atomnuker> no, I prefer my approach
[19:56:51 CEST] <durandal_1707> war
[19:57:06 CEST] <iive> atomnuker: you like obfuscated code?
[19:57:21 CEST] <atomnuker> iive: no, having 2 separate function names for approximated search
[19:57:39 CEST] <iive> atomnuker: and how is that better?
[19:58:37 CEST] <atomnuker> its more explicit and less obfuscated
[19:58:48 CEST] <iive> it is far more obfuscated
[19:58:56 CEST] <atomnuker> I can't even _see_ your one liner patch
[19:59:02 CEST] <iive> because you have to pass a parameter through 2 different macro templates
[19:59:07 CEST] <iive> and you pass it as a number
[19:59:19 CEST] <iive> so once again, you have no idea what the %1 stands for...
[19:59:28 CEST] <atomnuker> iive: so you look it up
[19:59:41 CEST] <atomnuker> its better than cpuflags
[19:59:54 CEST] <iive> atomnuker: it's not obvious, and the asm code is hard enough
[19:59:55 CEST] <durandal_1707> i prefer obfuscated code always, it prevents people from touching your code
[20:00:19 CEST] <durandal_1707> exactly
[20:00:46 CEST] <iive> atomnuker: the whole purpose of that define is to be used as a switch...
[20:00:48 CEST] <atomnuker> iive: its far more obvious than not knowing cpuflags(avx) ifdeffery will still leave avx functions elsewhere
[20:01:41 CEST] <atomnuker> since cpuflags is mostly used everywhere to remove all avx/avx2 functions
[20:01:47 CEST] <atomnuker> rather than just particular ones
[20:02:04 CEST] <iive> atomnuker: i don't understand what you mean with that.
[20:02:33 CEST] <atomnuker> I meant that if I saw cpuflags(avx) in someone's asm I'll assume that no where else in the code will there be any avx functions
[20:02:37 CEST] <atomnuker> but just inside the brackets
[20:02:58 CEST] <atomnuker> *ifdef
[20:03:54 CEST] <iive> what's the point of that?
[20:05:13 CEST] <atomnuker> iive: https://0x0.st/nW9.diff
[20:05:25 CEST] <atomnuker> annotated the macros so its obvious which is which
[20:06:40 CEST] <atomnuker> iive: wrong link, sorry: https://0x0.st/nWt.diff
[20:10:47 CEST] <iive> atomnuker: let me try again. You are adding complexity so you can have different name, however there is no overlap in the functions
[20:10:59 CEST] <atomnuker> I want to have complexity here
[20:11:09 CEST] <iive> you don't have approx_avx and exact_avx
[20:11:12 CEST] <atomnuker> there needs to be a clear difference between which function is bitexact and exact
[20:12:12 CEST] <atomnuker> iive: (fma does add errors, but this is about the function itself being always bitexact with C, not occasionally)
[20:13:51 CEST] <iive> you can just add a comment
[20:14:05 CEST] <atomnuker> anyone who works on the encoder will need to know that there are 2 versions of the code, and explicitly knowing which is which is better rather than having to dig into the asm code to figure it out
[20:14:35 CEST] <iive> the approximation is better
[20:14:59 CEST] <atomnuker> no, the exact version is better because its _exact_
[20:15:02 CEST] <iive> that's why I didn't try to enable it on avx myself.
[20:15:13 CEST] <atomnuker> the approximation is a hack
[20:15:28 CEST] <atomnuker> because old cpus have crap division
[20:15:35 CEST] <iive> it produces less distortion. at least with my samples.
[20:15:57 CEST] <atomnuker> tbh I'd rather remove the approximation but I'm giving it the benefit of the doubt for people with not-so-modern CPUs
[20:16:08 CEST] <iive> not always but usually.
[20:16:57 CEST] <atomnuker> iive: its still not exact with C, and that's not what's best for an encoder under development
[20:19:58 CEST] <iive> still
[20:20:22 CEST] <iive> revert the patch, add the 1 liners, and if you insist, 
[20:20:53 CEST] <iive> send the function name suffix as parameter to the PVQ_SEARCH macro.
[20:20:59 CEST] <iive> also
[20:21:55 CEST] <atomnuker> I'm not reverting the patch, and especially not applying the one liner which obfuscates the code even more
[20:21:58 CEST] <iive> I'm not sure that AVX_FAST()  is the macro you want to use.
[20:22:05 CEST] <atomnuker> it is, it blocks out slow amd
[20:22:27 CEST] <atomnuker> putting the define there is not at all clear when it makes a difference
[20:22:31 CEST] <iive> well, the approx avx code is faster on a bunch of amd's
[20:22:47 CEST] <atomnuker> yes, that's when it gets used
[20:23:14 CEST] <atomnuker> not on the fast ones having fast divs, on the slow ones having slow divs but fast approx
[20:24:45 CEST] <atomnuker> also depending on what happens in the future the approximation might be better off enabled on new cpus or disabled on old ones
[20:24:45 CEST] <iive> It's perfectly clear.
[20:25:16 CEST] <atomnuker> or even having 2 versions of the same instruction set but with approx and non-approx for debugging or testing
[20:25:23 CEST] <atomnuker> grouping them doesn't allow that
[20:25:28 CEST] <atomnuker> having a separate argument does
[20:25:45 CEST] <iive> you are not listening to anything I say.
[20:25:55 CEST] <iive> revert the commit and send patch to maillist.
[20:27:49 CEST] <atomnuker> no, its too slow, my patch works and I like it more, yasm is broken, no time to waste
[20:28:47 CEST] <nevcairiel> from my point of view, nasm is way more broken =p
[20:28:54 CEST] <atomnuker> if you don't like it, send a patch to the ML to do what you want it to do and I'll review it
[20:29:06 CEST] <iive> i was developing on yasm, if it is broken, then you have broken it.
[20:29:32 CEST] <iive> atomnuker: did you read my email?
[20:31:34 CEST] <Gramner> nevcairiel: if you have actual reproducible nasm bugs, just file some bug reports. issues actually gets fixed in nasm
[20:33:44 CEST] <cone-880> ffmpeg Rostislav Pehlivanov master:3c99523a2864: opus_pvq_search: split functions into exactness and only use the exact if its faster
[20:34:03 CEST] <atomnuker> iive: yes, I have broken it making the code better, and now I have fixed it, with a solution that I like even more
[20:35:14 CEST] <nevcairiel> Gramner: i doubt they're going to like "build ffmpeg on windows, try to link with msvc", a minimized reproduction case is hard
[20:41:24 CEST] <iive> atomnuker: are you drunk?
[20:43:20 CEST] <iive> i cannot help but read the above line in Rick's voice.
[20:47:14 CEST] <atomnuker> no idea who rick is, but he's probably not like me at all atm because I'm sober and angry arguing about something as trivial as this
[20:47:26 CEST] <atomnuker> old CPUs suck and I dread the day they'll be considered retro
[20:48:20 CEST] <atomnuker> the Ghz wars wasted so many years for marginal gains
[20:49:41 CEST] <Gramner> now we get re-releases and re-re-releases of the same cpus year after year instead
[20:50:32 CEST] <durandal_1707> should we raise max threads define?
[20:50:33 CEST] <iive> i do recommend you to watch "Rick and Morty". Rick is a genius that can literally do everything.
[20:50:49 CEST] <atomnuker> kaby lake vs skylake was particularly bad, a whole rerelease for a vp9 decoder and occasional gains
[20:51:21 CEST] <atomnuker> now I don't know what is better - kaby lake x or skylake x
[20:51:48 CEST] <atomnuker> I thought kaby lake x is better because its kaby lake with an x so its better than kaby lake which was newer than skylake
[20:52:27 CEST] <atomnuker> but apparently no, kaby lake x is the budget form of skylake x
[20:52:43 CEST] <nevcairiel> thats not the case though
[20:53:06 CEST] <atomnuker> its the opposite?
[20:53:08 CEST] <Gramner> KBL-X is such a crazy product. I have no idea how the hell that thing managed to get approved. it literally requires motherboards to have dual VRM setups in order to support FIVR and non-FIVR CPUs at the same time
[20:53:08 CEST] <nevcairiel> kaby lake x is literally kaby lake on a different socket, basically same cpus with  better power delivery
[20:53:33 CEST] <nevcairiel> its not really related to skylake-x other than running on the same platform
[20:54:15 CEST] <Gramner> it's shoehorned into the same platform in a super ugly and complex way
[20:54:45 CEST] <atomnuker> so kabylake x doesn't have avx512 and kaby lake had such bad power dissipation it required a new socket?
[20:54:53 CEST] <Gramner> I'm surprised it actually works without blowing up half the time
[20:55:35 CEST] <Gramner> now it only blows up if you switch between KBL-X and SKL-X cpus on the same motherboard without taking special precautions
[20:55:54 CEST] <nevcairiel> afaik they fixed that
[21:08:54 CEST] <iive> atomnuker: i really don't understand what was the hurry in your commits.
[21:09:21 CEST] <iive> it's not like you are going to miss the release window or something.
[21:09:29 CEST] <atomnuker> one should not leave yasm broken
[21:09:35 CEST] <iive> you broke it
[21:10:08 CEST] <iive> and I do mean both of these commits.
[22:01:42 CEST] <paveldimow> Hi, anyone interested in adding support for amf0? I would like to discuss this in private if possible. Tnx
[22:06:52 CEST] <Compn> paveldimow : durandal_1707 might be 
[22:07:24 CEST] <Compn> which amf0 are we talking about ?
[22:07:25 CEST] <Compn> flv ?
[22:07:26 CEST] <Compn> or mp4 ?
[22:07:41 CEST] <paveldimow> well it's amf0 in mp4 container
[22:08:41 CEST] <paveldimow> I am not sure if he is interested since he asked "new adobe bullshit?" :D
[22:08:59 CEST] <Compn> haha :D
[22:09:12 CEST] <JEEB> > amf0 in mp4
[22:09:18 CEST] <JEEB> how the REDACTED did they manage that
[22:09:29 CEST] <JEEB> esp. if it's over RTMP because RTMP is FLV
[22:11:13 CEST] <paveldimow> well all I know is that the client connects via an rtmp like protocol to the server and starts to broadcast the audio/video + metadata which are actually timestamps every 60 frames
[22:11:45 CEST] <paveldimow> at the end I have a mp4 file with one video one audio and one amf0 stream
[22:13:27 CEST] <jamrial> Gramner: looks like cannon lake is going to be mobile only due to low yields, which means avx512 consumer outside of skl-x will not be a thing until ice lake
[22:14:32 CEST] <nevcairiel> more like making a high-performance core more efficient than their fully optimized 14nm process takes a bit more optimization
[22:14:46 CEST] <nevcairiel> and early 10nm process isnt there
[22:14:56 CEST] <jamrial> s/consumer/desktop
[22:19:08 CEST] <Gramner> 14nm++ is better than 10nm for high power chips but yields are also apparently not great
[22:20:37 CEST] <Gramner> afaik CNL will only have a single avx-512 ALU too. OTOH it might get away with very low avx frequency offsets due to the clock rate being low in the first place
[22:22:47 CEST] <atomnuker> is it time for gallium arsenide?
[22:23:37 CEST] <Gramner> dunno. III-V has been in the labs a long time
[22:25:14 CEST] <Gramner> InGaAs has good properties
[22:25:22 CEST] <TD-Linux> wait does avx-512 support integers < 32 bits
[22:25:57 CEST] <Gramner> avx-512 is a collection of 2^82 different instruction set extensions
[22:26:10 CEST] <nevcairiel> afaik with the proper extensions (which skl-x has), it goes down to byte level like all others
[22:26:15 CEST] <Gramner> but the subset available on "normal" CPU:s supports that, yes
[22:26:47 CEST] <Gramner> https://pbs.twimg.com/media/DBPfgewWsAEVkiA.jpg:large
[22:27:23 CEST] <TD-Linux> ah I need the "BW" feature
[22:28:04 CEST] <Gramner> now why they couldn't even bother grouping together the related types introduced at the same time beats me
[22:28:34 CEST] <Gramner> e.g. CD should've been part of F. BW+DQ+VL should've been a single flag etc.
[22:30:14 CEST] <Gramner> I just threw F+CD+BW+DQ+VL under the "avx512" cpuflag in x86inc. nobody going to bother checking for every possible permutation anyways. and xeon phi is kind of irrelevant for the general multimedia use case
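The grouping Gramner describes can be sketched as a bitmask test: a single "avx512" capability is reported only when all of F, CD, BW, DQ and VL are present. This is an illustrative C sketch, not the x86inc implementation; the `DEMO_*` names and bit positions are invented here and do not correspond to CPUID bit numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; positions are arbitrary, not CPUID bits. */
enum {
    DEMO_AVX512F  = 1 << 0,
    DEMO_AVX512CD = 1 << 1,
    DEMO_AVX512BW = 1 << 2,
    DEMO_AVX512DQ = 1 << 3,
    DEMO_AVX512VL = 1 << 4,
    DEMO_KNOWN    = (1 << 5) - 1   /* mask of all bits defined above */
};

/* The five subsets folded into one flag in the discussion. */
#define DEMO_AVX512_ALL (DEMO_AVX512F | DEMO_AVX512CD | DEMO_AVX512BW | \
                         DEMO_AVX512DQ | DEMO_AVX512VL)

/* Returns 1 only if every grouped subset is present, 0 otherwise. */
int demo_has_avx512(uint32_t features)
{
    assert((features & ~(uint32_t)DEMO_KNOWN) == 0); /* reject unknown bits */
    return (features & DEMO_AVX512_ALL) == DEMO_AVX512_ALL;
}
```

A CPU that reports only some of the subsets, for example F and CD without BW, DQ and VL, fails the check and would fall back to a narrower code path.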
[22:33:22 CEST] <J_Darnley> Gramner: That venn diagram makes it look like a real shitshow.
[22:34:01 CEST] <nevcairiel> just ignore the Phi side and its like any other instruction set
[22:34:10 CEST] <atomnuker> Gramner: if you don't use 512bit regs but say 256bit regs can you perform 4x256bit mults or whatever?
[22:34:10 CEST] <Gramner> to be fair, you can basically ignore the entire right side
[22:34:38 CEST] <atomnuker> (can the units split and act independently?)
[22:34:39 CEST] <Gramner> atomnuker: no
[22:35:41 CEST] <atomnuker> that's a shame, 2 of the same operations and you've saturated them
[22:36:01 CEST] <Gramner> there are two generic 256-bit simd ALU:s on p0 and p1. they are fused to form a single 512-bit ALU on SKL-X. so those aren't actually any wider than before. p5 is the only execution port that's wider
[22:36:13 CEST] <Gramner> p5 is native 512-bit
[22:36:32 CEST] <jamrial> and as such fast zmm shuffles?
[22:36:40 CEST] <Gramner> p2/p3/p4 are also native 512-bit (so you can do 2x64 byte loads and 1x64 byte store per cycle)
[22:37:08 CEST] <Gramner> in-lane zmm shuffles are 1/1
[22:37:19 CEST] <Gramner> cross-lane dword and qword 3/1
[22:37:28 CEST] <nevcairiel> does it have 4 lanes now? <.<
[22:37:42 CEST] <Gramner> cross-lane word shuffles (e.g. vpermw) are slow. 6/2 iirc
[22:37:45 CEST] <Gramner> 4 lanes yes
[22:37:53 CEST] <Gramner> but many new amazing shuffle instructions
[22:38:00 CEST] <Gramner> so much less of an issue now
[22:39:03 CEST] <Gramner> vperm(i|t)2(b|w|d|q) are super useful. the byte version is only in CNL though
[22:40:01 CEST] <Gramner> most cross-lane shuffling avx-512 code I've written so far has been in the frequency domain so word shuffles were sufficient
[22:45:15 CEST] <TD-Linux> so for the models that have only 1 avx-512 fma, does that mean that p5 is just 256 wide?
[22:45:30 CEST] <Gramner> no, that's always 512-bit
[22:45:45 CEST] <TD-Linux> does it just drop fma? or do p0/p1 drop fma
[22:45:50 CEST] <Gramner> on the 1fma it can only do shuffles. on 2fma it can do either shuffles or arith
[22:46:10 CEST] <nevcairiel> didnt someone claim that tests have shown that actually all models have the full port
[22:46:40 CEST] <TD-Linux> so it's dropping more than just fma
[22:47:12 CEST] <Gramner> "fma" is a stupid term in this case since fma ops is just a small subset of instructions it can execute
[22:47:29 CEST] <TD-Linux> yeah I see it used all over the place but "alu" seems like it would be more appropriate
[22:47:31 CEST] <nevcairiel> do those "fma" units also handle integer instructions?
[22:47:37 CEST] <nevcairiel> or is that something else?
[22:47:46 CEST] <Gramner> yes
[22:47:57 CEST] <TD-Linux> hopefully agner's guide is updated soon
[22:48:15 CEST] <Gramner> everything that's not a shuffle or otherwise very complex
[22:49:24 CEST] <Gramner> basically everything that both p0 and p1 can do in 256-bit p5 can do in 512-bit on the "2FMA" models
[22:50:02 CEST] <Gramner> so arithmetic, logic, shifts, and whatnot
[22:50:37 CEST] <atomnuker> I think having more ALUs and more shuffle instructions might have been better than avx512
[22:51:04 CEST] <Gramner> then you're going to get bottlenecked by decoding probably
[22:51:27 CEST] <atomnuker> hm, good point
[22:51:35 CEST] <Gramner> if you can't decode instructions fast enough to keep the execution units busy it's pointless
[22:52:15 CEST] <Gramner> and that's a serial process (due to variable-length ops) and quite power hungry too
[22:53:30 CEST] <Gramner> but the execution units on SKL-X are very unbalanced. everything is thrown into p5
[22:59:26 CEST] <Gramner> avx offsets are a big issue. they are sort-of designed around being able to handle "power viruses" that max out every execution unit every cycle, which punishes real-world code more than necessary. it really needs more granularity than a simple "downclock everything a few hundred MHz as soon as a zmm reg is touched" to get any widespread adoption
[23:18:22 CEST] <nevcairiel> i tuned my avx offset to running something like linpack which absolutely melts the cpu, but if i dont the system just shuts down, which is not something I like ever happening. but just for x264 for example i could leave it much much higher
[23:18:32 CEST] <nevcairiel> of course server cpus dont really get the benefit of tuning such things
[23:19:38 CEST] <atomnuker> you can adjust that?
[23:19:42 CEST] <atomnuker> in the bios?
[23:19:46 CEST] <nevcairiel> yes
[23:19:47 CEST] <Gramner> yes, which is why it should be a dynamic thing that gradually drops the frequency as needed instead of a static value
[23:19:49 CEST] <BtbN> On most consumer-PCs, yes
[23:19:54 CEST] <BtbN> But not on server-boards
[23:20:05 CEST] <atomnuker> and if you leave it on max with the stock coolet it'll overheat?
[23:20:18 CEST] <atomnuker> *cooler
[23:20:21 CEST] <nevcairiel> it'll probably just shut down due to various protections
[23:20:42 CEST] <BtbN> it's almost impossible to kill a modern CPU via heat
[23:20:42 CEST] <atomnuker> what has happened to desktop CPUs while I've used laptops?
[23:20:44 CEST] <nevcairiel> my PC just hard-resets if i try to run something really powerful like Linpack
[23:20:57 CEST] <nevcairiel> without proper offset
[23:21:02 CEST] <BtbN> I'm currently playing the AMD support game.
[23:21:12 CEST] <BtbN> They told me a VCore voltage range to try for system stability.
[23:21:25 CEST] <BtbN> And I was like "dudes, my BIOS Auto setting is higher than your max in that range"
[23:21:38 CEST] <BtbN> and it's still not stable
[23:21:59 CEST] <BtbN> But I just did the tests, found it being even less stable, and reported back
[23:22:08 CEST] <BtbN> wonder how long they'll keep trying until I get a new CPU
[23:22:13 CEST] <atomnuker> this is horrible, so you need liquid cooling even if you don't clock your cpu to get the most out of it?
[23:22:17 CEST] <J_Darnley> Clearly they've run out of stock.
[23:22:18 CEST] <Gramner> you never go auto voltage when oc:ing
[23:22:28 CEST] <BtbN> I'm not OCing anything
[23:22:36 CEST] <Gramner> ah
[23:22:40 CEST] <BtbN> Just broken Ryzen
[23:22:48 CEST] <nevcairiel> atomnuker: the default offsets are very conservative, if you change them thats "OC" =p
[23:23:29 CEST] <atomnuker> grr, bloody newfangled avx offset stuff
[23:23:42 CEST] <BtbN> Ryzen doesn't have it!
[23:23:51 CEST] <nevcairiel> ryzen barely has avx2 as it is
[23:23:57 CEST] <Gramner> ryzen only has 128-bit SIMD
[23:24:11 CEST] <BtbN> Well, but as it doesn't clock down, it is actually competitive on avc2 on servers
[23:24:12 CEST] <atomnuker> Gramner: but it has more units
[23:24:12 CEST] <Gramner> avx2 is emulated as two 128-bit ops
[23:24:22 CEST] <BtbN> *avx2
[23:25:17 CEST] <nevcairiel> anyhow, as a Ryzen buyer I would be miffed if I read something like "the top binned Zen Dies went into ThreadRipper", basically that means if you buy Ryzen you get the leftovers? :p
[23:25:55 CEST] <atomnuker> threadripper has numa, I'd rather avoid it
[23:26:12 CEST] <BtbN> It doesn't. Just the same CCX design Ryzen has as well.
[23:26:17 CEST] <BtbN> Not actual NUMA
[23:26:24 CEST] <nevcairiel> threadripper has two actual dies
[23:26:35 CEST] <nevcairiel> the latency between those is far higher than between two CCX on the same die
[23:26:47 CEST] <nevcairiel> there is a reason why AMD has this game mode that disables one die
[23:26:57 CEST] <iive> Gramner: avx2 is very easy to be emulated as two 128bit ops, because it is basically that
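To illustrate the "it is basically that" remark: most AVX2 integer instructions work within each 128-bit lane, so a 256-bit operation can be written as two independent 128-bit ones. A minimal sketch using the byte shuffle (vpshufb) as the example, assuming the usual semantics where a control byte with its top bit set produces zero and otherwise the low 4 bits index into the same lane. The function names are made up for illustration.

```python
def pshufb_128(src, ctrl):
    """One 128-bit lane of a byte shuffle: 16 bytes in, 16 bytes out."""
    return [0 if c & 0x80 else src[c & 0x0F] for c in ctrl]


def vpshufb_256(src, ctrl):
    """256-bit byte shuffle expressed as two independent 128-bit shuffles."""
    low = pshufb_128(src[:16], ctrl[:16])
    high = pshufb_128(src[16:], ctrl[16:])
    return low + high


# Example: reverse the bytes inside each lane; bytes never cross lanes.
src = list(range(32))
reverse_ctrl = list(range(15, -1, -1)) * 2
reversed_in_lanes = vpshufb_256(src, reverse_ctrl)
```

The handful of cross-lane instructions do not split this cleanly, so this only covers the in-lane majority.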
[23:27:48 CEST] <Gramner> sure. but on amd it's not really any faster aside from a bit reduced decoder pressure
[23:28:07 CEST] <Gramner> using avx2 on amd makes stuff slower in some cases
[23:29:00 CEST] <iive> i've seen this happen on intel too ;)
[23:29:24 CEST] <BtbN> If the CPU clocks itself back into the 90s because of avx2 load, yeah
[23:30:48 CEST] <iive> Gramner: my point is that the emulation part is not what's bad, it's not having extra units for execution.
[23:31:21 CEST] <nevcairiel> thats not the reason, the reason is people using avx2 on algorithms that dont really scale well, if its only a few percent faster then its not going to help with the possibility of a downclock, but if it scales properly then a 10%-15% downclock won't make it "slower"
[23:32:02 CEST] <nevcairiel> (on that note, latest cpus dont downclock that much for avx2 anymore, avx512 on the other hand)
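The downclock argument above comes down to simple arithmetic. A back-of-the-envelope sketch, assuming a deliberately simplified model in which the whole workload runs at the reduced clock and throughput scales linearly with frequency. The function names and the sample speedups of 1.05x and 1.8x are illustrative assumptions, not measurements.

```python
def net_speedup(per_clock_speedup, downclock_fraction):
    """Overall speedup once the clock reduction is taken into account."""
    return per_clock_speedup * (1.0 - downclock_fraction)


def break_even_speedup(downclock_fraction):
    """Per-clock speedup needed just to match the non-AVX2 version."""
    return 1.0 / (1.0 - downclock_fraction)


# A kernel that is only a few percent faster per clock loses overall
# under a 15% downclock, while one that scales well still wins.
poorly_scaling = net_speedup(1.05, 0.15)
well_scaling = net_speedup(1.8, 0.15)
needed_at_15_percent = break_even_speedup(0.15)
```

Under these assumptions a 15% downclock needs a bit under 1.18x per clock to break even, and a 10% downclock a bit over 1.11x.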
[23:32:27 CEST] <iive> how long does it take for cpu to downclock avx2 units?
[23:32:36 CEST] <Gramner> die shrinks to the rescue with lower per-transistor capacitance? I hope
[23:33:15 CEST] <nevcairiel> skylake got much faster with clock changes, a few milliseconds or so
[23:33:27 CEST] <nevcairiel> i forgot the marketing name for their faster clock changes
[23:33:31 CEST] <nevcairiel> speed-something
[23:33:50 CEST] <nevcairiel> speed shift? that may be it
[23:34:53 CEST] <iive> scale?
[23:35:32 CEST] <atomnuker> turbo boost?
[23:35:59 CEST] <atomnuker> wait, that was another thing
[23:36:25 CEST] <Gramner> 56kc to power up ymm registers (during which time avx code is very slow), after not using avx for 2.7Mc it's powered down again (on SKL-S, dunno if the values are different on SKL-X)
[23:39:46 CEST] <Gramner> I actually have no idea why it takes over 50,000 clock cycles to power up some more transistors, but I'm not a hw designer
[23:40:19 CEST] <iive> avx code is very slow... how does it execute at all, if it doesn't use emulation?
[23:41:46 CEST] <BtbN> 50000 clock cycles isn't all that much time at 3GHz+. If they need to physically charge up or something
[23:41:50 CEST] <Gramner> like 4-5x slower. probably emulated with 128-bit ops in a non-optimal way due to the expectation that applications that actually make use of AVX will stay powered up all the time so not worth optimizing it to the ideal ~2x
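For scale, the cycle counts quoted above convert to wall-clock time as follows. A quick sketch assuming a flat 3 GHz clock (the "3GHz+" figure from the discussion; the real frequency varies), with a hypothetical helper name.

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Convert a cycle count to microseconds at a fixed clock frequency."""
    return cycles / clock_hz * 1e6


CLOCK_HZ = 3.0e9  # assumed constant clock, for illustration only

# 56kc warm-up before the upper ymm halves are powered up.
warmup_us = cycles_to_microseconds(56_000, CLOCK_HZ)

# 2.7Mc without AVX use before they are powered down again.
idle_timeout_us = cycles_to_microseconds(2_700_000, CLOCK_HZ)
```

At that clock the warm-up is on the order of 19 microseconds and the idle timeout just under a millisecond.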
[23:59:32 CEST] <Gramner> huh, I just realized that all the brand new SHA instructions are using legacy encoding only, not VEX. including all the good old beloved features such as hidden implicit registers. just... why?
[00:00:00 CEST] --- Sat Aug 19 2017