[Ffmpeg-devel-irc] ffmpeg-devel.log.20170714

Sat Jul 15 03:05:02 EEST 2017

[00:49:03 CEST] <kierank> durandal_170: give foo86 commit
[00:52:33 CEST] <jamrial> kierank: if you reviewed his dolby_e or dts patchsets, then push them
[00:53:36 CEST] <peloverde> durandal_170: Any update on aac960?
[02:34:12 CEST] <cone-915> ffmpeg 03Kaustubh Raste 07master:df806605f7b7: avcodec: Add prefetch for mips
[02:34:12 CEST] <cone-915> ffmpeg 03Aleksandr Slobodeniuk 07master:390e028c663b: avutil/threadmessage: fix error return in case of av_fifo_alloc failure
[04:53:15 CEST] <amey> [ashowinfo_hanning @ 0x328b8e0] n:90 pts:16427 pts_time:2.98022 pos:-1 fmt:fltp channels:1 chlayout:mono rate:5512 nb_samples:2048 checksum:B8DA7FD0 plane_checksums: [ B8DA7FD0 ]
[04:53:15 CEST] <amey> Above is description of frame as shown by ashowinfo filter. I want to use samples somewhere else. I am not able to figure out how to get all 2048 float samples out of frames. When I typecast frame->data field to float I only get 64 valid values which is backed by frame->linesize[0] = 256 bytes, thus 256/4 = 64 samples. What is relation between nb_samples and sample format and how to get all 2048 samples out of
[04:53:15 CEST] <amey> frame?
[05:02:39 CEST] <amey> Or there is possibility of frame itself being corrupt. Saying 2048 samples but only having 64?
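A minimal sketch of the byte arithmetic behind the question above. It assumes only that `fltp` is planar 32-bit float, so each channel has its own plane and each sample takes 4 bytes. The helper names are hypothetical and nothing here calls FFmpeg. In the FFmpeg API, `nb_samples` is the per-channel sample count and the planes of planar audio are reached through `frame->extended_data[ch]`.

```python
BYTES_PER_SAMPLE_FLT = 4  # fltp stores each sample as a 32-bit float


def plane_bytes(nb_samples, bytes_per_sample=BYTES_PER_SAMPLE_FLT):
    """Bytes of sample data in one channel plane of a planar frame."""
    return nb_samples * bytes_per_sample


def samples_in_buffer(buffer_bytes, bytes_per_sample=BYTES_PER_SAMPLE_FLT):
    """How many whole samples fit in a buffer of the given size."""
    return buffer_bytes // bytes_per_sample


# Bytes needed for the 2048-sample mono frame reported by ashowinfo.
print(plane_bytes(2048))
# Samples that fit in the 256 bytes the question derives from linesize[0].
print(samples_in_buffer(256))
```

The two numbers differ by a factor of 32, which is why reading only 256 bytes cannot cover the whole frame.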
[05:33:27 CEST] <beauty> how to know ts stream's bitrate?
[05:36:46 CEST] <beauty> ??
[05:57:42 CEST] <cone-759> ffmpeg 03Rostislav Pehlivanov 07master:91b27b83939a: opusenc: use float_dsp for non-transient windowing
[07:57:38 CEST] <LongChair> jkqxz: i'm back, was wondering if you could gimme a patch file with your latest hwcontext_drm changes, or even better a link to the GH repo / branch where you would have those so that i can update the rockchip decoder :)
[10:20:31 CEST] <durandal_1707> should i add long double to swresample as sample format?
[10:38:48 CEST] <wm4> would anyone insist on still keeping VDA?
[10:39:17 CEST] <wm4> because I'm at a point where I'm fixing videotoolbox, and it would require fixes to VDA code as well, which I'm not going to touch
[11:17:43 CEST] <wm4> how do you make "make install" not strip symbols again?
[11:21:07 CEST] <nevcairiel> configure with disable-stripping?
[11:22:53 CEST] <wm4> sigh
[11:22:56 CEST] <nevcairiel> but maybe that doesnt even help
[11:22:57 CEST] Action: wm4 rebuilds
[11:22:58 CEST] <nevcairiel> who knows
[11:23:13 CEST] <wm4> stupid crap
[11:23:20 CEST] <wm4> why make life harder for no reason
[11:23:41 CEST] <nevcairiel> should've adopted the autotools way and offer  make install and make install-strip
[11:29:22 CEST] <wm4> and of course it'll still be built with optimizations
[12:44:51 CEST] <jkqxz> LongChair:  <http://ixia.jkqxz.net/~mrt/ffmpeg/drm/0001-lavu-Add-DRM-hwcontext.patch> (rebased again).
[12:45:24 CEST] <kierank> durandal_1707: for super audiophile?
[12:45:34 CEST] <durandal_1707> peloverde: post patch to ml
[12:45:53 CEST] <durandal_1707> kierank: no, for atomnuker 
[12:47:04 CEST] <LongChair> jkqxz: thanks
[12:52:19 CEST] <BtbN> wow, VDPAU is alive
[13:17:47 CEST] <philipl> BtbN: They obviously finally hired someone.
[13:48:27 CEST] <thebombzen> wm4: I believe --enable-debug=3 disables opt and stripping
[14:49:26 CEST] <cone-527> ffmpeg 03Derek Buitenhuis 07master:e10c31f3316a: hdsenc: Remove dead store
[15:10:14 CEST] <Compn> anyone tried the libfuzzer-gv yet ?
[18:29:04 CEST] <atomnuker> is the last git commit hash defined anywhere?
[18:29:13 CEST] <atomnuker> as a macro
[18:31:15 CEST] <J_Darnley> perhaps in FFMPEG_VERSION
[19:10:09 CEST] <BBB> would people like an updated version of an x86inc.asm tutorial?
[19:10:22 CEST] <BBB> the one from x264 is kind of prehistorical
[19:11:12 CEST] <kierank> yes
[19:11:23 CEST] <kierank> maybe I can allocate some time for J_Darnley to work on this with you BBB 
[19:11:42 CEST] <kierank> unless you want to do it alone
[19:12:14 CEST] <BBB> I already have one :-p
[19:12:18 CEST] <BBB> I'm considering making it public
[19:12:24 CEST] <BBB> (need to clean it up and sanitize it a little)
[19:12:39 CEST] <BBB> (basically remove any references to internal materials / symbols etc.)
[19:12:50 CEST] <BBB> but I'd appreciate some review from you or J_Darnley if you guys have time
[19:13:02 CEST] <BBB> I'll probably put it on my blog, but also in the wiki or so
[19:13:35 CEST] <BBB> it's not intended to teach people how to write assembly, i.e. it assumes you know what assembly is and how it works, it really only teaches the x86inc.asm-specific pieces
[19:13:57 CEST] <J_Darnley> Sure.  Send me a link privately if you want me to check it out before hand.
[19:14:06 CEST] <BBB> cool
[19:14:18 CEST] <BBB> hm, in fact
[19:14:28 CEST] <BBB> I bet you guys would want to extend it to cover things like floating point etc.
[19:14:47 CEST] <BBB> (how does argument passing work across platforms, return values, etc.)
[19:15:54 CEST] <J_Darnley> I'm sure others would be more interested in that.  I still haven't done any fp in assembly
[19:16:00 CEST] <BBB> ok
[19:16:23 CEST] <kierank> BBB: my issue is I know how to use x264asm at a basic level (i.e. the old x264 tutorial) but I don't understand all the crazy shit you guys do
[19:16:35 CEST] <BBB> :D
[19:16:47 CEST] <BBB> I'm wondering now what crazy shit you're referring to
[19:17:03 CEST] <tdjones> I would really appreciate some asm material. They teach it at a very basic level in uni usually, but I can't really follow much of the asm I see on the ML either
[19:17:08 CEST] <kierank> for example why and when to use the stack or red zone isn't covered in the old tutorial
[19:17:25 CEST] <kierank> floating point as you say
[19:19:05 CEST] <BBB> my tutorial covers stack memory
[19:19:13 CEST] <BBB> but like I said, no floating point material
[19:19:25 CEST] <BBB> tdjones: ok, cool
[19:19:28 CEST] <BBB> let me spend some time on this
[19:19:39 CEST] <BBB> we'll just note this down as another beer owed by kierank
[19:19:44 CEST] <kierank> ahahaha
[19:19:47 CEST] <BBB> I'll get so drunk at VDD this year
[19:19:51 CEST] <BBB> yikes
[19:19:58 CEST] <atomnuker> yep, the only way to learn asm now is to be taught by someone
[19:20:17 CEST] <BBB> atomnuker: what would you want to see in a tutorial?
[19:20:20 CEST] <kierank> atomnuker: no the best way to learn asm is to write code
[19:20:33 CEST] <atomnuker> actually yes
[19:22:04 CEST] <BBB> I currently have x86inc.asm example, then INIT_*, cglobal, DEFINE_ARGS and RET (this is the longest section), then register type templating (mmx/sse/avx), then instruction set templating (sse2/ssse3), then AVX emulation (3-operand instructions), then SWAP, then stack memory
[19:23:02 CEST] <kierank> heh, it's almost like wikibooks
[19:23:06 CEST] <kierank> than a blog post
[19:23:29 CEST] <BBB> it's intended to be thorough :-p
[19:24:08 CEST] <BBB> the portion that explains cglobal is several pages of text :-o
[19:24:20 CEST] <BBB> even though it's just one stupid line of text (code)
[19:25:34 CEST] <kierank> I wonder if we can add exercises and stuff
[19:25:38 CEST] <kierank> so it becomes like a textbook
[19:26:48 CEST] <BBB> :D
[19:26:49 CEST] <BBB> brb
[19:31:06 CEST] <tdjones> Does anybody have recommendations for modern x86 books/resources? I struggle to find any books that cover more than the bare minimum.
[19:31:38 CEST] <J_Darnley> Not outside the Intel reference and Agner's timing tables, no.
[19:33:01 CEST] <kierank> atomnuker: does av1 use x264asm?
[19:33:51 CEST] <atomnuker> yes, though lately there's been a lot of intrinsics too
[19:34:00 CEST] <kierank> eugh
[19:34:24 CEST] <kierank> intrinsics is to asm as c++ is to c
[19:34:33 CEST] <kierank> people think they are saving time but in fact they are making a mess
[19:35:08 CEST] <atomnuker> the way av1 deals with simd is a mess
[19:35:15 CEST] <atomnuker> no function pointers like we do
[19:35:32 CEST] <kierank> we are the exception instead of the rule afaik
[19:35:35 CEST] <atomnuker> instead global function pointer variables which you can't seem to chase
[19:35:50 CEST] <atomnuker> also they don't template SIMD code
[19:35:59 CEST] <kierank> from reverse engineering stuff people just seem to shove their simd whenever they want
[19:36:04 CEST] <atomnuker> every file is for a different instruction set, e.g. sse, avx, avx2
[19:36:08 CEST] <kierank> eugh
[19:36:11 CEST] <kierank> that is disgusting
[19:36:37 CEST] <atomnuker> at least its not doing the x265 thing with a single huge file
[19:37:28 CEST] <atomnuker> (they do that with intrinsics too)
[19:37:59 CEST] <atomnuker> and then we get blamed for a bloated ffvp9
[20:06:01 CEST] <iive> atomnuker: i wonder if somebody has tried C++ templates and intrinsics...
[20:07:00 CEST] <BBB> atomnuker: ffvp9 isn't bloated
[20:07:05 CEST] <BBB> atomnuker: libvpx is much, much bigger
[20:07:16 CEST] <BBB> atomnuker: but libvpx got in chrome before limits were imposed, ffvp9 was proposed after
[20:07:21 CEST] <BBB> so we got stuck in traffic, basically
[20:07:27 CEST] <BBB> unfortunate, but that's life
[20:07:30 CEST] <BBB> at least firefox uses ffvp9
[20:09:39 CEST] <TD-Linux> and it's wonderful
[20:10:19 CEST] <BBB> :)
[20:10:57 CEST] <TD-Linux> we're finally adding 10 bit planes so the high bitdepth path should get some exercise soon too
[20:11:27 CEST] <JEEB> najs
[20:13:03 CEST] <Gramner> BBB: floating point calling conventions are a huge mess, so the x86inc approach to that is basically "screw that, you're on your own. have fun"
[20:13:25 CEST] <BBB> TD-Linux: oh that's super-cool
[20:21:59 CEST] <atomnuker> Gramner: its not that bad, just an ifdef to get floats in a register, the unix64 way
[20:22:24 CEST] <Gramner> on unix64 it's quite sane
[20:22:45 CEST] <atomnuker> yes, someone was thinking
[20:23:02 CEST] <atomnuker> "why would someone want to give float arguments and what would they use them for?"
[20:23:14 CEST] <atomnuker> SCALING FACTORS!
[20:23:40 CEST] <atomnuker> so its just a splat away from done
[20:24:24 CEST] <atomnuker> somehow doing extra work just to put them in gprs would be wasteful
[20:24:48 CEST] <Gramner> win64 is fun when you can only have 4 args in regs, and floats makes some int registers become unused etc
[20:24:49 CEST] <atomnuker> come to think of it, it would be nice to do this for everything
[20:25:10 CEST] <atomnuker> so mm regs would mirror gprs
[20:26:10 CEST] <atomnuker> though modern CPUs already zero them out for free I think
[20:29:43 CEST] <Gramner> it's just that making cglobal accept a mixture of int and fp and just make it do "the right thing" (tm) in all cases on all operating systems is non-trivial and would complicate the cglobal syntax as well (since you'd need to be able to specify a type for each arg)
[20:31:40 CEST] <atomnuker> yep
[20:32:27 CEST] <nevcairiel> manually handling the fp args for the couple functions you need it in is probably better then making x86inc be evil
[20:32:34 CEST] <Gramner> indeed
[20:32:41 CEST] <nevcairiel> to make  it easier on yourself, just put them last in the arg list
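A simplified model of the register assignment being discussed, covering only scalar integer/pointer and float arguments (no structs, varargs, or stack layout). The function names and the "int"/"fp" labels are hypothetical. It shows that on win64 the first four arguments share positional slots, so a float consumes an integer slot, while on unix64 (System V) the two classes are counted independently.

```python
WIN64_INT = ["rcx", "rdx", "r8", "r9"]
WIN64_FP = ["xmm0", "xmm1", "xmm2", "xmm3"]
UNIX64_INT = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
UNIX64_FP = ["xmm%d" % i for i in range(8)]


def assign_win64(kinds):
    """Win64: argument N uses slot N, whichever register class it needs."""
    regs = []
    for slot, kind in enumerate(kinds):
        if slot >= len(WIN64_INT):
            regs.append("stack")
        elif kind == "fp":
            regs.append(WIN64_FP[slot])
        else:
            regs.append(WIN64_INT[slot])
    return regs


def assign_unix64(kinds):
    """unix64: integer and float arguments are counted separately."""
    next_int = 0
    next_fp = 0
    regs = []
    for kind in kinds:
        if kind == "fp" and next_fp < len(UNIX64_FP):
            regs.append(UNIX64_FP[next_fp])
            next_fp += 1
        elif kind == "int" and next_int < len(UNIX64_INT):
            regs.append(UNIX64_INT[next_int])
            next_int += 1
        else:
            regs.append("stack")
    return regs


# A float in the middle shifts nothing on win64 but leaves rdx unused.
print(assign_win64(["int", "fp", "int"]))
print(assign_unix64(["int", "fp", "int"]))
# With the float last, the integer arguments keep consecutive slots on both.
print(assign_win64(["int", "int", "fp"]))
print(assign_unix64(["int", "int", "fp"]))
```

Even with the float last, it arrives in a different xmm register on each platform, which is the per-platform handling mentioned above.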
[20:55:32 CEST] <iive> while talking about asm arguments
[20:56:11 CEST] <iive> if the function is 64 bit and is having int32 arguments, is it supposed to clear the high 32 bits? e.g. movsxdifnidn ?
[20:58:51 CEST] <Gramner> if you need to use the 32-bit argument in a 64-bit operation, e.g. by adding it to a pointer, you need to zero out the upper 32-bit yourself since those bits are undefined by the ABI
[21:00:29 CEST] <Gramner> note that any 32-bit operation will implicitly zero the upper 32-bits
[21:01:09 CEST] <Gramner> or you can just change the type of that value to ptrdiff_
[21:01:26 CEST] <Gramner> ptrdiff_t*
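A small model of the point above, using plain Python integers to stand in for 64-bit registers. The helper names and the example values are made up for illustration. It assumes a non-negative 32-bit argument; a signed argument that may be negative would need sign extension instead, which this sketch does not model.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add_to_pointer(ptr, reg64):
    """64-bit add: uses all 64 bits of the register, garbage included."""
    return (ptr + reg64) & MASK64


def zero_extend32(reg64):
    """Value left in the full register after a 32-bit operation on it."""
    return reg64 & MASK32


# Low 32 bits hold the real argument (16); the upper bits are undefined.
arg_with_garbage = 0xDEADBEEF00000010
ptr = 0x1000

print(hex(add_to_pointer(ptr, arg_with_garbage)))                 # wrong address
print(hex(add_to_pointer(ptr, zero_extend32(arg_with_garbage))))  # intended address
```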
[21:13:43 CEST] <iive> thanks
[21:16:20 CEST] <iive> they are counters so it is easy to use them in 32bit mode.
[21:17:36 CEST] <Shiz> a/w 25
[21:39:49 CEST] <J_Darnley> :) I see users are starting to wear Moritz down on the user mailing list.
[22:25:06 CEST] <BBB> J_Darnley: want to review now?
[22:25:36 CEST] <atomnuker> the asm tutorial?
[22:27:26 CEST] <durandal_1707> who needs that?
[22:31:17 CEST] <BBB> atomnuker: yes
[22:34:31 CEST] <atomnuker> post it somewhere
[23:01:51 CEST] <BBB> atomnuker: https://blogs.gnome.org/rbultje/2017/07/14/writing-x86-simd-using-x86inc-asm/
[23:02:21 CEST] <TD-Linux> BBB, nice
[23:02:38 CEST] <BBB> hope it's useful
[23:02:53 CEST] <BBB> I'll copy it to ffmpeg's wiki at some point in the future
[23:03:20 CEST] <ubitux> now link it in doc/optimization.txt
[23:03:34 CEST] <ubitux> (or maintain a text version in that file)
[23:04:29 CEST] <BBB> I'll link it :)
[23:08:12 CEST] <Gramner> BBB: your "void" sad_16x16_c example is returning a value
[23:10:45 CEST] <BBB> fixed
[23:11:30 CEST] <Gramner> you could perhaps also mention zmm regs where you mention mm-ymm regs
[23:12:36 CEST] <BBB> INIT_ZMM avx512?
[23:12:43 CEST] <Gramner> yes
[23:13:06 CEST] <atomnuker> BBB: "to xmm0and m1 might" <- lack of space
[23:13:11 CEST] <Gramner> things are basically how you'd expect them to be
[23:13:33 CEST] <atomnuker> I think this should be on trac as well
[23:13:44 CEST] <BBB> fixed
[23:13:56 CEST] <BBB> atomnuker: yes, I'm intending to move it to the wiki (trac=wiki)
[23:14:06 CEST] <Gramner> some behind-the-scenes register renaming is done behind in avx512 code but that might be overkill for a tutorial
[23:14:22 CEST] <atomnuker> cool
[23:14:49 CEST] <BBB> hopefully also useful for future gsoc students
[23:15:57 CEST] <Gramner> something about auto vzeroupper maybe?
[23:17:13 CEST] <BBB> oh right, good point
[23:17:17 CEST] <BBB> in the RET section
[23:17:53 CEST] <atomnuker> also a sentence or two on float argument differences across different platforms?
[23:19:08 CEST] <BBB> Gramner: added
[23:19:29 CEST] <BBB> atomnuker: right, so that's what I'd need help with, because I'm just not familiar enough with float to say anything useful about it :-/
[23:19:41 CEST] <BBB> atomnuker: I was thinking of adding that in the wiki version (i.e. telling you to add it :-p)
[23:19:52 CEST] <atomnuker> yep, that'll be easier
[23:25:26 CEST] <Gramner> qXXXX notation for imm8 shuffle indices is used but not explained
[23:26:11 CEST] <atomnuker> (that notation rocks btw)
[23:26:57 CEST] <Gramner> yeah, using anything else is so hard to mentally parse
[23:28:16 CEST] <BBB> hm, yes, qNNNN
[23:28:47 CEST] <Gramner> it's just base4, but base4 happens to be super useful
[23:30:38 CEST] <BBB> "; qNNNN is a base4-notation for imm8 arguments"
[23:30:44 CEST] <BBB> (as documentation in the relevant code segment)
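A sketch of what the base-4 notation encodes, assuming the leftmost digit maps to the top two bits of the imm8, as with x86inc's q0000..q3333 constants. The function names are hypothetical. Each digit is a 2-bit field, and for a four-element shuffle such as pshufd each field picks the source element for one destination element.

```python
def q(digits):
    """Convert four base-4 digits, e.g. "3210", to the imm8 value."""
    if len(digits) != 4 or any(d not in "0123" for d in digits):
        raise ValueError("expected exactly four digits in the range 0-3")
    return int(digits, 4)


def selected_elements(imm8):
    """Source element index chosen for destination elements 0..3."""
    return [(imm8 >> (2 * i)) & 3 for i in range(4)]


# Identity shuffle: destination element i takes source element i.
print(hex(q("3210")), selected_elements(q("3210")))
# Swap the two 64-bit halves of a four-dword vector.
print(hex(q("1032")), selected_elements(q("1032")))
```

Read right to left, the digits list the source index for elements 0, 1, 2, 3, which is what makes the notation easier to parse than a hex or octal immediate.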
[23:30:55 CEST] <iive> i thought it is nasm/yasm construct.
[23:31:05 CEST] <iive> sadly, it is not.
[23:31:28 CEST] <Gramner> it's way more useful than octal which they do have
[23:31:42 CEST] <BBB> should put it on hackernews now :-p
[23:32:17 CEST] <iive> aren't there some octal imm in ymm avx2 land?
[23:33:11 CEST] <BBB> https://news.ycombinator.com/item?id=14773368 < go vote! :-p
[23:35:57 CEST] <Gramner> "On win64, if this number is larger than 6, we'll be using callee-save xmm registers" unless it's an avx-512 function, in which case the number of volatile registers available is 22 (nit!)
[23:36:07 CEST] <Gramner> one of the more useful features of avx-512 on win64, really
[23:37:06 CEST] <Gramner> also that is not documented anywhere from Microsoft's side the last time I looked
[23:37:14 CEST] <BBB> ugh :D
[23:37:16 CEST] <Gramner> I had to get intel to ask their MS reps
[23:37:25 CEST] <Gramner> to confirm calling convention stuff
[23:37:38 CEST] <BBB> 6 xmm or 22 zmm caller-save?
[23:37:53 CEST] <BBB> or, in other words, 10 callee-save
[23:38:02 CEST] <Gramner> applies to any register (with avx-512VL)
[23:38:09 CEST] <Gramner> you have xmm16-xmm31 too
[23:38:17 CEST] <BBB> !!!!!!!!!!!!!!!!!!!!!!!1111111one
[23:38:18 CEST] <Gramner> any register size*
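The register counts in this exchange can be checked with a few lines. The sets below encode only the numbers given in the conversation (registers 0-5 volatile, 6-15 callee-saved, 16-31 added by AVX-512 and volatile). The names are hypothetical, and as noted above the AVX-512 part was reportedly not in Microsoft's public documentation at the time.

```python
WIN64_VOLATILE_LEGACY = set(range(0, 6))   # registers 0-5
WIN64_CALLEE_SAVED = set(range(6, 16))     # registers 6-15
AVX512_EXTRA = set(range(16, 32))          # registers 16-31


def volatile_regs(avx512):
    """Register numbers a win64 function may clobber without saving."""
    regs = set(WIN64_VOLATILE_LEGACY)
    if avx512:
        regs |= AVX512_EXTRA
    return sorted(regs)


print(len(volatile_regs(avx512=False)))  # without AVX-512
print(len(volatile_regs(avx512=True)))   # with AVX-512
print(len(WIN64_CALLEE_SAVED))           # callee-saved in both cases
```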
[23:38:24 CEST] <BBB> why did I not know that
[23:38:25 CEST] <BtbN> I'm still sad Ryzen doesn't have avx-512
[23:38:25 CEST] <BBB> that is insane
[23:38:30 CEST] <BBB> when can I use them?
[23:38:35 CEST] <BtbN> but probably a bit much to ask, as it doesn't even have real avx2
[23:38:36 CEST] <BBB> I want 32 xmm registers!!!!!!!!!!
[23:38:42 CEST] <nevcairiel> BtbN: ryzen doesnt even have real avx2, so yeah
[23:38:47 CEST] <Gramner> on any 64-bit system. also vzeroupper is not needed for regs 16-31
[23:38:49 CEST] <nevcairiel> also, snap =p
[23:39:00 CEST] <Gramner> so what x86inc does is rename regs so that 16-31 are used first
[23:39:13 CEST] <nevcairiel> BBB: just invest into a brand new cpu =p
[23:39:19 CEST] <nevcairiel> (no macs)
[23:39:32 CEST] <BBB> fixed
[23:39:34 CEST] <Gramner> so vzeroupper is basically only used when you need more than 16 regs on avx512 x64 functions
[23:39:35 CEST] <BtbN> The new Skylake-SP CPUs are all horrible sadly
[23:39:51 CEST] <BtbN> not SP, X
[23:40:03 CEST] <atomnuker> they are?
[23:40:03 CEST] <BtbN> The Xeons are fine
[23:40:09 CEST] <BtbN> They have a massive heat issue
[23:40:17 CEST] <nevcairiel> the "horribleness" is largely overrated and overreported
[23:40:21 CEST] <nevcairiel> mine runs perfectly fine
[23:40:39 CEST] <Gramner> xeons have massive avx downclocking though
[23:40:43 CEST] <BtbN> from what I heard, the package is unable to spread the long-term 100% load heat
[23:40:52 CEST] <Gramner> up to like halving the clock frequency in some cases
[23:40:57 CEST] <atomnuker> from what I've been reading if I had to get a CPU I'd get a ryzen
[23:41:15 CEST] <BtbN> yes, Ryzen seems like the way more appealing consumer CPU
[23:41:22 CEST] <BtbN> Than anything Intel has right now
[23:41:54 CEST] <nevcairiel> I have a 7900X OCed to 4.5GHz and its fine, even in torture testing like Prime95. You do need a proper competent cooler, but shrug. Only in AVX512 I configured some downclocking because that does get terribly hot
[23:42:17 CEST] <BtbN> It's funny how avx2 on ryzen isn't even that much slower, as it does not clock down at all
[23:42:42 CEST] <nevcairiel> The reporting of those "overclocking is impossible due to heat" things is hilarious though, considering you literally cannot OC ryzen much at all since it just stops working
[23:42:52 CEST] <TD-Linux> I was surprised to see cinebench tied between the new 16 core threadripper and 22 core broadwell-ep
[23:42:59 CEST] <TD-Linux> then again I think that benchmark has no AVX2
[23:43:08 CEST] <nevcairiel> cinebench doesnt use avx, indeed
[23:43:45 CEST] <Gramner> SKL-X mobos seem to favor ignoring TDP as much as possible over throttling. which does make sense though since you usually don't buy a HEDT system for perf/watt reason
[23:44:07 CEST] <TD-Linux> at least they managed to do their avx2 "emulation" in the smart way that doesn't introduce extra latency
[23:46:17 CEST] <Gramner> water cooling is sort of mandatory for SKL-X I think, air doesn't cut it
[23:46:39 CEST] <nevcairiel> Probably. I have one of those AIO/CLC coolers
[23:46:54 CEST] <nevcairiel> but threadripper wont be any different, it comes with a 180W TDP out of the box
[23:47:10 CEST] <Gramner> AIO is fine for normal usage. for high-end OC you probably want a custom loop
[23:47:19 CEST] <nevcairiel> granted, you likely wont be able to OC it, since like Ryzen it'll likely just not cross 4GHz anyway
[23:47:50 CEST] <Gramner> threadripper is spread over a larger surface area due to it being 2 separate dies which might be easier to cool. it's also soldered
[23:47:58 CEST] <Gramner> SKL-X uses TIM
[23:48:27 CEST] <atomnuker> nevcairiel: what difference does OC make in say, h264 decoding speed?
[23:50:53 CEST] <nevcairiel> clock speed scales pretty well (almost linear) with most workloads
[23:51:43 CEST] <nevcairiel> so between 4 and 4.5ghz, you could probably assume close to 10% improvement as well
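For reference, the ideal case behind that estimate is simple to compute. This sketch assumes perfectly linear scaling with clock speed, which is an upper bound rather than a measurement. The function name is hypothetical.

```python
def ideal_speedup_percent(old_ghz, new_ghz):
    """Throughput gain if performance scaled exactly with clock speed."""
    return (new_ghz / old_ghz - 1.0) * 100.0


# Going from 4.0 GHz to 4.5 GHz under perfectly linear scaling.
print(ideal_speedup_percent(4.0, 4.5))
```

The clock increase itself is 12.5%, so "close to 10%" corresponds to scaling that is almost, but not quite, linear.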
[23:52:08 CEST] <nevcairiel> not sure how well h264dec scales with so many cores though, didnt really test that much
[23:52:31 CEST] <jamrial> BBB: maybe mention why intrinsics carry a performance penalty
[23:52:44 CEST] <BBB> hm… dunno
[23:52:46 CEST] <jamrial> otherwise it reads like "it's bad because reasons, use this other thing instead"
[23:52:55 CEST] <jamrial> you did mention why inline asm is bad
[23:53:12 CEST] <BBB> if people want to use intrinsics, they'll use it
[23:53:18 CEST] <BBB> see libvpx/aom
[23:53:24 CEST] <jamrial> just say it depends on the compiler not being stupid
[23:53:31 CEST] <nevcairiel> could still meantion the reasons why we dont like them
[23:53:33 CEST] <nevcairiel> -a
[23:53:43 CEST] <nevcairiel> even if it doesnt change peoples minds
[23:54:32 CEST] <BBB> jamrial: ok
[23:55:20 CEST] <BBB> done
[23:56:23 CEST] <Gramner> xmN could need an explanation
[23:57:07 CEST] <BBB> good point
[23:57:11 CEST] <BBB> not sure where to introduce it though
[23:57:22 CEST] <BBB> we may just add that later in the wiki version
[23:58:48 CEST] <Gramner> several instances of "mm0, xmm0 or ymm0" should have zmm added
[23:59:25 CEST] <iive> does x86asm support zmm ?
[23:59:29 CEST] <Gramner> overall a good writeup
[23:59:49 CEST] <Gramner> the version in x264 does, I don't think it got committed to ffmpeg yet
[00:00:00 CEST] --- Sat Jul 15 2017