[Ffmpeg-devel-irc] ffmpeg-devel.log.20161018

Wed Oct 19 03:05:02 EEST 2016

[00:30:00 CEST] <cone-441> ffmpeg Carl Eugen Hoyos master:a20f3238be93: lavf/avidec: Do not fail for very large idx1 tags.
[01:13:46 CEST] <atomnuker> jamrial_: to convert e.g. abs_pow34 to avx2 I just need to do "vinsertf128 m0, m0, [inq+sizeq], 1" to load the upper part of the register, right?
[01:24:42 CEST] <atomnuker> (also would cvtsi2ss  m3, dword maxvalm work on non-windows non-unix64 systems?)
[01:27:34 CEST] <jamrial_> atomnuker: no, movaps with ymm registers will load 32 bytes instead of 16. vinsertf128 is when you want to duplicate 16 bytes in a ymm register
[01:28:35 CEST] <jamrial_> and regarding cvtsi2ss, only unix64 has maxval already in a gpr regardless of how you init the function. x86_32 (any os) and win64 have it on stack
[01:29:56 CEST] <jamrial_> so to answer the question, yes :p
[01:34:14 CEST] <atomnuker> why does every single avx2 asm use vinsertf128 then?
[01:34:34 CEST] <atomnuker> I thought it was some magic to load the upper part of the register
[01:36:51 CEST] <jamrial_> it's used to insert 16 bytes of data in the upper half of the register, but it's not the only way to achieve that
[01:39:08 CEST] <jamrial_> "movaps ymm0, [mem]" loads 32 bytes from memory, "movaps ymm0, ymm1" moves all 32 bytes from ymm1 to ymm0, etc
[01:39:11 CEST] <atomnuker> so movaps will load 32 bytes but e.g. movu/mova won't?
[01:39:44 CEST] <jamrial_> mova/movu are macros that expand to mov[au]ps or movdq[au]
[01:40:25 CEST] <jamrial_> all of them can load 32 bytes if you use ymm regs
[01:43:31 CEST] <atomnuker> well, if all I had to do was INIT_YMM avx2 then I think maybe something else isn't aligned since I segfault
[01:44:16 CEST] <jamrial_> probably the in and out buffers. they should be 32 bytes aligned, and padded
[01:45:25 CEST] <jamrial_> also, the splatd/shufps stuff won't work just like that with ymm regs. for those you'll probably need to use vinsertf128 to fill the upper 16 bytes
[01:46:43 CEST] <jamrial_> or vbroadcastss
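
The alignment and padding requirement mentioned above can be sketched in plain C. This is a minimal illustration, assuming C11 `aligned_alloc` is available; the helper names (`padded_len`, `alloc_ymm_buffer`, `is_ymm_aligned`) are hypothetical and are not FFmpeg functions. It rounds the element count up to a whole number of 8-float (32-byte) vectors so an aligned 32-byte load on the last chunk stays inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define YMM_BYTES  32                              /* width of one ymm register */
#define YMM_FLOATS (YMM_BYTES / sizeof(float))     /* 8 floats per register     */

/* Round n up to a multiple of 8 floats so the last vector load is in bounds. */
static size_t padded_len(size_t n)
{
    return (n + YMM_FLOATS - 1) / YMM_FLOATS * YMM_FLOATS;
}

/* Return nonzero if p sits on a 32-byte boundary. */
static int is_ymm_aligned(const void *p)
{
    return ((uintptr_t)p % YMM_BYTES) == 0;
}

/* Hypothetical helper: zeroed, 32-byte aligned, padded float buffer.
 * Returns NULL on allocation failure (n == 0 may also yield NULL).
 * The caller releases it with free(). */
static float *alloc_ymm_buffer(size_t n)
{
    size_t bytes = padded_len(n) * sizeof(float);  /* multiple of 32, as aligned_alloc requires */
    float *buf = aligned_alloc(YMM_BYTES, bytes);
    if (buf) {
        assert(is_ymm_aligned(buf));
        memset(buf, 0, bytes);                     /* padding holds defined values */
    }
    return buf;
}
```

An aligned 32-byte load such as movaps with a ymm operand faults on an address that is not a multiple of 32, which would be consistent with the segfault described above if the buffers were only 16-byte aligned.
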
[01:50:17 CEST] <atomnuker> as for fixing the ARM build, just putting if (ARCH_X86) ff_aac_dsp_init_x86() should work, shouldn't it?
[01:50:27 CEST] <atomnuker> it's what aacpsdsp does
[01:55:55 CEST] <jamrial_> atomnuker: yeah
[02:02:32 CEST] <atomnuker> disappointing, no real performance improvements switching abs_pow34 to avx2
[02:07:58 CEST] <jamrial_> is the loop really processing eight floats at a time instead of the four from the sse/sse2 version?
[03:39:28 CEST] <cone-134> ffmpeg Zhao Zhili master:7853d838a6e4: avformat/tests/gitignore: add fifo_muxer entry
[04:16:36 CEST] <kierank> 1:02 AM <atomnuker> disappointing, no real performance improvements switching abs_pow34 to avx2
[04:16:40 CEST] <kierank> avx2 is integer, no
[04:16:45 CEST] <kierank> isn't it just normal AVX
[04:35:36 CEST] <Zeranoe> It looks like '-Wl,--image-base,0x140000000' is causing a compiler error with GCC 6.2.0 when snappy is included. This seems specific to 64-bit. It looks like that ldflag was introduced with a58c22d61260941fc651add73836882d5b112fdb
[04:38:37 CEST] <rcombs> Zeranoe: snappy?
[04:38:45 CEST] <rcombs> and what error?
[04:39:22 CEST] <Zeranoe> rcombs: libstdc++-v3/src/c++11/cow-stdexcept.cc:236:(.text$_Z35_txnal_cow_string_C1_for_exceptionsPvPKcS_+0x2c): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `_ITM_RU1'
[04:40:03 CEST] <Zeranoe> rcombs: https://github.com/google/snappy
[10:06:11 CEST] <cone-762> ffmpeg Muhammad Faiz master:2c1be03cb38f: fate: add swr-convertaudio test
[15:02:19 CEST] <cone-095> ffmpeg Carl Eugen Hoyos master:31a0a8421658: lavf/avidec: Be more verbose when ignoring very large tag size.
[15:21:34 CEST] <cone-095> ffmpeg Michael Niedermayer master:2bd99564540a: doc/examples/demuxing_decoding: Drop AVFrame->pts use
[15:32:44 CEST] <mateo`> michaelni: should I push your mediacodec patch? Or should you?
[15:38:14 CEST] <michaelni> mateo`, sure feel free to push, its still on my todo but i half forgotten
[15:41:35 CEST] <cone-095> ffmpeg Michael Niedermayer master:9545ff3ec391: avcodec/mediacodec: Factor duplicate include
[16:26:02 CEST] <wm4> ffmpeg demuxers now output broken DTS timestamps
[16:26:11 CEST] <wm4> because of that shitty google edit list patch
[16:26:14 CEST] <wm4> ...
[16:26:35 CEST] <wm4> I guess I'm supposed to fix them somehow, because the finder of a bug gets to keep the bug
[16:32:04 CEST] <nevcairiel> for mov only then, i assume?
[16:35:39 CEST] <wm4> lol the patch adding AV_PKT_FLAG_DISCARD didn't even bump any library versions
[16:36:23 CEST] <wm4> only the AVFrame change has a bump
[16:37:11 CEST] <wm4> god, what an idiotic patch
[16:37:25 CEST] <wm4> looking forward to breaking it shit all over the place
[17:38:49 CEST] <cone-095> ffmpeg Muhammad Faiz master:acd74f92009d: swresample/resample: fix return value of build_filter
[17:44:12 CEST] <cone-095> ffmpeg Muhammad Faiz master:d3be186ed1bc: avfilter/firequalizer: add dumpfile and dumpscale option
[17:56:10 CEST] <atomnuker> jamrial: what's wrong with if (ARCH_X86) ff_aac_dsp_init_x86(s);?
[18:10:01 CEST] <nevcairiel> That should be fine, assuming the prototype always exists as well
[18:16:12 CEST] <atomnuker> it's the way it's done in every init function out there so it should be fine
[18:23:52 CEST] <BtbN> Isn't stuff like that the reason ffmpeg needs at least O1?
[18:24:02 CEST] <BtbN> So the symbol gets optimized out
[18:25:24 CEST] <atomnuker> yep
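
The `if (ARCH_X86)` pattern discussed above can be sketched as follows. This is a simplified illustration, not FFmpeg's actual code: `DSPContext`, `dsp_init` and `dsp_init_x86` are hypothetical names, and `ARCH_X86` is derived here from compiler-predefined macros rather than from a configure-generated header. The prototype is visible on every architecture; when `ARCH_X86` is 0 the call sits in a constant-false branch, and the build relies on the compiler dropping that branch so no reference to the symbol reaches the linker.

```c
/* Assumption: derive ARCH_X86 from predefined macros instead of a config header. */
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
#define ARCH_X86 1
#else
#define ARCH_X86 0
#endif

/* Hypothetical context: records which implementation was selected. */
typedef struct DSPContext {
    const char *impl;
} DSPContext;

/* The prototype must exist on all architectures so the call below compiles. */
void dsp_init_x86(DSPContext *s);

void dsp_init(DSPContext *s)
{
    s->impl = "c";            /* portable default */
    if (ARCH_X86)             /* constant condition: the branch is dead code when 0 */
        dsp_init_x86(s);
}

/* In a real tree this definition would live in an x86-only source file.
 * It is defined here unconditionally so the sketch links on any target,
 * even with a compiler that does not eliminate the dead branch. */
void dsp_init_x86(DSPContext *s)
{
    s->impl = "x86";
}
```

Compared with wrapping the call in `#if ARCH_X86`, the plain `if` keeps the call type-checked on every platform, at the cost of depending on dead-code elimination at link time.
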
[18:27:30 CEST] <jamrial> atomnuker: nothing wrong with it. why do you ask?
[18:29:03 CEST] <atomnuker> you put a "[...]" below that change
[18:32:41 CEST] <kierank> that means [snip]
[18:34:46 CEST] <atomnuker> makes sense now
[18:47:10 CEST] <Chloe> timestamps are broken?
[18:48:59 CEST] <Chloe> ugh. libav changing file names of hevc asm
[19:18:28 CEST] <Chloe> https://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavcodec/utils.c;h=b0345b63af3d9;hb=b0345b63af3d940#l2323 This comment about demuxer_skip_samples sucks, why should the discard frame flag, ignore skip_samples set by the decoder?
[19:20:40 CEST] <ubitux> "comment"?
[19:20:48 CEST] <ubitux> the only comment i see is related to pkt copy
[19:21:17 CEST] <Chloe> line 2358
[19:21:32 CEST] <Chloe> I just set it to start on the first relevant line
[19:27:59 CEST] <Chloe> does anyone actually know how the edit list patches work
[20:16:17 CEST] <BBB> wbs: ping
[20:21:05 CEST] <wbs> BBB: pong
[20:21:30 CEST] <BBB> wbs: how large are the aarch64 registers? 16byte (like sse) or 32byte (like avx)?
[20:21:57 CEST] <wbs> BBB: 16 byte registers, 32 such registers
[20:22:04 CEST] <BBB> yuck :(
[20:22:05 CEST] <BBB> ok
[20:22:41 CEST] <BBB> that's the opposite of the avx problem (32byte registers, but only 16 of em)
[20:23:00 CEST] <BBB> the simd numbers look really good, nice job
[20:23:33 CEST] <wbs> it's a step up from 32 bit arm though, where you have 16 x 16 bytes registers, or 32 x 8 (you can treat them interchangeably like either of those)
[20:23:54 CEST] <BBB> hm...
[20:24:03 CEST] <BBB> so can you treat them like 16 32-byte registers on aarch64?
[20:24:29 CEST] <wbs> no
[20:24:58 CEST] <BBB> <- arm n00b
[20:25:46 CEST] <cone-095> ffmpeg Jon Toohill master:81f4f789de7c: lavc/libmp3lame: send encoder delay/padding in packet side data
[20:25:47 CEST] <ubitux> using them as 32x8 can be a bitch though 
[20:25:47 CEST] <cone-095> ffmpeg Jon Toohill master:3b02f6dd7be8: lavf/mp3enc: write encoder delay/padding upon closing
[20:25:56 CEST] <ubitux> because the high part is not as accessible
[20:26:11 CEST] <ubitux> it's often use 8 lower or all 16
[20:26:26 CEST] <wbs> ubitux: no, I meant on 32 bit arm, where you have d0-d31
[20:26:42 CEST] <ubitux> yeah i was talking about aarch64
[20:26:53 CEST] <wbs> on aarch64, using the high half as a separate register (so you'd have 64 registers) isn't really an option no
[20:27:11 CEST] <wbs> or it's kinda inconvenient at least
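
The register counts quoted above work out to the same total capacity for aarch64 and AVX, and half of that for 32-bit ARM. A small sketch of the arithmetic, using a hypothetical `RegFile` struct and taking the counts from the conversation (the 16 ymm registers apply to 64-bit x86):

```c
#include <stddef.h>

/* Hypothetical description of a SIMD register file. */
typedef struct RegFile {
    const char *name;
    size_t count;       /* number of registers   */
    size_t bytes_each;  /* width of one register */
} RegFile;

/* Total bytes of SIMD register state. */
static size_t regfile_bytes(RegFile r)
{
    return r.count * r.bytes_each;
}

static const RegFile aarch64_v = { "aarch64 v0-v31",    32, 16 };
static const RegFile avx_ymm   = { "x86-64 ymm0-ymm15", 16, 32 };
static const RegFile arm32_q   = { "32-bit ARM q0-q15", 16, 16 };
static const RegFile arm32_d   = { "32-bit ARM d0-d31", 32,  8 };  /* same storage as q0-q15 */
```

Both aarch64 and AVX come to 512 bytes, split differently; the two 32-bit ARM views describe the same 256 bytes.
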
[20:28:18 CEST] <ubitux> btw, there are a bunch of places in your macro where you could pass the suffix in the param
[20:28:30 CEST] <ubitux> to avoid the annoying \()
[20:29:07 CEST] <wbs> can you point out which one? in some cases I've chosen that to keep the macro invocation a bit more readable, but I'm wildly inconsistent about it
[20:29:34 CEST] <ubitux> my eyes were on the idct
[20:29:52 CEST] <ubitux> dunno if you did it elsewhere
[20:30:16 CEST] <wbs> right; those are mostly intentional
[20:30:19 CEST] <philipl> BtbN: what's next for dynlink cuda?
[20:30:35 CEST] <BtbN> sending it to the ML.
[20:31:00 CEST] <wbs> because the idct/iadst transforms are absolutely mindnumbing anyway, even without the extra boilerplate of those suffixes there; I much rather keep the macros a bit bulkier
[20:31:51 CEST] <wbs> BBB: anyway, once the patches are merged, I'll post full benchmarks of pre/post speedups and such. I still occasionally get a few fps more when finding more things to optimize
[20:32:23 CEST] <BBB> cool
[20:32:44 CEST] <BBB> these new aarch64 devices are all multicore, right?
[20:33:01 CEST] <wbs> yes, most are quad at least
[20:33:04 CEST] <wbs> my phone has got 8 cores
[20:33:22 CEST] <wbs> (4 low power ones and 4 high power ones, but all of them have got simd)
[20:33:41 CEST] <wbs> iOS devices usually have got much fewer cores, but higher performance per core
[20:34:16 CEST] <BBB> 4, nice
[20:34:34 CEST] <wbs> yup, it's super nice for things like this
[20:35:58 CEST] <wbs> (although the phone clocks down pretty soon if you push it too much)
[20:36:20 CEST] <BBB> hm :( that's a little sad
[20:36:37 CEST] <BBB> but even then, 4 cores, with simd should probably give you 1080p realtime right?
[20:36:54 CEST] <wbs> earlier I've benched 208 fps for sintel, 93 fps for tos and 50 fps for etv, out of your sample collection
[20:37:15 CEST] <ubitux> i'd assume most of these new aarch64 devices have a vp9 hw decoder anyway
[20:37:27 CEST] <ubitux> which is likely much more interesting to use most of the time
[20:37:33 CEST] <ubitux> i wonder where it would be useful
[20:39:14 CEST] <wbs> ubitux: at least my phone only has got hevc, no vp9
[20:39:34 CEST] <ubitux> ah? interesting; mine seems to have vp9
[20:39:45 CEST] <ubitux> that's some pretty decent speed btw
[20:40:24 CEST] <wbs> yeah, if the decoder itself can decode on like 100 fps, it probably won't get insanely hot when decoding realtime, and probably won't clock down all too much
[20:41:37 CEST] <wbs> hmm, my phone has got a snapdragon 810, and at least wikipedia claims it has got even vp9 encoding (so I'd assume decoding as well). but perhaps sony didn't care to include the driver for that
[20:43:32 CEST] <BBB> 50 for etv is pretty good, that clip is pretty hard and high bitrate
[20:44:22 CEST] <wbs> yeah. and let's see what numbers I get once I get it merged. at least for loop filter (which affects sintel the most) I think I've got a bit of more speedup coming
[20:44:36 CEST] <wbs> (but I don't benchmark on the phone all too often because it's a bit more annoying than directly on a devboard)
[21:33:55 CEST] <mateo`> ubitux: software decoding is always interesting, testing, fallback in case the hw api is busy with other things
[22:33:19 CEST] <Zeranoe> Seems pretty quiet on the Snappy project end... 
[22:33:32 CEST] <Zeranoe> But hopefully they have something to say https://groups.google.com/forum/#!topic/snappy-compression/4QzFeWFaTlU
[22:35:48 CEST] <BtbN> Well, there's Zstandard now.
[22:40:40 CEST] <Zeranoe> Are there any usage statistics for FFmpeg versions? I'm considering building releases from different release branches (3.1, 3.0), but I'm curious if there's even a demand
[22:41:34 CEST] <cone-095> ffmpeg Rostislav Pehlivanov master:d2ae5f77c61a: aacenc: add SIMD optimizations for abs_pow34 and quantization
[23:16:11 CEST] <Compn> Zeranoe : for what kind of user ? e.g. anyone using a release is just using that release because of distro package
[23:16:23 CEST] <Compn> anyone using git is because of using git or one of your nightly builds
[23:25:39 CEST] <cone-095> ffmpeg Michael Niedermayer release/3.0:5771a0c8237d: doc/examples/demuxing_decoding: Drop AVFrame->pts use
[00:00:00 CEST] --- Wed Oct 19 2016