[Ffmpeg-devel-irc] ffmpeg-devel.log.20190208
burek
burek021 at gmail.com
Sat Feb 9 03:05:03 EET 2019
[00:15:30 CET] <cone-642> ffmpeg Marton Balint master:28dd73a4437a: ffplay: use different decoder names for each media type
[00:15:30 CET] <cone-642> ffmpeg Marton Balint master:fe99a51c40d5: ffplay: add missing avfilter_graph_alloc result check
[00:15:30 CET] <cone-642> ffmpeg Marton Balint master:7cab5471b231: ffplay: add support for setting the number of filter threads
[08:15:22 CET] <akravchenko188> jkqxz: hi. it would be very good to speed up review and submission of AMF patches. to achieve this we could send a couple of new AMD cards to community members who test such patches. could you please help me with this?
[08:59:03 CET] <JEEB> michaelni: I'm sorry for bothering, but I will have to ask you to review the libaribb24 thread to see if I could have done things better there. http://ffmpeg.org/pipermail/ffmpeg-devel/2019-February/239629.html
[10:03:51 CET] <j-b> Sorry, going to disagree on homebrew
[11:19:53 CET] <perseiver> can anyone help here regarding ffmpeg, converting video to h.264?
[12:21:12 CET] <cone-143> ffmpeg chcunningham master:3ea87e5d9ea0: avformat/mov.c: require tfhd to begin parsing trun
[12:21:12 CET] <cone-143> ffmpeg chcunningham master:1c15449ca9a5: avformat/mov: validate chunk_count vs stsc_data
[12:21:12 CET] <cone-143> ffmpeg Decai Lin master:9d800d39d557: avcodec/h264_parse: no need check ref list1 for P slices.
[12:21:12 CET] <cone-143> ffmpeg Michael Niedermayer master:7f8bfbee3663: avcodec/h264_parse: Clear ref_list[1] if only [0] is used
[13:34:41 CET] <j-b> Could you stop committing things like this?
[13:35:29 CET] <j-b> his name is "Chris Cunningham", not "chcunningham"
[13:47:10 CET] <durandal_1707> michaelni: you evil
[13:51:49 CET] <michaelni> j-b, sorry, that was what was in the submitted patch. I'll ask him to fix this for future patches
[13:56:37 CET] <j-b> Guys, you keep doing that over and over and over
[13:56:59 CET] <j-b> When someone wants to use a pseudonym, sure, but here it is just "I did not set my git correctly"
[13:57:07 CET] <j-b> it's very hard for copyright and so on
[13:57:22 CET] <j-b> (see the vp9 or prores fight, why copyright is important)
[14:27:17 CET] <atomnuker> vp9 fight?
[14:27:39 CET] <durandal_1707> viper9
[14:30:51 CET] <gnafu> Was ProRes when The King broke the silence on his faked death, or am I thinking of another instance of misattribution?
[14:46:41 CET] <durandal_1707> gnafu: king is dead, fake author was pushed to the tree
[14:55:38 CET] <atomnuker> there are many kings, but only one's the king of pop
[14:55:49 CET] <atomnuker> or was it rock?
[15:35:25 CET] <gnafu> atomnuker: I'm referring to the king of rock 'n' roll. The king of pop is Michael Jackson, as I understand it.
[17:54:09 CET] <BtbN> Is that a bot or something?
[17:55:43 CET] <jdarnley> maybe a poor, unfortunate intern
[19:23:34 CET] <jdarnley> Thankfully ffmpeg has its own robo/intern
[21:50:33 CET] <KungFuJesus> you guys need to mangle names on this function or there's a symbol conflict: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/cuda_check.c
[21:50:51 CET] <KungFuJesus> libavutil and libavcodec both are exporting ff_cuda_check
[21:51:37 CET] <jamrial> philipl: ^
[21:52:22 CET] <KungFuJesus> or make one lib call into the other, though I'm guessing you are intentionally not doing that so that they can be separate SOs
[21:52:55 CET] <JEEB> KungFuJesus: seems like the semi-correct person got highlighted :)
[21:53:46 CET] <KungFuJesus> cool, glad I could help
[21:57:22 CET] <BtbN> Sony Vegas Pro support forums at its best: "Except for uncompressed video there is no such thing as "lossless". All codecs are lossy, but I think what you are looking for is to encode with a setting that will have the same quality as your source encode."
[21:57:59 CET] <BtbN> That guy is talking so much bullshit in that thread, wow
[21:58:01 CET] <JEEB> :D
[21:58:14 CET] <JEEB> yes, because enterprise only uses raw as lossless
[21:58:24 CET] <JEEB> I have literally received masters that are raw RGB in mov
[21:58:31 CET] <JEEB> when I requested lossless
[21:58:49 CET] <BtbN> Guess I will not pay for Vegas, as it can't even do lossless h264
[21:59:07 CET] <JEEB> a lot of people utilize umezawa's Ut Video package for lossless intermediates
[21:59:22 CET] <JEEB> since it has video for windows, directshow, media foundation and QT modules
[21:59:30 CET] <BtbN> If only OpenShot would be more stable :/
[22:00:06 CET] <BtbN> I don't need a lot of functionality really, and in theory could do it with ffmpeg.c, but stinger transitions via CLI are a pain
[22:00:26 CET] <BtbN> Might still end up scripting them
[22:00:56 CET] <gnafu> BtbN: Are you referring to the newer 2.x series, or the older 1.4x one?
[22:01:17 CET] <gnafu> Granted, I haven't used either extensively, but my impression is that the newer 2.x OpenShot is much more stable.
[22:01:26 CET] <gnafu> I've also had good experiences with Shotcut.
[22:03:46 CET] <BtbN> Tried whatever is the latest on their website
[22:03:59 CET] <BtbN> And within minutes of playing around with it I had uninteractible things stuck in my timeline
[22:04:47 CET] <gnafu> Aah.
[22:06:43 CET] <KungFuJesus> yikes, sandybridge-e CPU is struggling with dav1d with Netflix's sample chimera video
[22:07:05 CET] <JEEB> KungFuJesus: I think that's AVX1 only, right?
[22:07:09 CET] <KungFuJesus> the toddler fountain footage drops frames like mad :-p
[22:07:24 CET] <KungFuJesus> yeah, so no FMA and only SSE level 4-wide SIMD
[22:09:00 CET] <KungFuJesus> I'm tempted to try to accelerate some of the functions myself, but my experience with SIMD is usually through intrinsics. It looks like ffmpeg prefers raw nasm asm
[22:09:18 CET] <JEEB> raw asm with the x264 wrapper that makes defining functions simpler
[22:09:27 CET] <JEEB> even I was able to do a left predict!
[22:09:30 CET] <KungFuJesus> GCC's register allocator when playing with intrinsics, at least in my experience, isn't half bad. But perhaps people are better at tuning with nasm
[22:09:32 CET] <JEEB> and I'm dumb in SIMD!
[22:10:29 CET] <BtbN> Hm, we do not seem to have a way to not export a non-static symbol, or am I missing something?
[22:10:55 CET] <JEEB> KungFuJesus: also dav1d technically is a separate project outside of FFmpeg, but the style is the same
[22:11:01 CET] <jamrial> KungFuJesus: dav1d is currently fully optimized with avx2 simd. ssse3 simd is still in the works
[22:11:08 CET] <KungFuJesus> I suppose I could write something in intrinsics, post the resulting assembly for someone to translate into the dialect NASM expects
[22:11:10 CET] <jamrial> 8bits only in both cases
[22:11:12 CET] <JEEB> manual assembler and most likely using the same x264 framework :)
[22:11:15 CET] <jamrial> 10bit comes later
[22:11:30 CET] <gnafu> KungFuJesus: Check out #dav1d. There's definitely need for more SSSE3 SIMD.
[22:11:36 CET] <JEEB> KungFuJesus: example https://github.com/jeeb/libav/blob/utvideo-simd/libavcodec/x86/utvideodsp.asm (doesn't work beyond first line but that should be enough)
[22:11:50 CET] <JEEB> and yes, #dav1d
[22:11:51 CET] <JEEB> :)
[22:11:51 CET] <gnafu> For a list of things left to do: https://code.videolan.org/videolan/dav1d/issues/216
[22:11:58 CET] <KungFuJesus> I also have a G5 that I can author some altivec on, and an rpi3 for NEON/ASIMD
[22:13:23 CET] <gnafu> KungFuJesus: CDEF is a big one that doesn't have any SSSE3 SIMD yet. As someone whose main computers lack AVX2, I would be grateful for anything you'd like to contribute in that regard ;-).
[22:13:27 CET] <KungFuJesus> though I suspect the really poor memory bandwidth of the rpi3 will prevent any terribly useful accelerations (In my experience it's been really poor with fftw for work related stuff, SIMD or not)
[22:14:13 CET] <gnafu> That I couldn't say. Here's the tracker for NEON SIMD: https://code.videolan.org/videolan/dav1d/issues/215
[22:14:39 CET] <KungFuJesus> I'm sure I could produce something that's useful, it's just hard to measure improvements on that device due to memory bandwidth constraints being what most problems are bound by
[22:15:32 CET] <KungFuJesus> I'll have to run perf on the unstripped library to see exactly where the cycles are spent for me during that portion of the video, but I'll check out those links
[22:16:30 CET] <BtbN> philipl, do you remember if there's a particular reason ff_cuda_check isn't an inline-function in the header?
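BtbN's question can be illustrated with a minimal sketch, assuming the check helper is stateless and can live entirely in a header. Every name below (`hyp_cuda_check`, `hyp_result`) is hypothetical and is not FFmpeg's actual `ff_cuda_check` code; the point is only that a `static inline` definition gives each including translation unit its own internal-linkage copy, so libavutil and libavcodec would no longer both define the same global symbol.

```c
/* hyp_cuda_check.h: hypothetical header-only check helper (sketch only). */
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a CUDA driver API result code; 0 means success. */
typedef int hyp_result;

/* "static inline" gives this definition internal linkage: every .c file
 * that includes the header gets a private copy, so no global symbol named
 * hyp_cuda_check is exported and two libraries cannot clash on it. */
static inline int hyp_cuda_check(const char *func, hyp_result err)
{
    assert(func != NULL);
    if (err == 0)
        return 0;
    fprintf(stderr, "%s failed with error code %d\n", func, err);
    return -1; /* a real helper would map this to a proper error code */
}
```

The trade-off is that each translation unit calling the helper carries its own copy, which is usually negligible for a small error-reporting function.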
[22:34:01 CET] <KungFuJesus> looks like the vast majority of time is in cdef_filter_block_c for me
[22:34:14 CET] <KungFuJesus> the "_c" I'm guessing means it's a vanilla C implementation
[22:35:06 CET] <gnafu> That's my understanding, yes.
[22:36:18 CET] <KungFuJesus> hmm, I see at least one loop in this asm that looks unrolled. And some conditional move instructions
[22:36:36 CET] <KungFuJesus> actually the whole thing is basically cmov's, hah
[22:36:37 CET] <gnafu> I think CDEF is the next big thing on the chopping block for SSSE3 SIMD, and probably the one that will enable my home machines to do realtime HD decoding :-D.
[22:37:13 CET] <gnafu> BBB mentioned in #dav1d that translating from the AVX2 SIMD shouldn't be too difficult for someone who knows what they're looking at.
[22:37:18 CET] <KungFuJesus> if the condition they are moving on is very predictable, it might actually be better to force a branch and let the BPB take care of it
[22:37:32 CET] <gnafu> (If I understood him correctly.)
[22:37:44 CET] <BBB> I already wrote a xmm function for cdef_dir
[22:37:46 CET] <KungFuJesus> I tend to do a lot of AVX2 code and backport to SSE4/AVX at work
[22:37:48 CET] <BBB> cdef_filter isn't hard either
[22:38:50 CET] <BBB> but dav1d talk should be in #dav1d...
[22:39:05 CET] <BBB> as for the cmov, don't do that, because the C is written to be small and readable
[22:39:11 CET] <BBB> the simd is typically different and doesn't use branches
[22:39:18 CET] <BBB> check x86 simd to get an idea of what to do
[22:39:55 CET] <KungFuJesus> oh I'm aware of the branchless nature of whatever the SIMD implementation will be. I was just wondering if the scalar C version could be sped up by making GCC less eager to emit CMOVs
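The scalar pattern being discussed can be sketched with a CDEF-style `constrain()` helper, written here from the AV1 specification's description rather than copied from dav1d (the helper names `imin`, `imax` and `floor_log2` are assumptions for this sketch). The min/max and sign selection are data-dependent, so compilers commonly if-convert them into `cmov`; whether a well-predicted branch would be faster is an empirical question, and GCC's `-fno-if-conversion` option is one knob to experiment with.

```c
#include <assert.h>
#include <stdlib.h>

static inline int imin(int a, int b) { return a < b ? a : b; }
static inline int imax(int a, int b) { return a > b ? a : b; }

/* floor(log2(x)) for x > 0, i.e. the index of the highest set bit. */
static inline int floor_log2(unsigned x)
{
    int n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* CDEF-style constrain(): shrink a neighbour difference towards zero so
 * that large differences (likely real edges) contribute nothing. */
static int constrain(int diff, int threshold, int damping)
{
    assert(threshold >= 0 && damping >= 0);
    if (!threshold)
        return 0;
    const int shift = imax(0, damping - floor_log2((unsigned)threshold));
    const int adiff = abs(diff);
    /* Both selections below are typical cmov candidates. */
    const int mag = imin(adiff, imax(0, threshold - (adiff >> shift)));
    return diff < 0 ? -mag : mag;
}
```

With threshold 4 and damping 3, small differences such as 3 or -2 pass through unchanged, while a difference of -10 is clamped to 0.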
[22:48:28 CET] <KungFuJesus> BBB: it looks like majority of the work is happening in 128 bit wide registers, anyhow: https://code.videolan.org/videolan/dav1d/blob/master/src/x86/loopfilter.asm
[22:48:48 CET] <KungFuJesus> unless somehow you guys are stripping off the VEX-encoded nomenclature for those instructions
[22:49:50 CET] <BBB> we are :)
[22:50:02 CET] <BBB> see x86inc.asm
[22:52:05 CET] <KungFuJesus> ah yes, pmadd, the fma before there were FMAs :)
[22:52:14 CET] <BBB> the basic idea (we don't necessarily use that in dav1d, but in general) is to be able to macro'ify functions and reuse them for xmm as well as ymm
[22:52:34 CET] <BBB> fma is float though
[22:52:53 CET] <KungFuJesus> right, but avx512 recently added integer FMAs, presumably for some DL stuff
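For readers less familiar with the `pmadd` family mentioned here, a scalar reference model helps. The sketch below models one member, SSSE3 `pmaddubsw`, on a 128-bit register: unsigned bytes from the first operand are multiplied by signed bytes from the second, and each pair of adjacent products is summed and saturated to signed 16 bits. The function names are hypothetical; this is a model for reasoning and testing, not how FFmpeg or dav1d implement anything.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 32-bit intermediate to the int16_t range. */
static int16_t sat16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* One output word: a0*b0 + a1*b1, unsigned bytes times signed bytes. */
static int16_t maddubs_pair(uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
    return sat16((int32_t)a0 * b0 + (int32_t)a1 * b1);
}

/* Scalar model of pmaddubsw on one xmm register:
 * 16 bytes (8 adjacent pairs) per operand -> 8 saturated words. */
static void pmaddubsw_ref(int16_t dst[8], const uint8_t a[16], const int8_t b[16])
{
    assert(dst && a && b);
    for (int i = 0; i < 8; i++)
        dst[i] = maddubs_pair(a[2 * i], a[2 * i + 1], b[2 * i], b[2 * i + 1]);
}
```

Unlike a floating-point FMA, the result is exact up to the final saturation, which is part of why integer SIMD suits bit-exact decoders.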
[22:54:30 CET] <KungFuJesus> if only the vast majority of this stuff were in the floating-point domain, AVX would have been sufficient. I suppose it has to be integer or fixed point to avoid dealing with the nonsense of denormals, underflows and overflows
[22:54:31 CET] <BBB> ah right
[22:54:44 CET] <BBB> it's more about not drifting
[22:54:52 CET] <BBB> float drifts by nature because it's not bitexact
[22:55:00 CET] <BBB> int can be exactly defined
[22:55:06 CET] <KungFuJesus> right, you blow out your precision with big numbers
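BBB's point about bit-exactness can be made concrete with a small sketch, assuming IEEE-754 single precision evaluated without excess precision (the usual case on x86-64). Summing the same four values in two orders gives two different float results, because `1e8f + 1.0f` rounds back to `1e8f`, while the integer sums agree in either order. A SIMD version that reorders additions therefore cannot be guaranteed to match a float C reference bit for bit, whereas an integer one can.

```c
#include <assert.h>
#include <stdint.h>

/* Values chosen so rounding depends on the order of additions:
 * near 1e8 the spacing between adjacent floats is 8, so +1.0f is lost. */
static const float   demo_f[4] = { 1e8f, 1.0f, -1e8f, 1.0f };
static const int32_t demo_i[4] = { 100000000, 1, -100000000, 1 };

static float sum_fwd_f(const float *v, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

static float sum_rev_f(const float *v, int n)
{
    float s = 0.0f;
    for (int i = n - 1; i >= 0; i--)
        s += v[i];
    return s;
}

static int32_t sum_fwd_i(const int32_t *v, int n)
{
    int32_t s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

static int32_t sum_rev_i(const int32_t *v, int n)
{
    int32_t s = 0;
    for (int i = n - 1; i >= 0; i--)
        s += v[i];
    return s;
}
```

Summed forwards the floats give 1.0f and summed backwards they give 0.0f, while both integer sums give 2.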
[22:56:48 CET] <KungFuJesus> this would be much more legible in compiler intrinsics, though :-/. And possibly easier to wrap. I might look at this this weekend to see if I can translate it to SSE4 (looks like there's a pmin/pmax in there)
[23:08:01 CET] <BBB> readability of intrinsics is subjective
[23:08:06 CET] <BBB> you like it because you're familiar with it
[23:08:10 CET] <BBB> we don't because we're not
[23:08:20 CET] <BBB> ¯\_(ツ)_/¯
[23:08:34 CET] <BBB> it's like coding style, or non-asm language: python or go?
[23:09:00 CET] <BBB> it's the one nice thing about starting a new project, you can just say "it shall be 4 spaces forever more!" and so it is
[23:11:36 CET] <atomnuker> yeah, didn't work that way with rav1e
[23:12:03 CET] <atomnuker> 2 spaces carried over from xiph/aom and 4 spaces introduced by sane people
[23:19:39 CET] <KungFuJesus> BBB: it's mostly easier because the layer of glue between instructions and the C interface is more obvious. You get a little less control as to what spills to stack and what stays in registers, but it's very easy to know what's loading into a vector register and from what address
[23:21:02 CET] <BBB> atomnuker: progress <3 b/c real programmers use 4 spaces
[23:21:21 CET] <BBB> KungFuJesus: don't worry, I'm very familiar with asm :)
[23:23:38 CET] <KungFuJesus> I'm not unfamiliar with it, but it's really convenient to not have to pay attention to when you need to spill to stack, or to track which vector registers you've used
[23:24:42 CET] <jamrial> that's what the magic of x86inc is for
[23:25:09 CET] <KungFuJesus> oh I wasn't aware it was handling register spills, that's at least one less thing to worry about
[23:28:48 CET] <jamrial> it handles prologue and epilogue, stack pointer, provides aliases so you can write functions using mmx, xmm and ymm registers in a single macro
[23:29:16 CET] <jamrial> translates three operand instructions (avx style) to mov + two operand instruction if needed, etc
[23:29:19 CET] <jamrial> it's pretty neat
[23:30:51 CET] <KungFuJesus> what is this madness in the RO section?
[23:30:53 CET] <KungFuJesus> pb_m1_2: times 16 db -1, 2
[23:32:24 CET] <jamrial> an array of 32 bytes, probably to be used as index for pshufb or similar
[23:34:06 CET] <KungFuJesus> that's not really clear that is a shuffle vec
[23:34:20 CET] <jdarnley> The one time I looked at using intrinsics, it made doing that ^^ impossible.
[23:34:52 CET] <atomnuker> its not the best name, yeah
[23:34:55 CET] <KungFuJesus> so does that mean, for instance, the transpose macro there is mostly avx safe, or does it lack the proper number of transpositions for the matrix dimension to work?
[23:36:17 CET] <KungFuJesus> obviously those unpacks don't work, but you could just use the floating point versions and it'd work the same, albeit with maybe a slightly higher latency
[23:37:39 CET] <KungFuJesus> actually the floating point unpacks have the same latency and pipeline throughput sizes
[23:38:08 CET] <atomnuker> yeah, these days you can use whatever without penalty
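The transpose discussion can be modelled in plain C. The sketch below emulates, on four 32-bit lanes, how `punpckldq`/`punpckhdq` and `punpcklqdq`/`punpckhqdq` rearrange lanes, and builds the usual 4x4 transpose from them; the type and helper names are hypothetical. Since only lane positions matter, the float forms (`unpcklps`/`unpckhps` and `unpcklpd`/`unpckhpd`) produce the same arrangement, which is why they can stand in on AVX1, where the 256-bit integer unpacks need AVX2. Note that the 256-bit forms shuffle within each 128-bit lane.

```c
#include <assert.h>
#include <stdint.h>

/* Four 32-bit lanes, standing in for one xmm register. */
typedef struct { int32_t v[4]; } vec4;

/* Lane movement of punpckldq / unpcklps: {a0, b0, a1, b1}. */
static vec4 unpacklo32(vec4 a, vec4 b) { return (vec4){{ a.v[0], b.v[0], a.v[1], b.v[1] }}; }
/* Lane movement of punpckhdq / unpckhps: {a2, b2, a3, b3}. */
static vec4 unpackhi32(vec4 a, vec4 b) { return (vec4){{ a.v[2], b.v[2], a.v[3], b.v[3] }}; }
/* Lane movement of punpcklqdq / unpcklpd: low 64 bits of a, then of b. */
static vec4 unpacklo64(vec4 a, vec4 b) { return (vec4){{ a.v[0], a.v[1], b.v[0], b.v[1] }}; }
/* Lane movement of punpckhqdq / unpckhpd: high 64 bits of a, then of b. */
static vec4 unpackhi64(vec4 a, vec4 b) { return (vec4){{ a.v[2], a.v[3], b.v[2], b.v[3] }}; }

/* Transpose a 4x4 matrix held as four row registers, in place. */
static void transpose4x4(vec4 r[4])
{
    vec4 t0 = unpacklo32(r[0], r[1]); /* r00 r10 r01 r11 */
    vec4 t1 = unpacklo32(r[2], r[3]); /* r20 r30 r21 r31 */
    vec4 t2 = unpackhi32(r[0], r[1]); /* r02 r12 r03 r13 */
    vec4 t3 = unpackhi32(r[2], r[3]); /* r22 r32 r23 r33 */
    r[0] = unpacklo64(t0, t1);        /* column 0 */
    r[1] = unpackhi64(t0, t1);        /* column 1 */
    r[2] = unpacklo64(t2, t3);        /* column 2 */
    r[3] = unpackhi64(t2, t3);        /* column 3 */
}
```

A 4x4 transpose of 32-bit elements takes two rounds of unpacks (8 in total); wider matrices or narrower elements need more rounds.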
[23:38:12 CET] <jamrial> KungFuJesus: join #dav1d, there are other devs there who wrote most of these functions and understand them best
[23:39:05 CET] <KungFuJesus> jdarnley: you can construct a shuffle vector with intrinsics quite easily
[23:40:05 CET] <KungFuJesus> just declare an __m256i as a static, the compiler will put it in the RO section of the binary in most circumstances
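For comparison with the NASM constant quoted earlier, `pb_m1_2: times 16 db -1, 2` emits 32 bytes alternating -1 and 2 (by FFmpeg's naming habits, `pb` most likely reads as "packed bytes" and `m1` as "minus one"). A minimal sketch of the same table in C11 follows; `static const` is what typically lands it in a read-only section, and with AVX2 intrinsics it could then be loaded via `_mm256_load_si256`. The `M1_2` macro is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper macro: one "-1, 2" pair, repeated like NASM's "times 16". */
#define M1_2 -1, 2

/* C equivalent of "pb_m1_2: times 16 db -1, 2": 32 signed bytes,
 * alternating -1 and 2, aligned for a 256-bit aligned load. */
static _Alignas(32) const int8_t pb_m1_2[32] = {
    M1_2, M1_2, M1_2, M1_2, M1_2, M1_2, M1_2, M1_2,
    M1_2, M1_2, M1_2, M1_2, M1_2, M1_2, M1_2, M1_2,
};

static_assert(sizeof(pb_m1_2) == 32, "times 16 db -1, 2 is 32 bytes");
```

Neither the NASM nor the C form says how the table is used (shuffle control, multiply coefficients, or something else); that only shows up at the load site.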
[23:44:23 CET] <KungFuJesus> most of the compiler intrinsics share a name with their instruction, I've never heard someone describe intrinsics as less legible than their raw assembly counterparts
[23:46:26 CET] <jdarnley> most? bollocks!
[23:50:31 CET] <BBB> ...
[23:50:35 CET] <BBB> blue!
[23:50:41 CET] <BBB> blue is better than yellow
[23:50:43 CET] <BBB> everyone knows it
[23:50:49 CET] <BBB> KungFuJesus: it's a pointless argument, really
[23:50:56 CET] <BBB> let's not have it
[23:51:02 CET] <BBB> everyone can program in their own favourite dialect
[23:51:09 CET] <BBB> ours is x86inc.asm-assisted nasm
[23:51:19 CET] <BBB> sorry if that's not yours
[23:51:54 CET] <jdarnley> :( please can we argue?
[23:52:44 CET] <BBB> sure
[23:52:47 CET] <BBB> I'm gonna go home though
[23:52:48 CET] <BBB> bbl
[23:54:19 CET] <BtbN> philipl, https://github.com/BtbN/FFmpeg/commit/91ea6f192181a5947e6b966defb753d3fdcd0056
[00:00:00 CET] --- Sat Feb 9 2019