[Ffmpeg-devel-irc] ffmpeg-devel.log.20190730

Fri Aug 16 20:21:20 EEST 2019

[00:21:06 CEST] <cone-420> ffmpeg Andreas Rheinhardt master:4e7e30bbe0fb: cbs: Don't set AVBuffer's opaque
[00:21:07 CEST] <cone-420> ffmpeg Andreas Rheinhardt master:ae49993ce6e5: cbs_h264: Improve adding SEI messages
[00:21:08 CEST] <cone-420> ffmpeg Andreas Rheinhardt master:0e66e1b61ea2: cbs_mpeg2: Decompose Sequence End
[00:21:09 CEST] <cone-420> ffmpeg Andreas Rheinhardt master:276b21a58690: cbs_mpeg2: Rearrange start code search
[00:21:10 CEST] <cone-420> ffmpeg Andreas Rheinhardt master:fd93d5efe642: cbs_mpeg2: Fix parsing the last unit
[05:10:14 CEST] <Compnn> hehe
[05:10:29 CEST] <Compnn> bayer talk has me interested in this r3d bug again https://trac.ffmpeg.org/ticket/2690
[05:11:04 CEST] <Compnn> Lynne, who are you, welcome to ffmpeg development!
[05:35:25 CEST] <cone-318> ffmpeg James Almer master:502aff91a769: avformat/av1: fix AV1CodecConfigurationBox name in doxy
[06:38:56 CEST] <rcombs> clang diagnosed an issue in vf_thumbnail_cuda that was presumably missed by NVCC (operator precedence)
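For readers unfamiliar with this class of bug, here is a minimal C sketch of a shift/addition precedence mistake. It is a generic illustration, not the actual vf_thumbnail_cuda code: the histogram layout, the function names and the constants are all hypothetical. In C, `+` binds tighter than `>>`, so `256 + (v + 128) >> 8` is parsed as `(256 + (v + 128)) >> 8`.

```c
/* Hypothetical layout: bins 0..255 hold channel 0, bins 256..511 hold channel 1. */

/* How the compiler reads `256 + (v + 128) >> 8`: '+' is evaluated before '>>'. */
int bin_as_parsed(int v)
{
    return (256 + (v + 128)) >> 8;
}

/* What was presumably intended: scale the sample first, then add the channel offset. */
int bin_as_intended(int v)
{
    return 256 + ((v + 128) >> 8);
}
```

With `v = 1000` the first form lands in bin 5 (inside the wrong channel's range), while the second lands in bin 260.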
[09:25:36 CEST] <cone-673> ffmpeg Linjie Fu master:b3b7523feb5a: lavu/hwcontext_qsv: fix the memory leak
[09:41:55 CEST] <durandal_1707> nobody managed to test dsddec patch?
[10:00:41 CEST] <rcombs> alright my stuff works, patch sent
[10:05:38 CEST] <philipl> rcombs: amazing. What inspired you to do it?
[10:06:12 CEST] <rcombs> nvidia didn't respond within like an hour, I was bored, and it turned out to be easy
[10:06:27 CEST] <rcombs> ffmpeg really doesn't use very much of the API and the parts that it does use are trivial
[10:07:20 CEST] <rcombs> the hardest parts were texture sampling (which is a single instruction I just had to write a wrapper for) and a few special register accesses (blockIdx, blockDim, threadIdx)
[10:07:27 CEST] <rcombs> and as you can see from the code, none of those was complex either
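As background on what blockIdx, blockDim and threadIdx are typically used for, the following host-side C sketch emulates the usual CUDA indexing idiom. It is not taken from rcombs's patch: the function names are hypothetical, and in a real kernel the three values come from special registers rather than function arguments.

```c
/* Global element index for one axis: the standard blockIdx * blockDim + threadIdx idiom. */
unsigned global_index(unsigned block_idx, unsigned block_dim, unsigned thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

/* Number of blocks needed to cover `size` elements with `block_dim` threads each (rounds up). */
unsigned blocks_needed(unsigned size, unsigned block_dim)
{
    return (size + block_dim - 1) / block_dim;
}

/* Emulate a 1D launch on the host and count the threads that pass the bounds check. */
unsigned count_active_threads(unsigned width, unsigned block_dim)
{
    unsigned active = 0;
    unsigned grid_dim = blocks_needed(width, block_dim);

    for (unsigned b = 0; b < grid_dim; b++)
        for (unsigned t = 0; t < block_dim; t++)
            if (global_index(b, block_dim, t) < width)
                active++;
    return active;
}
```

Because the grid is rounded up, the last block may contain threads past the end of the image, which is why kernels usually guard their work with a bounds check like the one emulated here.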
[10:08:03 CEST] <rcombs> I was a little worried that the compiler might actually emit all the redundant ASM in the macros, but no it's smart enough to avoid that
[10:08:27 CEST] <philipl> Nice.
[10:08:34 CEST] <philipl> I totally support switching to clang by default.
[10:08:49 CEST] <rcombs> it's also apparently got better diagnostics
[10:08:58 CEST] <rcombs> judging by that thumbnail bug it spotted
[10:09:06 CEST] <nevcairiel> do you need like a special version of clang that supports cuda?
[10:09:22 CEST] <philipl> Separately, the issue with nvcc requiring libs even when producing a ptx is that, as far as I can tell, they do a host-side compilation during the flow even if you don't want it and then discard it.
[10:09:32 CEST] <rcombs> ubuntu's clang has it by default
[10:09:35 CEST] <philipl> nevcairiel: no. It's all upstream now
[10:09:41 CEST] <rcombs> the one that ships with Xcode doesn't, but the one in brew does
[10:09:41 CEST] <nevcairiel> does the msvc clang have it?
[10:09:42 CEST] <nevcairiel> :D
[10:09:55 CEST] <rcombs> dunno about msvc, didn't test that
[10:10:04 CEST] <nevcairiel> incidentally, they also gave msvc-clang full IDE support recently
[10:10:22 CEST] <philipl> Also, who cares. If you have neither a working clang nor a working nvcc, you go and get one. That's the same as what the story is today but with less licencing hassle.
[10:10:30 CEST] <rcombs> yup
[10:10:46 CEST] <rcombs> I was also a little worried about abs(), since the SDK calls some extern function for that
[10:11:04 CEST] <philipl> rcombs: I really appreciate this. I got as far as seeing the need to write a header but never tried to work out what it needed to contain.
[10:11:13 CEST] <rcombs> but nvcc magically turns that extern call into an abs instruction, and clang does the same thing with my code
[10:11:35 CEST] <rcombs> philipl: my procedure was just "build, look at the errors, and add the stuff it's complaining about"
[10:12:17 CEST] <rcombs> hmmmmmmmm
[10:12:22 CEST] <philipl> a problem?
[10:12:29 CEST] <rcombs> I think I could implement color format conversion pretty easily
[10:12:45 CEST] <philipl> As a cuda filter?
[10:12:50 CEST] <rcombs> as part of the scale filter
[10:13:15 CEST] <philipl> probably. It's not the smartest filter right now.
[10:13:35 CEST] <rcombs> hell, I can probably macro-ize a lot of this
[10:13:49 CEST] <rcombs> or template-ize
[10:15:07 CEST] <rcombs> why do cu_func_ushort4 and cu_func_uchar4 even exist, they're not used
[10:15:29 CEST] <philipl> Need to ask the nvidia engineers who wrote the original filter
[10:15:55 CEST] <rcombs> maybe for RGBA support, I guess
[10:16:41 CEST] <philipl> Yeah, but they never hooked it up
[10:17:29 CEST] <rcombs> I don't especially care to either tbh
[10:17:41 CEST] <rcombs> but I guess if I template-ize I'll keep it generic
[10:18:38 CEST] <rcombs> y'know what this would actually be easier to do if I didn't have to support the real SDK
[10:18:47 CEST] <rcombs> though I can do it anyway
[10:18:51 CEST] <philipl> I'm fine with dropping support for it.
[10:19:11 CEST] <philipl> At least once we're comfortable with the functionality and know what our macos and windows stories are.
[10:19:21 CEST] <philipl> s/macos//
[10:22:11 CEST] <philipl> rcombs: Someone will probably argue that the compat header should go in ffnvcodec too. Do you mind?
[10:22:25 CEST] <rcombs> fine by me
[10:22:25 CEST] <rcombs> and I don't particularly care what it's licensed under
[10:22:30 CEST] <rcombs> though, it might be preferable to keep it in ffmpeg
[10:22:38 CEST] <rcombs> because it might need additions as new stuff's added
[10:22:50 CEST] <durandal_1707> cehoyos: do you have fast multicore CPU near you?
[10:22:58 CEST] <rcombs> it's really less API declaration and more utility library tbh
[10:23:17 CEST] <cehoyos> I have access to a multicore hardware, how can I help you?
[10:23:36 CEST] <philipl> rcombs: we've had to add a bunch of stuff to the other headers over time. It's a tax on doing business, to be sure, but it's manageable.
[10:23:53 CEST] <philipl> And it allows other projects to benefit. I use the headers in mpv, although I seriously doubt we'd ever have a kernel in there.
[10:24:19 CEST] <durandal_1707> cehoyos: just test dsddec patch speed improvement if you want
[10:24:19 CEST] <philipl> Anyway, I'm personally fine with it in compat but the same arguments that lead to the original headers getting moved out apply here too.
[10:24:19 CEST] <cehoyos> Could you point me to an input file?
[10:25:34 CEST] <rcombs> philipl: also, we -include it, as you can see in configure
[10:25:43 CEST] <philipl> yeah.
[10:25:50 CEST] <rcombs> which could be done with a file in ffnvcodec but it's a bit more awkward
[10:26:05 CEST] <philipl> As I said, I'm fine pushing as-is, but I'd bet on grumbling in due course.
[10:26:07 CEST] <rcombs> I see this as more of an x86inc-style thing really
[10:26:16 CEST] <durandal_1707> cehoyos: http://www.lindberg.no/hires/test/2L-145/2L-145_mch_DSF_2822k_1b_01.dsf
[10:26:29 CEST] <rcombs> I'm hoping to get a bit more testing from people who know this code better than I do before pushing
[10:26:42 CEST] <durandal_1707> in up directory you could find smaller mch dsf files i think
[10:26:55 CEST] <philipl> OK. As I said in the email, I'm still two weeks out from being able to test.
[10:27:51 CEST] <philipl> Hopefully BtbN can take a look sooner.
[10:31:37 CEST] <cehoyos> durandal_1707: Is this the patch? avcodec/dstdec: add slice threading support
[10:31:50 CEST] <durandal_1707> cehoyos: not that one, dsddec one
[10:32:31 CEST] <cehoyos> avcodec/dsddec: add slice threading support ?
[10:32:40 CEST] <durandal_1707> yes, that one
[10:36:09 CEST] <cehoyos> It gets slower and it heats like crazy...
[10:36:28 CEST] <durandal_1707> lol, how much slower?
[10:37:45 CEST] <cehoyos> It only gets two times slower, but it heats up to ten times
[10:38:11 CEST] <durandal_1707> what is your CPU?
[10:38:20 CEST] <cehoyos> With -threads 1: real    0m6.090s, user    0m6.005s
[10:39:06 CEST] <cehoyos> With automatic threads: real    0m13.079s, user    0m59.798s
[10:39:15 CEST] <durandal_1707> omg
[10:39:36 CEST] <durandal_1707> this is with 8 cores or?
[10:40:18 CEST] <cehoyos> This is on an old 4-core, hyperthreading Intel cpu, on Power (with many cores), its: user    0m17.020s vs real    0m31.435s, user    2m2.515s
[10:40:50 CEST] <cehoyos> Please ask Stefano for new hardware...
[10:40:52 CEST] <cehoyos> (Or Intel)
[10:41:17 CEST] <cehoyos> Yes, 8 hyperthreading cores
[10:41:38 CEST] <durandal_1707> i have new hardware, but not here, and another one here too but that computer is dead....
[10:42:15 CEST] <durandal_1707> here i have 2 core Celeron CPU, and it is also 2 times slower with multiple threads
[10:42:40 CEST] <durandal_1707> this is really strange
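The "two times slower, heats up to ten times" summary can be checked against the timings cehoyos quoted for the 4-core Intel machine. The C sketch below only does that arithmetic; the macro and function names are made up for illustration, and treating user time divided by wall time as "average busy cores" is a rough approximation that ignores system time.

```c
/* Timings quoted in the log for the 4-core Intel machine, in seconds. */
#define REAL_1T    6.090   /* wall clock, -threads 1 */
#define USER_1T    6.005   /* CPU time,   -threads 1 */
#define REAL_AUTO 13.079   /* wall clock, automatic threads */
#define USER_AUTO 59.798   /* CPU time,   automatic threads */

/* How much longer the threaded run takes in wall-clock terms. */
double wall_clock_slowdown(void)
{
    return REAL_AUTO / REAL_1T;
}

/* How much more CPU time the threaded run burns. */
double cpu_time_growth(void)
{
    return USER_AUTO / USER_1T;
}

/* Rough average number of busy cores during the threaded run. */
double busy_cores(void)
{
    return USER_AUTO / REAL_AUTO;
}
```

The ratios come out at roughly 2.1x wall-clock slowdown and 10x CPU time, with about 4.6 cores busy on average, so the extra threads are consuming CPU without contributing to throughput.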
[10:54:02 CEST] <cehoyos> Does anybody know why the irc logging does not work anymore?
[11:52:29 CEST] <durandal_1707> michaelni: do you know why execute2 would make decoding slower for audio with >1 threads?
[12:07:14 CEST] <michaelni> know? no, but maybe the overhead for thread activation and sync is too expensive in relation to the smaller computational units in audio
[12:07:47 CEST] <durandal_1707> michaelni: i doubt so, the decoder is pretty slow even with 1 threads
[12:08:24 CEST] <durandal_1707> michaelni: or you mean because packet sizes feed to decoder are too small?
[12:09:29 CEST] <michaelni> more guessing than , meaning, i dont know what the packet sizes are or what testcase you use ... but yes thats what i was guessing
[12:34:59 CEST] <nevcairiel> how big are dsd packets? if they are too small, the overhead would indeed kill it
[12:36:39 CEST] <durandal_1707> 8192 bytes size with 1044 duration, for 352800 sample rate
[12:37:36 CEST] <durandal_1707> around 87 packets per second
[12:38:41 CEST] <durandal_1707> for flac ratio is around 2x times lower
[12:39:25 CEST] <nevcairiel> thats not terribly small, but since slice threading was made for video, perhaps still too much overhead
[13:28:45 CEST] <durandal_1707> i doubled packet sizes and slowdown is still 2x
[13:33:39 CEST] <cone-673> ffmpeg Stephan Hilb master:b761ae072a16: lavd/v4l2: produce a 0 byte packet when a dequeued buffer's size is unexpected
[15:03:09 CEST] <BtbN> I have an AMD card in my system now, so I can test building it, but my Nvidia Linux box had its PSU fail or something, so I can't test it in action.
[15:07:52 CEST] <philipl> sad panda. You'll fix that eventually right? :-)
[15:10:19 CEST] <BtbN> It's a special form-factor, so I can't just buy a new one, and will have to buy a whole new PC. I planned that for a while now anyway, but it's not something I'll do within the next couple days.
[15:13:41 CEST] <BtbN> Does any regular clang build work for this? Or does it need special flags?
[15:14:09 CEST] <BtbN> I'd even argue that we can drop the non-free for the nvcc builds as well, cause there is a way to produce a functionally equivalent build now.
[15:17:41 CEST] <cone-673> ffmpeg Rodger Combs master:a0c19707811c: lavfi/vf_thumbnail_cuda: fix operator precedence bug
[15:18:48 CEST] <cone-673> ffmpeg Rodger Combs release/4.2:6a5ed71d36f7: lavfi/vf_thumbnail_cuda: fix operator precedence bug
[15:21:17 CEST] <JEEB> BtbN: I think a regular build with the target enabled
[15:22:11 CEST] <BtbN> Gentoo has a llvm_targets_NVPTX flag, I guess I need that
[15:33:17 CEST] <philipl> BtbN: it worked ootb for rcombs and in my incomplete clang experiments, it worked with my stock ubuntu clang too. I imagine Fedoras and SUSEs etc turn it on by default too.
[15:34:37 CEST] <philipl> I agree on removing nvcc from non-free, although I obviously supported that before too. :-)
[15:40:45 CEST] <j-b> Waw.
[15:40:56 CEST] <j-b> Someone deserves a bg hug
[15:40:59 CEST] <j-b> big
[15:51:01 CEST] <BtbN> j-b, it should be perfectly fine to drop the non-free on everything but libnpp now, should it? GPL is happy with clang providing functionally equivalent builds to nvcc?
[15:51:58 CEST] <BtbN> Specially as one can download nvcc without needing an account now. They gave up that part, and it's a straight forward download now. Still not under a free license though.
[17:36:07 CEST] <j-b> BtbN: I would remove the old one, or at least deprecate it
[17:36:43 CEST] <BtbN> Removing nvcc support will greatly increase the annoyance-level of making a Windows-Build
[17:37:12 CEST] <BtbN> Specially when building with msvc and the like
[17:40:25 CEST] <j-b> Why? MSVC supports clang-cl now
[17:40:36 CEST] <BtbN> Because you need to set it up first
[17:41:01 CEST] <BtbN> Download binaries from somewhere, put it in PATH, keep it up to date, ...
[17:41:24 CEST] <BtbN> They do not appear to provide official Windows binaries
[17:41:42 CEST] <j-b> Sorry, but licensing clarity > annoyance of building.
[17:42:04 CEST] <j-b> Especially, since now you have WSL and WSL2 on Windows.
[17:42:16 CEST] <BtbN> Those produce linux binaries and cannot access GPUs
[17:42:28 CEST] <j-b> Untrue.
[17:42:39 CEST] <j-b> You can cross-compile from WSL
[17:42:41 CEST] <BtbN> ...
[17:42:47 CEST] <BtbN> Very useful for MSVC builds
[17:43:15 CEST] <BtbN> I don't see any issue with the license anymore for nvcc, so no reason to drop it and break existing setups
[17:43:36 CEST] <j-b> I disagree.
[17:43:42 CEST] <j-b> but we will disagree here.
[17:44:28 CEST] <BtbN> Dropping that would be just another case of hurting users on the back of some weirdly perceived freedom from the evil commercial overlords.
[17:45:02 CEST] <j-b> commercial overlords that contribute a lot to this community...
[17:45:04 CEST] <j-b> ...not
[17:45:28 CEST] <j-b> This is just helping nVidia, and validating their strategy of doing nothing open source
[17:45:36 CEST] <j-b> and moving away from open source
[17:45:36 CEST] <BtbN> "Company XY doesn't seem very nice, let's do something that hurts thousands or more users, but doesn't faze the company one bit."
[17:45:46 CEST] <j-b> That's not true.
[17:46:04 CEST] <ubitux> -/g 14
[17:46:05 CEST] <BtbN> Nvidia clearly does not care if the CUDA/nvenc stuff is non-free or not
[17:46:18 CEST] <j-b> "Company XY does something fishy wrt to open source. We have an alternative, that works. Let's push that one"
[17:46:33 CEST] <j-b> This has nothing to do with being nice or not
[17:46:54 CEST] <BtbN> A functional 100% equivalent alternative, so by GPL, it's fine to drop the non-free flag on those filters, and offer both ways to build them
[17:48:32 CEST] <j-b> I disagree. But as I said, we will just disagree here.
[17:48:51 CEST] <BtbN> so, llvm does have Windows binaries. But only for older releases...?
[17:49:11 CEST] <BtbN> And for 9.0 snapshots. But not current 8.x. What?
[18:00:11 CEST] <nevcairiel> how do i find out what clang supports?
[18:06:20 CEST] <nevcairiel> the help output includes cuda stuff, is that enough? :d
[18:08:33 CEST] <BtbN> it needs nvptx support
[19:58:49 CEST] <durandal11707> what i need to install to get cl.exe for VS 2019?
[20:02:21 CEST] <BtbN> Visual Studio 2019
[20:02:44 CEST] <BtbN> The Community Edition is enough. Just make sure to actually select C/C++ dev tools in the installer
[20:08:48 CEST] <durandal11707> how can i use VS profiler?
[20:27:58 CEST] <vel0city> Debug -> Profiler or Alt+F2 on vs2017
[20:34:25 CEST] <durandal_1707> but i need to profile ffmpeg, is that possible at all?
[20:36:51 CEST] <JEEB> I would guess this just lets you run an exe
[20:36:51 CEST] <JEEB> https://docs.microsoft.com/en-us/visualstudio/profiling/how-to-install-the-stand-alone-profiler?view=vs-2019
[20:37:35 CEST] <JEEB> https://docs.microsoft.com/en-us/visualstudio/profiling/command-line-profiling-of-stand-alone-applications?view=vs-2019
[20:43:06 CEST] <durandal_1707> hmm, i still can not get msvc to build, getting missing libCMT.lib
[20:51:04 CEST] <vel0city> @durandal_1707: either check errors if that didn't build successfully, or if it's not assigned as a dependency you will need to go to the Solution Explorer, find CMT or whatever it will be called, and right click -> Build to build it manually
[20:53:02 CEST] <vel0city> also afaik you don't need to build it yourself via VS if you have an .exe and it has debug symbols
[20:53:28 CEST] <vel0city> if that's an option for your case (ie. you haven't changed any code)
[21:05:33 CEST] <thardin> oh boy, dolby-e
[21:09:15 CEST] <cehoyos> We already auto-detect ac3 and dts, so auto-detection in wav should be possible.
[21:10:55 CEST] <JEEB> yea, there are some capture formats where I've also thought if there could be a flag to note that "btw try detecting if this is a coded bit stream within a PCM container"
[21:11:02 CEST] <JEEB> in my case mostly AAC
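To make the idea of detecting a coded bitstream inside PCM data concrete, here is a minimal C sketch that scans a buffer for the AC-3 syncword (0x0B77) and the DTS core syncword (0x7FFE8001). It is not FFmpeg's probing code, and the function names are hypothetical. A real detector would need more than a syncword match (for example header validation and several consecutive frames), and the sketch assumes big-endian byte order, whereas a stream carried in 16-bit PCM samples may appear byte-swapped.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the byte offset of the first AC-3 syncword (0x0B 0x77), or -1 if none is found. */
long find_ac3_sync(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++)
        if (buf[i] == 0x0B && buf[i + 1] == 0x77)
            return (long)i;
    return -1;
}

/* Return the byte offset of the first DTS core syncword (0x7F 0xFE 0x80 0x01), or -1. */
long find_dts_sync(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i + 3 < len; i++)
        if (buf[i] == 0x7F && buf[i + 1] == 0xFE &&
            buf[i + 2] == 0x80 && buf[i + 3] == 0x01)
            return (long)i;
    return -1;
}
```

A two-byte pattern like the AC-3 syncword will occur by chance in ordinary PCM audio, which is one reason a bare match is only a hint and not a detection.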
[21:18:21 CEST] <kierank> cehoyos: what about dolbye inside 302m inside ts
[21:18:28 CEST] <kierank> where the dolby e can span over 302m packets
[21:19:01 CEST] <kierank> JEEB: there is a flag in SDI but nobody uses it
[21:19:40 CEST] <JEEB> :)
[21:19:43 CEST] <JEEB> business as usual
[21:28:00 CEST] <durandal_1707> michaelni: if 2x slowdown slice thread decoding does not happen because of calling overhead what is next possible source of problem?
[21:31:28 CEST] <cehoyos> kierank: I always wanted to look into it, is it possible that only a few lines of code are needed?
[21:31:53 CEST] <kierank> cehoyos: for the simple case yes but the complex case is when the dolby e spans a packet
[21:31:58 CEST] <kierank> (i.e is misaligned to the video)
[21:32:12 CEST] <cehoyos> Needs a parser?
[21:32:34 CEST] <durandal_1707> parser does not modify data
[21:36:02 CEST] <durandal_1707> i gave up building msvc toolchain, it hasn't worked and i downloaded GBs of data
[21:49:51 CEST] <durandal_1707> i just added execute2 to flacdec for fun, and it causes also slowdown, but even with very small packet size, slowdown was much lower
[21:50:15 CEST] <durandal_1707> execute2 was doing nothing, returning immediately
[22:06:33 CEST] <durandal_1707> so it is not overhead, it must be something else
[22:47:42 CEST] <cehoyos> jamrial: 4/7 has a typo in the commit message
[22:48:19 CEST] <jamrial> cehoyos: ah, thanks for noticing. will fix it
[00:00:00 CEST] --- Wed Jul 31 2019