[Ffmpeg-devel-irc] ffmpeg-devel.log.20190729

burek burek021 at gmail.com
Sun Aug 25 02:59:35 EEST 2019


[00:23:29 CEST] <cone-233> ffmpeg 03Mark Thompson 07master:338714786058: vaapi_encode: Add ROI support
[00:23:29 CEST] <cone-233> ffmpeg 03Mark Thompson 07master:20fed2f0ab19: lavfi: addroi filter
[02:59:42 CEST] <rcombs> does anyone know why cuda_nvcc is marked as a nonfree library dependency?
[03:00:02 CEST] <rcombs> I'm not super familiar with the details but it seems like a build-time tool to me
[03:01:12 CEST] <rcombs> and it looks as if the output should be dynamically linked against ffnvcodec, just like other non-nonfree stuff
[03:06:42 CEST] <rcombs> (contrast with libnpp, which is a hard runtime dep)
[03:44:20 CEST] <rcombs> is it something about nonfree headers that NVCC implicitly includes?
[03:49:26 CEST] <rcombs> the only actual uses of those headers seem to be fairly trivial
[03:59:05 CEST] <rcombs> also, the output of NVCC is a text shader file to be run on the GPU, not CPU-executable code
[04:04:48 CEST] <rcombs> one could also argue that the graphics API is a "Major Component"
[05:12:42 CEST] <rcombs> https://twitter.com/yukihohagiwara/status/1155632012692271105
[05:29:54 CEST] <rcombs> erm, that is the wrong channel
[08:51:10 CEST] <rcombs> well, I'm gonna reach out to Nvidia and see if they can clarify the license situation
[09:18:10 CEST] <cone-420> ffmpeg 03Gyan Doshi 07master:43891ea8ab28: avfilter/fade: don't allow nb_frames == 0
[09:58:38 CEST] <nevcairiel> rcombs: i believe the reason was that nvcc itself is hidden behind a signup wall and EULA, as well as it including some nvidia cuda library functions in the output which may be licensed awkwardly
[09:59:15 CEST] <rcombs> you don't actually need to sign up to download it, there's a public downloads page with regular links
[09:59:58 CEST] <rcombs> and it doesn't look like the PTX output contains any actual library code, but it _is_ derived from the headers (also publicly downloadable)
[10:00:18 CEST] <rcombs> I'm asking for an explicit EULA exception on compiling those headers
[10:02:13 CEST] <rcombs> (something like the linux or GCC GPL exceptions)
[10:04:37 CEST] <rcombs> also you can compile .cu files with clang, so long as you have the headers
[10:05:24 CEST] <rcombs> none of ffmpeg's .cu files use much, one could probably make tweaked versions that work in clang without the headers
[10:12:16 CEST] <nevcairiel> you should talk to BtbN and philipl, they considered all sorts of methods to get portable builds with cuda stuff included, with various downsides
[10:22:47 CEST] <durandal_1707> why do i get slower decoding with slice threading on DSD/DST?
[10:25:40 CEST] <rcombs> it's basically a couple typedefs and the Tex2D function (a 1-liner)
[10:30:00 CEST] <rcombs> and tex2D is just a thunk to asm("__itex2D_uchar"); or the like
[11:08:42 CEST] <rcombs> nevcairiel: this is all the header required to build vf_scale_cuda.cu using clang: https://gist.github.com/12215f97e7de29033db6dad848141a3e
[11:08:59 CEST] <rcombs> clang -c -v -S -o out.ptx libavfilter/vf_scale_cuda.cu -nocudalib -nocudainc --cuda-device-only -O3
[11:11:34 CEST] <rcombs> I think there's a pretty solid argument to be made that struct declarations and a single template that simple are trivial enough to be fair use regardless
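Based on rcombs' description above (a couple of typedefs plus a one-line tex2D template), a minimal stand-in header might look roughly like the sketch below. All names and signatures here are assumptions inferred from the conversation, not NVIDIA's actual headers:

```cuda
/* Hypothetical minimal header for building ffmpeg's .cu files with clang,
 * sketched from the discussion above -- NOT NVIDIA's headers.
 * Declarations only: a real version would forward tex2D to a device
 * intrinsic (something like __itex2D_uchar, as mentioned above). */
typedef unsigned char uchar;
typedef unsigned long long cudaTextureObject_t;

/* the one-line template thunk rcombs refers to */
template <class T>
static __device__ inline T tex2D(cudaTextureObject_t tex, float x, float y);
```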
[11:18:30 CEST] <BtbN> rcombs, there was no public download site last time I checked. You need an account and accept their EULA to get NVCC and the required libs.
[11:18:46 CEST] <rcombs> BtbN: then you haven't checked sufficiently recently
[11:18:51 CEST] <BtbN> Unless that recently changed? Then we might be able to reconsider the nvcc license situation.
[11:18:59 CEST] <rcombs> https://developer.nvidia.com/cuda-downloads
[11:19:43 CEST] <rcombs> I think it gives a EULA prompt during installation? I've been manually unpacking the .debs mostly
[11:19:43 CEST] <BtbN> Yeah, that's new
[11:20:47 CEST] <BtbN> I like how https://developer.nvidia.com/FFmpeg has a GitHub link for ffmpeg that links to git.ffmpeg.org
[11:24:01 CEST] <durandal_1707> nobody knows answer to my question?
[11:24:26 CEST] <rcombs> the EULA lists a bunch of files that are allowed to be redistributed, and most of the headers aren't on it, but there's a decent chance that a derivative work of simple header files like these isn't subject to the EULA anyway
[11:24:34 CEST] <rcombs> durandal_1707: slice threading, as opposed to frame threading?
[11:24:46 CEST] <rcombs> or as opposed to single-threaded
[11:29:42 CEST] <durandal_1707> rcombs: i added slice threading to process several channels at once on dsd/dst audio decoders, and when it's enabled it's approx. 2x slower
[11:30:00 CEST] <durandal_1707> compared to a single thread
[11:30:19 CEST] <rcombs> oh, DSD as in the audio codec
[11:31:11 CEST] <rcombs> isn't DSD decoding basically trivial? shouldn't that be memory-bound?
[11:31:14 CEST] <durandal_1707> why does it become 2 times slower than 1 thread
[11:32:07 CEST] <nevcairiel> DST at least should benefit from threading
[11:32:15 CEST] <durandal_1707> rcombs: it's pretty slow here on an intel celeron N3050 crappy cpu
[11:32:45 CEST] <durandal_1707> nevcairiel: it should not be 2x slower, at most it should be a little slower than single thread
[11:33:30 CEST] <durandal_1707> both should benefit, because its doing conversion from DSD to PCM
[11:33:42 CEST] <durandal_1707> and this is quite slow
[11:34:49 CEST] <rcombs> durandal_1707: well, post your code?
[11:34:55 CEST] <rcombs> and profile it?
[11:35:25 CEST] <durandal_1707> rcombs: see ML patches
[11:35:53 CEST] <BtbN> rcombs, it's also a bit of the issue that you cannot reproduce the shader blobs without getting a compiler that's behind an EULA
[11:35:55 CEST] <durandal_1707> i'm now on crappy windows 7 so dunno how to profile it
[11:36:08 CEST] <rcombs> BtbN: you can build with clang, as demonstrated above
[11:36:18 CEST] <rcombs> (but then the headers are behind EULA, yes)
[11:36:25 CEST] <BtbN> Not without still getting the same SDK, for the libs.
[11:36:32 CEST] <rcombs> you don't need the libs
[11:36:40 CEST] <rcombs> just the headers
[11:36:50 CEST] <BtbN> You need the libs to link the kernel
[11:36:51 CEST] <rcombs> (but yes they're under the same EULA)
[11:36:59 CEST] <rcombs> only at runtime, via ffnvcodec
[11:37:49 CEST] <BtbN> pretty sure nvcc insists on having the libs for the linker step
[11:39:36 CEST] <BtbN> I won't oppose removing non-free from it, given that the CUDA SDK is now a simple download
[11:39:37 CEST] <rcombs> there's no linker step here
[11:39:55 CEST] <rcombs> NVCC is just generating a .ptx, which is a text ASM file
[11:40:14 CEST] <rcombs> the build scripts then wrap that in a string literal in a .c
[11:40:30 CEST] <rcombs> which is passed to the CUDA lib via ffnvcodec at runtime
[11:40:43 CEST] <BtbN> I know, I wrote the build scripts for that. But nvcc is weird.
[11:41:21 CEST] <rcombs> I guess NVCC might be doing something dumb, I haven't tested it with the scripts missing
[11:41:29 CEST] <rcombs> *libs
[11:41:32 CEST] <rcombs> but clang's fine without them, at least
[11:42:25 CEST] <nevcairiel> does clang produce identical output?
[11:51:44 CEST] <rcombs> tried a couple of optimization levels and nope
[11:52:09 CEST] <rcombs> (though GPL doesn't require that the user be able to produce a bit-identical replication build, just a functionally-equivalent one)
[11:53:29 CEST] <rcombs> honestly though I might prefer making a little header file that just declares the stuff ffmpeg uses and building these with clang by default
[11:54:28 CEST] <nevcairiel> is the image output identical, and the performance similar?
[11:54:58 CEST] <nevcairiel> because that would classify as "functionally equivalent"
[11:55:00 CEST] <rcombs> I'll need to have someone with an actual nvidia card test that
[11:55:12 CEST] <rcombs> I'd be pretty surprised if it wasn't, though
[11:55:34 CEST] <nevcairiel> i could easily see performance going wrong, tbh
[12:11:42 CEST] <durandal_1707> Lynne: what delays SIMD for tx? i need it for showspectrum, to remove power of 2 size limitation
[13:39:35 CEST] <durandal_1707> looks like slowdown is because of crappy CPU
[13:40:48 CEST] <durandal_1707> can anybody confirm? just apply dsd patch and test with http://www.2l.no/hires/ dsd files (take one of 5.1)
[13:41:15 CEST] <JEEB> I can test when I get home
[13:43:07 CEST] <durandal_1707> JEEB: concentrate on chair and your job :)
[13:51:25 CEST] <cone-420> ffmpeg 03Steven Liu 07master:23678462c0a3: avformat/hlsenc: Fix overflow of int for durations compute
[13:54:28 CEST] <rcombs> nevcairiel: alright when building with the actual headers, clang emits some questionable code involving function calls for 1-instruction externs, and extern data symbol accesses for stuff you can get from a special register
[13:54:47 CEST] <rcombs> https://gist.github.com/f714d2531e00275b4c9aaf05801f9bdf <-- this variant fixes all that, it now generates code that looks pretty similar to NVCC's
[14:01:43 CEST] <nevcairiel> using assembly to make a compiler behave doesn't inspire confidence in said compiler
[14:19:39 CEST] <Lynne> durandal_1707: dunno, worked on it last night and got pretty far with the 2x4 point fft
[14:20:29 CEST] <Lynne> still need to do the pass macro to combine them and then rewrite the 15-point fft, but not feeling good right about now
[14:33:36 CEST] <black> Hello ! How can I save AVPacket side_data in a persistent way (using h.264 in .mp4) ?
[16:35:25 CEST] <cone-420> ffmpeg 03Paul B Mahol 07master:630ea6b07f88: avcodec/cfhd: add bayer support
[16:58:41 CEST] <kierank> durandal_1707: should I open a ticket for transform-type=2
[17:07:05 CEST] <durandal_1707> kierank: feel free to do it, i have a working patch at home, but still do not understand why every 2nd frame is only 24 bytes and actually a null frame
[17:07:17 CEST] <kierank> durandal_1707: skip frame I guess?
[17:07:36 CEST] <durandal_1707> pointless waste of resources
[17:07:45 CEST] <kierank> needed maybe for cfr avi
[17:08:27 CEST] <durandal_1707> dunno, i haven't encountered a real P frame; all frames are I but with a different transform
[17:35:16 CEST] <cone-420> ffmpeg 03Guo, Yejun 07master:df8db345523f: dnn: add layer pad which is equivalent to tf.pad
[17:35:16 CEST] <cone-420> ffmpeg 03Guo, Yejun 07master:3805aae47966: fate: add unit test for dnn-layer-pad
[17:35:18 CEST] <cone-420> ffmpeg 03Guo, Yejun 07master:ccbab41039af: dnn: convert tf.pad to native model in python script, and load/execute it in the c code.
[18:08:23 CEST] <durandal_1707> cehoyos: you should also list cfhd improvements into your speech, not just vc1
[18:08:40 CEST] <cehoyos> I will test them;-)
[18:08:50 CEST] <cehoyos> Do I misremember the limitations of direct rendering?
[18:21:27 CEST] <durandal_1707> cehoyos: what are limitations of DR?
[18:21:48 CEST] <cehoyos> I thought you could either not read, or only write once, or not read and write again.
[18:22:10 CEST] <cehoyos> Because that's how graphic card hardware surfaces worked once upon a time
[18:23:28 CEST] <cehoyos> gtg
[19:21:09 CEST] <kierank> durandal_1707: i have no idea what carl is talking about
[19:21:12 CEST] <kierank> write only memory
[19:21:37 CEST] <durandal_1707> kierank: old libavfilter had some nonsense like that, write-only-once
[19:22:32 CEST] <j-b> wut?
[19:29:15 CEST] <durandal_1707> actually it has nothing to do with previous FFmpeg code, it's just something taken out of the wild
[21:03:30 CEST] <cone-420> ffmpeg 03Mark Thompson 07master:f9b8503639c0: cbs_h264: Fix missing inferred colour description fields
[21:03:31 CEST] <cone-420> ffmpeg 03Mark Thompson 07master:b123d0780ec2: h264_metadata: Support overscan_appropriate_flag
[21:03:45 CEST] <durandal_1707> anyone tried dsddec patch to make sure it is faster on modern CPU?
[21:30:52 CEST] <BtbN> nevcairiel, nvidia themselves use clang now iirc, so I doubt there is a problem
[21:33:59 CEST] <taliho> Hello, I'm working on ZMQ option for URLProtocol.
[21:34:26 CEST] <taliho> Am I right in understanding that it's not possible to control the minimum size of the buffer in the read operation: int (*url_read)( URLContext *h, unsigned char *buf, int size) ? 
[21:34:47 CEST] <taliho> I see that there is a min_packet_size in URLProtocol, but from what I can tell, it only affects the write operation. 
[21:39:18 CEST] <taliho> I plan to use AVFifoBuffer with a read operation running in a separate thread, but I wanted to do a simple example first that avoids mutexes 
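A single-threaded version of the buffering taliho describes might look like the ring-buffer sketch below. This is an illustration only: the struct and function names are hypothetical, and it is not ffmpeg's AVFifoBuffer API.

```c
#include <stddef.h>

/* Hypothetical ring buffer that a url_read()-style callback could
 * drain; single-threaded, so no mutexes needed (the simple variant
 * taliho mentions).  Not ffmpeg's AVFifoBuffer. */
typedef struct Fifo {
    unsigned char buf[4096];
    size_t rd, wr, fill;   /* read index, write index, bytes stored */
} Fifo;

/* store up to n bytes, return how many actually fit */
static size_t fifo_write(Fifo *f, const unsigned char *src, size_t n)
{
    size_t done = 0;
    while (done < n && f->fill < sizeof(f->buf)) {
        f->buf[f->wr] = src[done++];
        f->wr = (f->wr + 1) % sizeof(f->buf);
        f->fill++;
    }
    return done;
}

/* url_read()-like semantics: return up to `size` bytes, however few
 * are buffered -- the caller cannot demand a minimum read size */
static int fifo_read(Fifo *f, unsigned char *dst, int size)
{
    int done = 0;
    while (done < size && f->fill > 0) {
        dst[done++] = f->buf[f->rd];
        f->rd = (f->rd + 1) % sizeof(f->buf);
        f->fill--;
    }
    return done;
}
```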
[22:59:35 CEST] <cone-420> ffmpeg 03Michael Niedermayer 07master:009ec8dc3345: avcodec/eatgv: Check remaining size after the keyframe header
[22:59:37 CEST] <cone-420> ffmpeg 03Michael Niedermayer 07master:5ffb8e879389: avcodec/eatqi: Check for minimum frame size
[23:11:35 CEST] <cehoyos> So direct rendering does not imply restrictions on accessing the destination frame?
[23:11:37 CEST] <cehoyos> I find that surprising
[23:17:59 CEST] <BBB> cehoyos: I don't think it's defined as such
[23:18:31 CEST] <cehoyos> You mean there were never any restrictions on accessing destination frames when doing direct rendering?
[23:18:35 CEST] <BBB> yes
[23:18:46 CEST] <BBB> it may be intended as such by some people using it
[23:18:47 CEST] <BBB> but
[23:18:49 CEST] <cehoyos> Historically speaking, I am quite sure this is not correct.
[23:18:51 CEST] <BBB> codecs don't do it that way
[23:19:30 CEST] <BBB> I understand why you'd want to reduce read access if it were in distinct memory on GPU
[23:19:33 CEST] <cehoyos> I believe a bug was introduced (possibly a long time ago) and spread within FFmpeg...
[23:19:46 CEST] <BBB> but that's not how most decoders I've looked at and/or worked on use it
[23:19:57 CEST] <cehoyos> Iirc, you simply cannot read data from (some) hardware surfaces
[23:20:10 CEST] <BBB> I understand
[23:20:53 CEST] <BBB> my expectation is that if such memory locations were used with the user-supplied frame-buffer API (request-frame), then things would work pretty shittily
[23:21:06 CEST] <BBB> particularly with more modern codecs such as h264/vp8 etc.
[23:21:11 CEST] <BBB> ("""modern""")
[23:21:14 CEST] <cehoyos> So what does "direct rendering" mean if not that gpu hardware may be accessed?
[23:21:34 CEST] <BBB> it means that data becomes available on a line-by-line basis
[23:21:46 CEST] <cehoyos> If h264 supported direct rendering, I am sure it did work
[23:21:47 CEST] <BBB> "we are now done up until mby=16"
[23:22:15 CEST] <cehoyos> And I thought libavcodec only returns frames?
[23:22:30 CEST] <BBB> there's a callback to indicate lines being done
[23:23:06 CEST] <cehoyos> So basically an existing flag changed its meaning at some point?
[23:23:54 CEST] <Lynne> you're both wrong and confused
[23:24:40 CEST] <BBB> I don't know whether it changed meaning, I just know how I interpreted it and how (afaict) it was used in codecs I looked at
[23:24:45 CEST] <Lynne> direct rendering just lets the user give memory to decode into, there were never any special requirements but alignment
[23:25:04 CEST] <Lynne> it doesn't imply gpu, access order or anything like that
[23:25:04 CEST] <BBB> but if you say that previously it was used in a different way (this may or may not be true, I don't know), then yes
[23:25:10 CEST] <Lynne> for that we have hardware frames
[23:25:49 CEST] <Lynne> it wasn't used in a different way, there wasn't a "way" specified in the first place
[23:26:07 CEST] <BBB> the callback is draw_horiz_band
[23:26:35 CEST] <BBB> vp3 uses it, for example
[23:26:52 CEST] <BBB> I was under the impression vp8 used it also, but that was probably removed when it started supporting slice threading
[23:26:59 CEST] <BBB> (or anyway, it currently does not use it)
[23:27:32 CEST] <Lynne> that callback also gives no guarantees about writes
[23:27:39 CEST] <cehoyos> Lynne: Could you stay out of this? This is about a flag introduced a long time before you became active here so it is unlikely that you know what this is about
[23:28:00 CEST] <Lynne> well, no, because I've used the API enough to know exactly all about it
[23:28:06 CEST] <cehoyos> That's unlikely
[23:28:52 CEST] <Lynne> you can choose to not believe me too, that works
[23:30:01 CEST] <BBB> Lynne: my understanding is that the callback basically says "I'm done with data up until this vertical point now, this will not be modified further"
[23:30:19 CEST] <BBB> so the "guarantee" is that we will not write above some vertical location after that callback is called
[23:30:31 CEST] <jkqxz> BBB:  That is AV_CODEC_CAP_DRAW_HORIZ_BAND, which is orthogonal to AV_CODEC_CAP_DR1.
[23:30:38 CEST] <Lynne> yes, but we can write below
[23:30:43 CEST] <BBB> yes we can
[23:31:40 CEST] <Lynne> hence we can skip pixels, hence in general, we can write and seek to arbitrary locations
[23:32:28 CEST] <Lynne> only imposed order is when the callback is active and used
[23:32:56 CEST] <Lynne> it's the same type of confusion someone else had when they didn't understand memory in packets
[23:33:24 CEST] <Lynne> until you submit a packet to output you can seek the bit writer to the start and write whatever you've missed
[23:33:34 CEST] <BBB> jkqxz: hm... maybe I mis-remember then
[23:33:46 CEST] <BBB> (trying to go through git history to see, but not having an easy time ...)
[23:37:23 CEST] <kierank> 22:30:33 <jkqxz> BBB:  That is AV_CODEC_CAP_DRAW_HORIZ_BAND, which is orthogonal to AV_CODEC_CAP_DR1.
[23:37:23 CEST] <kierank> yes
[23:38:04 CEST] <kierank> DR1 is just, "here use your own memory". In the past you had to allocate edges as well but most codecs are EDGE_EMU now
[23:57:13 CEST] <rcombs> nevcairiel: it's more like, the official headers define some basic function calls and extern data accesses, and the NVCC compiler magically transforms those to specialized instructions, but clang doesn't have whatever hardcoded magic that is, so I just have to define the relevant intrinsics for it
[00:00:00 CEST] --- Tue Jul 30 2019

