[Ffmpeg-devel-irc] ffmpeg-devel.log.20190903

burek burek at teamnet.rs
Wed Sep 4 03:05:08 EEST 2019


[00:08:46 CEST] <xmichael> https://developer.apple.com/videos/play/wwdc2019/502/
[00:26:57 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:3a5bcb1d1374: avcodec/v4l2_m2m: log requested pixel formats
[00:26:59 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:7b092a074be8: avcodec/v4l2_m2m: remove trailing whitespace in output identifier
[00:27:00 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:87daee944b11: avcodec/v4l2_context: log VIDIOC_REQBUFS failures
[00:27:00 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:5e2436c6def5: avcodec/v4l2_buffers: fix minor typos and whitespace
[00:27:01 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:dc180cd81577: avcodec/v4l2_m2m_enc: log errno on v4l2_set_ext_ctrl failures
[00:27:03 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:6852b85020cf: avcodec/v4l2_m2m_enc: fix typo in log message
[00:27:03 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:bad8365b2bc6: avcodec/v4l2_context: return {decoder,encoder}_cmd errors
[00:27:04 CEST] <cone-083> ffmpeg 03Lukas Rusak 07master:1d36b7b47ad4: avcodec/v4l2_buffers: return int64_t in v4l2_get_pts
[00:27:05 CEST] <cone-083> ffmpeg 03Jorge Ramirez-Ortiz 07master:da45ad48f993: avcodec/v4l2m2m: fix error handling during buffer init
[00:27:06 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:c95b1277332a: avcodec/v4l2_context: use EAGAIN to signal when input buffers are unavailable
[00:27:08 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:e8c5ce1acb8b: avcodec/v4l2_m2m: log planar mode used by driver
[00:27:08 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:b6c6f56e385c: avcodec/v4l2_m2m: use log_ctx variable consistently
[00:27:09 CEST] <cone-083> ffmpeg 03Aman Gupta 07master:e9cc873636ab: avcodec/v4l2_m2m: fix minor indentation issue
[00:40:43 CEST] <philipl> Lynne: Hmm. if you're using a VkImage on the vulkan side, why not use  cuExternalMemoryGetMappedMipmappedArray? That's what it's for.
[00:40:52 CEST] <philipl> It's the only way to interact with image memory.
[00:41:15 CEST] <philipl> I did do a buffer based prototype of that code path (as the initial driver release had buggy image interop) but I had to use an intermediate buffer for this reason.
[00:41:32 CEST] <philipl> (So cuda -> buffer -> image)
[00:42:56 CEST] <Lynne> I don't get what the code does with cuExternalMemoryGetMappedMipmappedArray
[00:43:57 CEST] <Lynne> it generates a 4 element array from a single memory?
[00:44:18 CEST] <philipl> So, you start by exporting the memory that's backing the VkImage on the vulkan side using the external_memory_fd mechanism.
[00:44:32 CEST] <philipl> Then you do cuImportExternalMemory specifying the same opaque_fd.
[00:44:47 CEST] <Lynne> I got that code done
[00:44:59 CEST] <philipl> Then you do cuExternalMemoryGetMappedMipmappedArray on the imported memory, giving you a multi-layered array
[00:45:04 CEST] <philipl> Then you extract layer 0 as a regular CUarray
[00:45:08 CEST] <philipl> Then you memcpy to that
[00:46:23 CEST] <philipl> (Get the regular array with cuMipmappedArrayGetLevel)
[00:47:27 CEST] <philipl> And if you actually imported a mip-mapped VkImage, the other levels would be meaningful.
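
    For reference, a minimal sketch of the import sequence philipl describes above,
    written against the CUDA driver API directly (inside ffmpeg these calls would go
    through the dynamically loaded CudaFunctions table instead). The fd, size, extent
    and pitch values are placeholders, and error checking is omitted:

        /* Import the Vulkan memory and map it as a mipmapped array. */
        CUexternalMemory ext_mem;
        CUmipmappedArray mipmapped;
        CUarray          plane_array;

        CUDA_EXTERNAL_MEMORY_HANDLE_DESC mem_desc = {
            .type      = CU_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD,
            .handle.fd = exported_fd,    /* from vkGetMemoryFdKHR on the Vulkan side */
            .size      = exported_size,  /* size of the backing VkDeviceMemory */
        };
        cuImportExternalMemory(&ext_mem, &mem_desc);

        CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC arr_desc = {
            .offset    = 0,
            .arrayDesc = {
                .Width       = width,
                .Height      = height,
                .Depth       = 0,                          /* 2D */
                .Format      = CU_AD_FORMAT_UNSIGNED_INT8,
                .NumChannels = 1,                          /* e.g. the luma plane */
            },
            .numLevels = 1,                                /* no mip levels */
        };
        cuExternalMemoryGetMappedMipmappedArray(&mipmapped, ext_mem, &arr_desc);

        /* Level 0 is the usable CUarray; copy the decoded plane into it. */
        cuMipmappedArrayGetLevel(&plane_array, mipmapped, 0);

        CUDA_MEMCPY2D cpy = {
            .srcMemoryType = CU_MEMORYTYPE_DEVICE,
            .srcDevice     = src_devptr,     /* e.g. an nvdec output plane */
            .srcPitch      = src_pitch,
            .dstMemoryType = CU_MEMORYTYPE_ARRAY,
            .dstArray      = plane_array,
            .WidthInBytes  = width,          /* 8-bit, single channel */
            .Height        = height,
        };
        cuMemcpy2D(&cpy);
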
[00:51:36 CEST] <Lynne> so I have a separate memory per plane, what are CUDA_EXTERNAL_MEMORY_MIPMAPPED_ARRAY_DESC.cufmt and .NumChannels meant to be
[00:51:43 CEST] <Lynne> .NumChannels = nv12 ? 2 : 1?
[00:51:56 CEST] <Lynne> cufmt = depth > 8 ? int16 : int8?
[00:58:12 CEST] <philipl> So, you're trying to map a multi-plane image? I have not tried that (mpv doesn't support it) but as I understand multi-plane images, each plane has a formal vulkan format which you need to reflect accurately
[00:58:34 CEST] <philipl> so the luma plane is an R8 equivalent and is 1 channel 8 bits, while the chroma plane is R8G8 and 2 channels of 8 bits
[00:58:53 CEST] <philipl> (for nv12)
[00:59:40 CEST] <Lynne> no, its a single plane
[00:59:40 CEST] <philipl> You'd hopefully want to write generic code based on the pix_fmt, in terms of how many channels and their widths.
[00:59:58 CEST] <philipl> You're doing one image per plane?
[01:00:05 CEST] <philipl> Then it's just whatever the image format is
[01:01:09 CEST] <Lynne> so .NumChannels = nv12 ? 2 : 1? and cufmt = r8g8 for nv12's second plane?
[01:01:36 CEST] <BtbN> So it's impossible to get a proper CUdevptr from a Vulkan image? Just an array?
[01:01:52 CEST] <philipl> Right. If you try and do getMappedBuffer, it will fail at runtime.
[01:02:13 CEST] <BtbN> Meh, so you are still forced to copy and can't map.
[01:02:13 CEST] <philipl> The image is treated as opaque and there's no way to get to the buffer.
[01:02:25 CEST] <philipl> You can map for output.
[01:02:29 CEST] <philipl> nvenc can take a CUarray now
[01:02:34 CEST] <BtbN> But ffmpeg can't.
[01:02:34 CEST] <philipl> and we could write filters to work with CUarrays.
[01:02:40 CEST] <philipl> You'd have to do that work.
[01:02:57 CEST] <BtbN> Getting in a second CUDA pix_fmt would cause quite an argument.
[01:02:58 CEST] <philipl> Remember how I occasionally point out we should have used CUarrays as our cuda format?
[01:03:17 CEST] <philipl> of course, nvdec not supporting it is garbage.
[01:03:17 CEST] <BtbN> Using CUdevptr has some significant advantages, which is the reason it was picked.
[01:03:46 CEST] <philipl> The only advantage I'm aware of is it avoids issues with nvdec. Everywhere else it would be a net benefit.
[01:04:19 CEST] <BtbN> You can do pointer arith on CUdevptr, you can just change the pix_fmt to a software pix_fmt and due to how the CUDA format works, it will just magically work.
[01:04:22 CEST] <philipl> Lynne: cufmt is CU_AD_FORMAT_UNSIGNED_INT8 with 2 channels for the chroma plane. The format value is for each channel
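
    To make that channel/format answer concrete, here is a small sketch of the
    per-plane array descriptors for NV12 in the one-image-per-plane setup being
    discussed; width/height are placeholders:

        /* NV12, one VkImage per plane: the format is per channel, the
         * channel count carries the plane's interleaving. */
        CUDA_ARRAY3D_DESCRIPTOR luma = {
            .Width       = width,
            .Height      = height,
            .Format      = CU_AD_FORMAT_UNSIGNED_INT8,  /* R8 */
            .NumChannels = 1,
        };
        CUDA_ARRAY3D_DESCRIPTOR chroma = {
            .Width       = width  / 2,
            .Height      = height / 2,
            .Format      = CU_AD_FORMAT_UNSIGNED_INT8,  /* R8G8: still 8 bits per channel */
            .NumChannels = 2,                           /* interleaved Cb,Cr */
        };
        /* For a 16-bit-container format such as P010, the Format would be
         * CU_AD_FORMAT_UNSIGNED_INT16 instead. */
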
[01:04:23 CEST] <BtbN> Making hwdownload trivial.
[01:04:43 CEST] <philipl> BtbN: hwdownload is a memcpy2D. That's equally easy on an array
[01:05:02 CEST] <BtbN> No, a CUdevptr can be cast to a void*/char* and it will just work. It's unified memory.
[01:05:17 CEST] <philipl> When have we ever benefited from that?
[01:05:23 CEST] <BtbN> So with the current CUDA format, just set pix_fmt = sw_pix_fmt, and you get a valid frame.
[01:06:01 CEST] <philipl> That's not how we do hwdownload. If there's any code in ffmpeg that takes advantage of that, I've missed it.
[01:06:32 CEST] <BtbN> It's a very useful thing that was never put to use. But the original intent was to basically "map" CUDA frames to software frames like that, to apply non-cuda-filters without down and reupload.
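
    As a rough illustration of the idea BtbN is describing (and which, as philipl
    notes, ffmpeg does not actually do anywhere), relabelling a CUDA frame as its
    software format would amount to something like the helper below. It only makes
    sense if the driver really exposes the allocation as host-accessible unified
    memory; the function name is hypothetical:

        #include <libavutil/frame.h>
        #include <libavutil/hwcontext.h>

        /* Hypothetical: treat a CUDA frame's CUdeviceptr planes as host-visible
         * memory by relabelling the frame with its software pixel format. */
        static void relabel_cuda_frame(AVFrame *frame)
        {
            AVHWFramesContext *hwfc =
                (AVHWFramesContext *)frame->hw_frames_ctx->data;

            /* frame->data[] already holds CUdeviceptr values with valid
             * linesizes; only the format label changes. */
            frame->format = hwfc->sw_format;   /* e.g. AV_PIX_FMT_NV12 */
        }
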
[01:08:02 CEST] <philipl> I'm not terribly convinced it would work as transparently as that. You've got to allocate the original memory with the right flags or I think it still fails.
[01:08:02 CEST] <BtbN> Also, it's CUdevptr now, and changing it is impossible. Adding another CUDA pix_fmt is most likely gonna get blocked, and I'm also not a fan of it due to the confusion it would cause when building filter chains.
[01:08:29 CEST] <BtbN> Nah, on modern drivers all memory is unified.
[01:08:40 CEST] <philipl> BtbN: so, without an array format on our sides, it will force a gpu memcpy on 'hwdownload' as well to get a 'compatible' format.
[01:09:48 CEST] <philipl> but honestly, that's a good problem to have - it would mean we'd be making progress :-)
[01:11:37 CEST] <BtbN> It's an annoying inconsistency in CUDA's API
[01:11:55 CEST] <BtbN> some stuff forcing Arrays, some stuff raw pointers, and no proper way to treat one as the other
[01:13:25 CEST] <philipl> You can create an array over linear memory but no way to go back.
[01:17:35 CEST] <xmichael> Apple Low Latency HLS https://www.youtube.com/watch?v=3yrA2IOdqvw
[01:18:55 CEST] <Lynne> do I need to enable nvcc to compile the hwcontext?
[01:19:21 CEST] <Lynne> I'm getting fatal error: cuda.h: No such file or directory in hwcontext_vulkan.h if I include hwcontext_cuda.h
[01:22:43 CEST] <xmichael> Good Evening Nicolas17, from my million questions last night. It seems apple announced a low latency hls solution https://www.youtube.com/watch?v=3yrA2IOdqvw
[01:28:00 CEST] <philipl> Lynne: you don't include cuda.h. We use https://git.videolan.org/?p=ffmpeg/nv-codec-headers.git (BtbN's hard work) which is clean-room headers and a dynamic loader to avoid the cuda sdk
[01:28:41 CEST] <philipl> You need to include the loader header, although probably hwcontext_cuda_internal.h is what you want so you can get the function struct from the cuda context
[01:29:04 CEST] <philipl> And if you need cuda functions we haven't used before they have to be added to the loader.
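
    Putting those pointers together, a rough sketch of how code inside libavutil
    typically reaches the loaded CUDA functions, assuming the
    AVCUDADeviceContextInternal layout from hwcontext_cuda_internal.h and the usual
    CHECK_CU wrapper Lynne mentions below; the function name is illustrative:

        /* Inside libavutil, as e.g. hwcontext_cuda.c does. */
        #include "hwcontext_cuda_internal.h"
        #include "cuda_check.h"

        #define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, cu, x)

        static int push_cuda_context(AVHWDeviceContext *ctx)
        {
            AVCUDADeviceContext *cuda_hwctx = ctx->hwctx;
            CudaFunctions       *cu         = cuda_hwctx->internal->cuda_dl;

            /* Every driver-API call goes through the loader table; cuda.h
             * itself is never included. */
            return CHECK_CU(cu->cuCtxPushCurrent(cuda_hwctx->cuda_ctx));
        }
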
[01:49:47 CEST] <Lynne> yeah, I figured as much
[01:50:03 CEST] <Lynne> if I include cuda_check.h I get many redefinitions
[01:50:08 CEST] <Lynne> if I don't, I get error: implicit declaration of function FF_CUDA_CHECK_DL
[01:50:30 CEST] <Lynne> if I do, it compiles and does 1 frame without errors and then stops, without any errors for some reason
[01:52:52 CEST] <philipl> is it blocking on the semaphore?
[01:55:24 CEST] <Lynne> not sure, it exits in CHECK_CU(cu->cuCtxPopCurrent(&dummy));
[01:57:57 CEST] <philipl> so you have some pipeline starting from an nvdec decode? Is it not decoding the next frame? When you say stopped, is it blocking somewhere? exiting?
[02:04:13 CEST] <Lynne> exiting
[02:04:27 CEST] <Lynne> just decoding from -hwaccel nvdec -hwaccel_output_format cuda
[02:06:15 CEST] <Lynne> wait, my fail, ff_hwframe_map_create is hard
[02:06:27 CEST] <philipl> heh.
[02:10:22 CEST] <Lynne> seems to be doing something, I can even download the vulkan frames, but if I dump to a file I get instaquit again
[02:12:07 CEST] <philipl> odd.
[02:12:28 CEST] <philipl> in terms of headers, look at nvdec.c for the sequence.
[02:12:43 CEST] <Lynne> it crashes on gpu listing for vulkan if I specify an encoder
[02:12:51 CEST] <Lynne> really odd
[02:17:03 CEST] <Lynne> it's getting late, https://0x0.st/z4RY.patch https://0x0.st/z4Rg.patch are the new patches if anyone wants to test
[02:17:56 CEST] <Lynne> derive_device is broken, so "./ffmpeg_g -init_hw_device "vulkan=vk:1,debug=0" -hwaccel nvdec -hwaccel_output_format cuda -i sample.mkv -filter_hw_device vk -vf hwmap,format=vulkan -f null -" or similar would work
[02:20:31 CEST] <philipl> neat. Will try and look. Hopefully BtbN too
[03:28:47 CEST] <Lynne> new versions: https://0x0.st/z47b.patch https://0x0.st/z47c.patch
[03:29:08 CEST] <Lynne> fixed derive device, and some filter optimizations
[03:30:14 CEST] <Lynne> mapped cuda image looks wrong though, no luma, chroma stride looks correct but the entire image is offset
[03:30:38 CEST] <Lynne> er, no chroma, luma stride etc etc.
[03:49:30 CEST] <philipl> Lynne: different problem. I see luma looks correct with zero (green) chroma
[03:55:00 CEST] <Lynne> weird, I see https://0x0.st/z47j.jpg
[03:56:12 CEST] <philipl> I'm doing hwmap,format=vulkan,hwdownload,format=nv12 -c:v libx264
[04:01:56 CEST] <Lynne> same
[04:03:59 CEST] <taliho> does nicolas george use irc? 
[04:07:47 CEST] <taliho> i wanted to ask him a question about the blocking vs non-blocking email
[04:09:08 CEST] <BradleyS> if memory serves correctly, he does not
[04:10:27 CEST] <taliho> BradleyS: thanks
[04:13:41 CEST] <philipl> Lynne: So, I see you are getting the same fd back every time you export memory, and as you are never closing the fds (obviously have to handle cleanup in due course), that means the memory you are exporting is actually the same memory each time.
[04:14:15 CEST] <philipl> That also means the allocations must be offset within the memory, and you are always setting your offset to 0 when importing.
[04:14:26 CEST] <philipl> So I think each plane is being read from the same memory.
[04:34:06 CEST] <philipl> Lynne: OK, so if I'm reading your code properly, you are doing a separate allocation for each plane, which seems like it should not lead to each export being of the same memory.
[04:34:18 CEST] <philipl> yet that is what we see.
[04:35:15 CEST] <philipl> You are not passing a VkExportMemoryAllocateInfo when allocating the memory, which is conceptually an error, although the last time I ran without it, it worked with the nvidia driver.
[04:35:21 CEST] <philipl> Still, it might be part of the issue.
[04:38:04 CEST] <philipl> VkImage is also being created without VkExternalMemoryImageCreateInfoKHR
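
    A sketch of the export-aware allocation philipl is pointing at, with the
    unrelated create-info fields elided and only illustrative variable names; both
    pNext structs are needed for opaque-fd export to be well defined:

        VkExternalMemoryImageCreateInfo ext_img_info = {
            .sType       = VK_STRUCTURE_TYPE_EXTERNAL_MEMORY_IMAGE_CREATE_INFO,
            .handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
        };
        VkImageCreateInfo image_info = {
            .sType = VK_STRUCTURE_TYPE_IMAGE_CREATE_INFO,
            .pNext = &ext_img_info,
            /* format/extent/usage/... as before */
        };

        VkExportMemoryAllocateInfo export_info = {
            .sType       = VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO,
            .handleTypes = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
        };
        VkMemoryAllocateInfo alloc_info = {
            .sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO,
            .pNext           = &export_info,
            .allocationSize  = mem_req.size,   /* keep this for the CUDA import */
            .memoryTypeIndex = mem_type_index,
        };

        /* Once allocated, the fd handed to cuImportExternalMemory comes from: */
        VkMemoryGetFdInfoKHR get_fd_info = {
            .sType      = VK_STRUCTURE_TYPE_MEMORY_GET_FD_INFO_KHR,
            .memory     = memory,
            .handleType = VK_EXTERNAL_MEMORY_HANDLE_TYPE_OPAQUE_FD_BIT,
        };
        vkGetMemoryFdKHR(device, &get_fd_info, &exported_fd);
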
[05:03:58 CEST] <philipl> Lynne: ok. I think the cuImportExternalMemory immediately frees the fd in practice so it is available for reuse. That explains that.
[05:04:47 CEST] <philipl> Your use of lseek isn't valid - you can't lseek a non-dma-buf fd for size. You must keep track of the size in the AVVkFrame when allocating the memory. I added this logic but it doesn't change the visual results.
[12:53:15 CEST] <JEEB> y/33
[14:34:08 CEST] <Lynne> philipl, BtbN: fixed that, now works fine except it's leaking device memory
[14:44:51 CEST] <Lynne> seems like I just needed to call cuMipmappedArrayDestroy and cuDestroyExternalMemory
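
    For reference, the teardown order that fixes the leak, reusing the names from
    the earlier import sketch (the level-0 CUarray belongs to the mipmapped array
    and is not destroyed separately):

        cuMipmappedArrayDestroy(mipmapped);
        cuDestroyExternalMemory(ext_mem);
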
[14:47:03 CEST] <Lynne> unfortunately it's slower than hwupload for now
[14:48:50 CEST] <BtbN> Nvidia's software path is pretty damn efficient, and there is hardly ever much of a performance diff between zero-copy and round-trip
[15:11:19 CEST] <Lynne> yeah, I'm still allocating a new non-pool image on every import
[15:11:42 CEST] <Lynne> this doesn't sound efficient at all, so I'm trying to use av_hwframe_get_buffer to get images from the pool
[15:26:41 CEST] <Lynne> why is dst_hwfc->pool NULL when the mapping function gets called?
[16:10:39 CEST] <philipl> Lynne: you definitely need to use pool images and you need to keep them imported for re-use, and only clean up when the image is freed.
[16:11:28 CEST] <philipl> not sure why pool is empty. should be allocated when context is created right?
[20:00:47 CEST] <cone-523> ffmpeg 03Aman Gupta 07master:7eb465e185a7: configure: ensure --enable-omx-rpi uses rpi-specific IL headers
[20:48:18 CEST] <Lynne> philipl: it seems derived contexts aren't really meant to have pools as such
[20:48:36 CEST] <Lynne> /* A derived frame context is already initialised. */ in av_hwframe_ctx_init()
[20:49:16 CEST] <Lynne> not sure how to solve it, apart from adding a secondary internal pool to the hwcontext
[20:54:40 CEST] <jkqxz> Allocating in a derived frame context allocates in the source and then maps immediately.  See av_hwframe_get_buffer().
[20:58:05 CEST] <Lynne> so I'd need a cuda->vulkan mapping in hwcontext_cuda to implement mapping from cuda->vulkan in hwcontext_vulkan?
[21:00:03 CEST] <jkqxz> You can implement it in either end.  There is both map_to and map_from.
[21:01:38 CEST] <Lynne> the issue is there's no way to map cuda->vulkan without copying, and we need a valid vulkan frame to copy to, and doing it without pool is slow
[21:02:08 CEST] <jkqxz> Don't you want to use transfer rather than mapping, then?
[21:03:25 CEST] <jkqxz> Or maybe you could store a cache somehow in the private context information (AVHWFramesInternal.priv).
[21:10:18 CEST] <Lynne> can transfer do hw->hw?
[21:18:40 CEST] <jkqxz> Sure.  It was always vaguely intended, but I don't think it's actually implemented anywhere at the moment.
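
    At the public API level, a hw-to-hw transfer of the kind jkqxz suggests would
    look roughly like the helper below; it assumes a transfer implementation for
    this device pair, which per the discussion does not exist yet, and the helper
    name is hypothetical:

        #include <libavutil/frame.h>
        #include <libavutil/hwcontext.h>

        /* Hypothetical: copy a CUDA hw frame into a pooled Vulkan hw frame
         * via the transfer API instead of mapping. */
        static int cuda_to_vulkan_transfer(AVBufferRef *vk_frames_ref,
                                           const AVFrame *cuda_frame,
                                           AVFrame *vk_frame)
        {
            int ret = av_hwframe_get_buffer(vk_frames_ref, vk_frame, 0);
            if (ret < 0)
                return ret;
            return av_hwframe_transfer_data(vk_frame, cuda_frame, 0);
        }
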
[21:54:52 CEST] <cone-523> ffmpeg 03Anthony Delannoy 07master:39f129593756: avformat/mpegts: Check if ready on SCTE reception
[22:05:52 CEST] <cone-523> ffmpeg 03Anthony Delannoy 07release/4.2:611eb9594376: avformat/mpegts: Check if ready on SCTE reception
[22:47:53 CEST] <nevcairiel> maybe matroska ebml list parsing should use fast malloc, might use a tad bit more memory, but avoid a billion reallocs
[22:49:23 CEST] <nevcairiel> or some sort of custom growing so it's not one by one
[23:13:18 CEST] <jamrial> nevcairiel: i think the issue is in av_add_index_entry(), which uses av_fast_realloc
[23:13:44 CEST] <jamrial> and since it's 90k seek points...
[23:15:44 CEST] <JEEB> not... a small amount
[23:16:50 CEST] <nevcairiel> On a quick read i thought it might be this one http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/matroskadec.c;h=1ea9b807e6c1c2b2b0bb0166984fcaca960235ba;hb=HEAD#l1239
[23:18:11 CEST] <Lynne> philipl: using a pool now, it's still as slow, 200fps vs 500fps for hwupload+filtering
[23:18:28 CEST] <Lynne> seems like the driver does its own pooling as well
[23:18:55 CEST] <cehoyos> nevcairiel: maybe...
[23:19:43 CEST] <nevcairiel> 90k seek points isn't that special for av_add_index tbh
[23:19:55 CEST] <nevcairiel> mov uses the stream index structures for all its indexing
[23:20:00 CEST] <nevcairiel> it would easily reach such amounts
[23:20:07 CEST] <cehoyos> jamrial: Yes, it is "void *newelem = av_realloc_array" and has to be replaced with fast_realloc (if it is considered a bug)
[23:34:05 CEST] <philipl> Lynne: uh. Sad panda. What frame size?
[23:36:48 CEST] <jamrial> nevcairiel, cehoyos: you're right, changing that to av_fast_realloc solved it
[23:36:54 CEST] <jamrial> there's no delay at all now
[23:37:00 CEST] <jamrial> will send a patch in a moment
[23:37:18 CEST] <nevcairiel> need to add a new variable somewhere to track the allocated size, i thought that might be slightly messy, but i suppose it's ok
[23:37:32 CEST] <cehoyos> in the context, no?
[23:37:50 CEST] <nevcairiel> well in the list struct
[23:38:05 CEST] <nevcairiel> which lives somewhere in the context
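
    The kind of change being discussed, sketched with illustrative field names
    rather than the actual matroskadec.c ones; av_fast_realloc over-allocates as it
    grows, so repeated single-element appends amortize, at the cost of one extra
    allocated-size field in the list struct:

        /* Append one element to the EBML list, growing with av_fast_realloc
         * instead of av_realloc_array. alloc_size is an unsigned int that
         * tracks the allocated byte size. */
        void *newelem = av_fast_realloc(list->elem, &list->alloc_size,
                                        (list->nb_elem + 1) * elem_size);
        if (!newelem)
            return AVERROR(ENOMEM);
        list->elem = newelem;
        memset((char *)list->elem + list->nb_elem * elem_size, 0, elem_size);
        list->nb_elem++;
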
[23:39:37 CEST] <philipl> Lynne: and I assume the pool is actually re-using mapped frames and not creating new ones all the time
[23:51:28 CEST] <Lynne> philipl: 720p
[23:53:00 CEST] <Lynne> yes, checked, only 4 frames in total get allocated in the pool
[00:00:00 CEST] --- Wed Sep  4 2019

