[Ffmpeg-devel-irc] ffmpeg-devel.log.20180507
burek
burek021 at gmail.com
Tue May 8 03:05:03 EEST 2018
[00:01:25 CEST] <rcombs> isn't the whole point of 708 being in the video bitstream compatibility with SDI and analog NTSC
[00:03:28 CEST] <kierank> 708 contains 608 bytes
[00:03:36 CEST] <kierank> which are backwards compatible with analogue
[00:03:48 CEST] <kierank> sdi can carry full 708 but it's rarely used
[00:04:18 CEST] <JEEB> yup
[00:04:45 CEST] <wm4> what are the 100 bytes used for
[00:33:39 CEST] <kierank> it's the name of the spec
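The 608-inside-708 layering discussed above can be sketched in code. This is an illustrative Python parser (not FFmpeg's implementation) of the cc_data triplet headers defined in ATSC A/53: the low bits of the first byte say whether the payload is backwards-compatible 608 data or native 708 DTVCC data.

```python
# Illustrative sketch, not FFmpeg code: each 3-byte cc_data construct starts
# with a header byte laid out as MMMMMVTT (5 marker bits, cc_valid, cc_type).
CC_TYPES = {
    0: "608 field 1",            # backwards compatible with analogue line 21
    1: "608 field 2",
    2: "708 DTVCC data",
    3: "708 DTVCC packet start",
}

def classify_cc_triplet(header_byte):
    """Return (cc_valid, kind) for one cc_data triplet's header byte."""
    cc_valid = (header_byte >> 2) & 1
    cc_type = header_byte & 3
    return cc_valid, CC_TYPES[cc_type]

print(classify_cc_triplet(0xFC))  # (1, '608 field 1')
print(classify_cc_triplet(0xFE))  # (1, '708 DTVCC data')
```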
[01:48:00 CEST] <cone-455> ffmpeg 03James Almer 07master:0736f32a4fac: configure: fix and simplify xlib check
[01:50:36 CEST] <jamrial> nevcairiel: ping
[09:13:21 CEST] <JEEB> and since I asked before about parser allocation failures, I've seen things like this "Failed to reallocate parser buffer to 2070044476", "Failed to reallocate parser buffer to 2134313519"
[11:01:52 CEST] <hanna> atomnuker: btw, do you have any thoughts about maybe using libplacebo for your vulkan filters? no need to reinvent most of the wheel here for the stuff that libplacebo already implements (e.g. the scaling code, and tone mapping)
[11:02:05 CEST] <hanna> you could be a good target to model the VkImage interop API around
[11:02:25 CEST] <hanna> especially making sure it would also work with mapped DRM surfaces and whatnot
[11:09:06 CEST] <durandal_1707> NIH
[11:11:18 CEST] <hanna> I'm not saying you have to rip out your own code, but a vf_placebo provided alongside whatever you have already would be a good addition I think
[11:11:32 CEST] <hanna> It's not like lavfi doesn't already have wrappers to third party libraries
[13:05:25 CEST] <pkv> @Btbn Hi, was wondering if the recent changes in nvenc might elicit a micro bump to avcodec ; what is the policy towards micro bumps ?
[13:05:45 CEST] <BtbN> There was already a micro bump pushed after it
[13:06:06 CEST] <BtbN> So no need to push another one. It was just forgotten.
[13:07:24 CEST] <pkv> oh ok great, we might need it for obs-studio
[13:29:33 CEST] <BtbN> philipl, I just realized, on 64bit (Which is basically the only supported CUDA arch ffmpeg has), the CUdevptrs are also host pointers.
[13:30:13 CEST] <BtbN> That means, for the extra copy to host memory we currently do with nvdec, it could just do a dummy wrapper instead, creating a new classic sw frame, with the CUDA pointers cast to normal pointers
[13:30:55 CEST] <BtbN> That wrapped frame would then need to hold a reference to the original frame in its buffer, so it doesn't get de-allocated too early, but that should be absolutely possible
[13:49:57 CEST] <BtbN> hm, no. That wouldn't work properly.
[14:35:38 CEST] <durandal_1707> Compnn: ffmpeg used for drones again!
[14:56:25 CEST] <BtbN> Can pkg-config interpret complex version ranges?
[14:57:20 CEST] <BtbN> like "ffnvcodec >= 8.1.24.3 || ffnvcodec < 8.1 && ffnvcodec >= 8.0.14.3"
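pkg-config has no "||" operator, so a range like that has to be split into separate queries in configure. The intended version predicate itself can be sketched in Python (cutoff versions taken from the line above, the helper names are illustrative):

```python
def parse(s):
    """Turn a dotted version string into a comparable tuple."""
    return tuple(int(x) for x in s.split("."))

def ffnvcodec_ok(v):
    """True for the new series, or for a late-enough tail of the 8.0 series."""
    v = parse(v)
    return v >= parse("8.1.24.3") or parse("8.0.14.3") <= v < parse("8.1")

print(ffnvcodec_ok("8.1.24.3"))  # True
print(ffnvcodec_ok("8.1.0.0"))   # False: new branch but too old
print(ffnvcodec_ok("8.0.14.3"))  # True: old branch, late enough
```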
[15:23:41 CEST] <atomnuker> also it would be kind of an obscure dependency, as if libshaderc isn't enough (still not packaged by debian after lunarg offered to maintain it there)
[15:25:46 CEST] <atomnuker> and really it's not that much code to scale and convert
[15:29:18 CEST] <BtbN> hm, the more I tinker around with this, the more I feel like the CUDA pix fmt is not going to cut it for performant nvdec
[15:29:20 CEST] <BtbN> great
[15:33:58 CEST] <BtbN> although I wonder, couldn't it create a PIX_FMT_CUDA frame with a custom allocator, that just is the mapped CUVID frame? And the destructor just unmaps it
[15:49:59 CEST] <JEEB> ooh, was able to create a sample that makes libavformat do weird things to timestamps. exciting~
[16:16:45 CEST] <JEEB> ok, so it tries to fix the timestamp in ff_read_packet, but its initial guess goes way the wrong way
[16:32:55 CEST] <JEEB> ok, so it wrong-updates reference for wrap-around and then deros
[16:32:57 CEST] <JEEB> *derps
[16:39:13 CEST] <microchip_> DerpOS
[16:55:21 CEST] <philipl> BtbN: Did you try it? I know cuda supports dual-use pointers if you allocate them in the right way. But IIRC, the frame we're talking about here is allocated by cuvid and you can't control how it's done.
[16:56:20 CEST] <BtbN> It won't work, it would lock out pre-Pascal hardware.
[16:56:48 CEST] <BtbN> And the idea with streams to at least make the copying async also won't work, because it needs the frame mapped until the async operation completes
[16:58:38 CEST] <philipl> Are you trying to optimise playback or transcode?
[16:59:10 CEST] <BtbN> both
[16:59:18 CEST] <BtbN> I'm trying to get rid of that needless extra copy it does
[16:59:49 CEST] <philipl> There are other opportunities to save copies on playback if we revisit the frame-is-cuda-array idea; that lets the player pass opengl surfaces all the way through so there's just one copy.
[17:00:21 CEST] <BtbN> It currently does one completely pointless extra copy when using nvdec instead of cuvid
[17:00:40 CEST] <BtbN> nvdec always copies to a normal device memory cuda frame
[17:00:49 CEST] <philipl> hmm.
[17:00:54 CEST] <BtbN> and that then gets copied into the actually used frame, or software frame
[17:01:10 CEST] <BtbN> So just turning the mapped cuvid frame into a real frame would solve that
[17:01:56 CEST] <philipl> I'd have to read through and remind myself about this part.
[17:01:59 CEST] <philipl> But have to dash now.
[17:02:55 CEST] <philipl> It always seemed weird to me that you couldn't get cuvid to write the frame into a buffer we control from the beginning.
[17:04:58 CEST] <BtbN> well, you can't
[17:05:11 CEST] <BtbN> But you can turn the buffer it gives you into a frame
[17:05:53 CEST] <JEEB> begesus
[17:05:54 CEST] <JEEB> update_wrap_reference
[17:06:03 CEST] <JEEB> even for the case where a program is found for the stream
[17:06:19 CEST] <JEEB> this is something that looks rather needlessly complex
[17:39:59 CEST] <wm4> <JEEB> this is something that looks rather needlessly complex <- and doesn't even work?
[17:40:30 CEST] <JEEB> ok, so it works when you only get subtitle packets until you get a PCR
[17:40:52 CEST] <JEEB> because by then the subtitle packets' timestamps are fixed
[17:41:22 CEST] <JEEB> but if you get a video packet and a subtitle packet before you get a PCR, welp
[17:46:52 CEST] <jdarnley> Fuck! Not more BBC code!? Plus it is gstreamer: garbage and garbage
[17:47:14 CEST] <JEEB> now that is random
[17:51:24 CEST] <atomnuker> jdarnley: what do you mean?
[17:52:09 CEST] <jdarnley> I went looking for the thread discussing the patch that added rtp-vc2 decoding to see what was said.
[17:52:20 CEST] <jdarnley> Especially for hot to test the thing.
[17:52:23 CEST] <jdarnley> *how
[17:52:42 CEST] <jdarnley> What I found was ...
[17:54:36 CEST] <jdarnley> https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2016-February/188903.html
[17:54:39 CEST] <JEEB> also can someone explain to me why everyone keeps dividing PCR by 300?
[17:54:52 CEST] <JEEB> int64_t pcr = f->last_pcr / 300; is something I see in multiple places
[17:55:03 CEST] <JEEB> oh wait
[17:55:12 CEST] <JEEB> PCR is in 27MHz, right?
[17:55:14 CEST] <jdarnley> 90kHz to 29MHz?
[17:55:28 CEST] <JEEB> yea, I am dumb
[17:55:59 CEST] <jdarnley> It is a "magic number"
[17:56:09 CEST] <JEEB> yea
[17:56:19 CEST] <JEEB> I've been staring at this code for way too long :D
[17:57:24 CEST] <jdarnley> I do mean 27MHz
[17:58:52 CEST] <jdarnley> I should submit a patch to change every 300 I find to TWENTY_SEVEN_MEGAHERTZ_OVER_NINETY_KILOHERTZ
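The magic number jdarnley is joking about is just the ratio of the two MPEG-TS clocks; a quick sketch:

```python
PCR_HZ = 27_000_000   # PCR runs on a 27 MHz clock
PTS_HZ = 90_000       # PTS/DTS use the 90 kHz clock

def pcr_to_90khz(pcr):
    """Convert a 27 MHz PCR reading to the 90 kHz PTS/DTS timebase."""
    return pcr // (PCR_HZ // PTS_HZ)   # i.e. // 300

print(PCR_HZ // PTS_HZ)           # 300
print(pcr_to_90khz(27_000_000))   # 90000: one second in either clock
```

(On the wire, PCR is actually transmitted as a 33-bit 90 kHz base plus a 9-bit 27 MHz extension, with PCR = base * 300 + ext, which is where the clean divisibility by 300 comes from.)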
[18:16:12 CEST] <philipl> BtbN: So if I read this right, you'd ideally want to replace the buffer pool with wrapped mapped frames?
[18:16:26 CEST] <BtbN> yes
[18:16:37 CEST] <BtbN> The only question is if nvdec will run out of frames then
[18:16:49 CEST] <philipl> Isn't there a way to control how many frames nvdec uses?
[18:17:01 CEST] <philipl> Can't you say to use the same number as the pool size?
[18:17:38 CEST] <philipl> NumOutputSurfaces? right?
[18:19:43 CEST] <JEEB> wonder if we need an option to either scan for a PCR first, or just drop everything until we get a PCR in MPEG-TS
[18:19:52 CEST] <JEEB> because this thing seems to start fixing itself as soon as PCR is received
[18:20:41 CEST] <BtbN> I used to stare at mpegts PCR logic a while ago
[18:20:43 CEST] <BtbN> and gave up
[18:21:18 CEST] <jdarnley> Okay, I got ffmpeg talking to ffplay but I get 1 frame displayed every ~200 frames
[18:22:13 CEST] <jdarnley> Won't display anything for my own stream
[18:22:31 CEST] <jdarnley> and uplay crashes with the stream from ffmpeg
[18:22:55 CEST] <JEEB> :)
[18:26:13 CEST] <philipl> BtbN: Yeah, seems like if numOutputSurfaces == numDecodeSurfaces == pool size, you would get sane semantics.
[18:26:19 CEST] <BtbN> philipl, https://github.com/BtbN/FFmpeg/commit/cbdd183b8c9aee8f7ca15867ad63e29e9a1c359e something like this
[18:26:23 CEST] <BtbN> untested yet
[18:26:48 CEST] <BtbN> "Picture size 0x0 is invalid" ok, it doesn't like that
[18:27:29 CEST] <philipl> Presumably you can calculate what that should be.
[18:27:37 CEST] <BtbN> it's 0
[18:27:47 CEST] <BtbN> I don't want the hw_frames_ctx to waste memory
[18:27:53 CEST] <philipl> Oh, this again
[18:27:58 CEST] <ubitux> http://ubitux.fr/pub/pics/2018-05-07-182729_753x626_scrot.png
[18:28:01 CEST] <ubitux> what can i do? :(
[18:28:06 CEST] <BtbN> But it needs to exist, because the hwaccel infra expects it
[18:28:16 CEST] <BtbN> I'm not 100% sure on this though
[18:28:20 CEST] <philipl> Can't you change the pool implementation to not allocate instead?
[18:28:25 CEST] <BtbN> It might be possible to go without a hw_frames_ctx
[18:28:37 CEST] <BtbN> The pool implementation is in avutil
[18:28:47 CEST] <philipl> Yes, but it can be overriden, right?
[18:28:57 CEST] <BtbN> I'd have to put a whole other implementation into cuvid for that
[18:29:13 CEST] <BtbN> A mostly empty one, sure, but still
[18:31:19 CEST] <philipl> Didn't you also want a no-op pool for something in nvenc? Or did you remove the frames_ctx usage completely?
[18:31:33 CEST] <BtbN> It's just optional entirely
[18:31:49 CEST] <BtbN> in theory at least
[18:32:13 CEST] <BtbN> Hm, it crashes in cuMemcpy2D if I do things this way
[18:33:33 CEST] <philipl> For the later copy?
[18:34:38 CEST] <BtbN> yes
[18:34:47 CEST] <philipl> also: what API reference is Oscar using? He's referring to methods I don't see in the headers
[18:35:01 CEST] <BtbN> hm?
[18:35:13 CEST] <BtbN> Every cuMemcpy* function has a cuMemcpy*Async version
[18:35:15 CEST] <philipl> "DecodeLockFrame"
[18:37:45 CEST] <philipl> So, the hwcontext_cuda transfer stuff is calculating the planes potentially differently from the original memcpy2d you are removing.
[18:37:46 CEST] <BtbN> Something that no documentation Google knows of mentions
[18:38:05 CEST] <BtbN> no, plane calculation is pretty straight forward
[18:39:34 CEST] <philipl> Well, I guess I'd start by logging the src offset and size values for the transfer and see if they match what the old copy code comes up with
[18:39:46 CEST] <philipl> If the frame isn't unmapped, the copy shouldn't care if it happens early or late.
[18:40:20 CEST] <philipl> and you're doing this all under a single context?
[18:40:43 CEST] <BtbN> I'd hope so, I wouldn't expect ffmpeg.c to create more than one
[18:40:59 CEST] <BtbN> otherwise copying data wouldn't work in the first place
[18:42:38 CEST] <philipl> Hmm
[18:45:19 CEST] <atomnuker> jdarnley: keep in mind if you're testing with kierank's old hacked up obs streamer to support vc2 you should explicitly add -an
[18:46:12 CEST] <kierank> jdarnley: don't use ffmpeg to test
[18:46:14 CEST] <kierank> it won't work
[18:46:15 CEST] <atomnuker> when I worked on it I got streams out of it which had invalid audio timestamps or missing audio so when ffmpeg tried to sync up audio and video it just got stuck
[18:46:30 CEST] <atomnuker> of course it won't work if the audio's like that
[18:46:43 CEST] <kierank> receiving 300mbit/s in the same thread as your decoder won't work ever
[18:46:57 CEST] <jdarnley> I'm not sending 300M!
[18:47:28 CEST] <kierank> even 20 will blow it
[18:47:42 CEST] <kierank> it's a fool's errand testing with ffmpeg, seriously
[18:48:09 CEST] <BtbN> philipl, not making a 1x1 sized frame fixes the crash. So I think something is not properly replaced, or re-replaced. gdb will tell
[18:50:07 CEST] <philipl> BtbN: So the hwframes_ctx params are getting factored into the transfer
[18:50:31 CEST] <BtbN> if they were, it'd just copy a 1x1 frame, and not crash
[18:51:02 CEST] <BtbN> cuda_transfer_data_from doesn't even access the hw_frames_ctx
[18:51:30 CEST] <BtbN> except for getting the shift_height, which should be accurate still
[18:53:35 CEST] <atomnuker> kierank: it isn't, it's how I rewrote the decoder
[18:57:37 CEST] <wm4> kierank: why will it blow it?
[18:58:06 CEST] <kierank> wm4: if the decoding thread uses too much cpu you'll drop udp packets
[18:58:16 CEST] <kierank> this isn't tcp, they come whether you like it or not
[18:59:05 CEST] <kierank> the rtp input doesn't have the udp thread hack
[18:59:20 CEST] <wm4> wouldn't you just give the udp thread a higher priority
[18:59:35 CEST] <kierank> it's all in the main thread for rtp
[19:00:32 CEST] <kierank> main thread does pthread_join or similar so has to wait for the others
[19:00:39 CEST] <kierank> if that's a 35ms wait you've lost a lot of packets at 300mbit
[19:01:32 CEST] <wm4> is this something you'd use an embedded rtos for
[19:01:43 CEST] <kierank> no
[19:01:57 CEST] <kierank> you'd put the udp source in the a new thread, tune the kernel input buffers
[19:02:04 CEST] <kierank> and make that thread high priority ideally
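kierank's recipe (dedicated receive thread, tuned kernel buffers, raised priority) looks roughly like this. A Python sketch rather than the C under discussion, kept loopback-local; the port and buffer size are illustrative:

```python
import socket
import threading

# Receive socket with an enlarged kernel buffer; the kernel may clamp the
# requested size (see net.core.rmem_max on Linux).
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 16 * 1024 * 1024)
rx.bind(("127.0.0.1", 0))   # ephemeral local port, illustrative only
rx.settimeout(0.2)

received = []

def rx_loop():
    # Hot path: nothing but recv-and-enqueue should run in this thread,
    # so a slow decoder elsewhere can't cause drops.
    try:
        while True:
            received.append(rx.recvfrom(2048)[0])
    except socket.timeout:
        pass

t = threading.Thread(target=rx_loop)
t.start()

# "Sender" side: one datagram over loopback.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"payload", rx.getsockname())
t.join()
tx.close()
rx.close()
print(len(received))  # 1
```

(Raising the thread's scheduling priority, the last step kierank mentions, is OS-specific and usually needs privileges, so it is omitted here.)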
[19:03:03 CEST] <wm4> all that boring bloated soft realtime desktop shit
[19:10:19 CEST] <BtbN> nevcairiel, or whoever is responsible for hwcontext_d3d11va.c: in line 166, the call to av_buffer_create, shouldn't it be sizeof(*desc) instead of just sizeof(desc)?
[19:11:11 CEST] <wm4> probably me
[19:11:23 CEST] <wm4> I don't think the size matters anyway
[19:11:34 CEST] <wm4> the buffer ref is only used for refcounting
[19:11:52 CEST] <wm4> the size would matter if the buffer data were directly involved in COW
[19:11:57 CEST] <BtbN> When would the size passed to av_buffer_create even ever matter?
[19:12:07 CEST] <BtbN> When you use that, you usually do custom stuff anyway
[19:12:30 CEST] <wm4> essentially it never matters
[19:12:52 CEST] <wm4> it can be useful if the bufferref is a flat byte array, but even then you have separate size fields (e.g. look at AVPacket)
[19:13:29 CEST] <wm4> also some clever code in lavfi vf_pad uses the bufferref data/size fields to "expand" planes in place if there's enough memory
[19:13:34 CEST] <wm4> but that doesn't matter for hwaccel stuff
[19:14:18 CEST] <wm4> regarding the d3d11 code, feel free to change it to *desc if that's less confusing, but it doesn't really matter
[19:26:39 CEST] Action: kierank hopes someone paid michaelni to work on mxf
[19:26:46 CEST] Action: kierank wouldn't work on it even with money
[19:29:40 CEST] <BtbN> philipl, it basically works. Just need to do something about the memory waste
[19:32:22 CEST] <BtbN> I'm tempted to introduce a special case in the cuda hw_frames_ctx, to give it a noop mode for mapped frames
[19:33:48 CEST] <durandal_1707> fftdnoiz is already faster than dctdnoiz and nlmeans
[19:39:53 CEST] <wm4> BtbN: like that guy suggested?
[19:40:04 CEST] <BtbN> no
[19:40:24 CEST] <BtbN> I basically do not want the hw_frames_ctx to do any allocations. But I need it to exist, because otherwise stuff explodes.
[19:40:48 CEST] <wm4> I mean it's designed to allow user allocation
[19:40:52 CEST] <wm4> or preallocation
[19:40:55 CEST] <BtbN> not inside of a hwaccel.
[19:40:57 CEST] <wm4> (whatever you like)
[19:41:02 CEST] <wm4> huh?
[19:41:20 CEST] <BtbN> I'm changing cuvid to just put the mapped cuvid frame into the AVFrame, to avoid the pointless copy
[19:41:42 CEST] <BtbN> So the pre-allocation by the hw_frames_ctx is a waste of memory
[19:42:12 CEST] <BtbN> so, now I'm looking for a way to communicate to the cuda hw_frames_ctx that it's not needed
[19:42:37 CEST] <wm4> you mean cuda allocates frames by itself?
[19:42:42 CEST] <wm4> as in nvidia's code
[19:42:52 CEST] <BtbN> cuvid has a map/unmap way of accessing the frame
[19:43:02 CEST] <BtbN> right now, it maps, copies to hw_frames_ctx allocated frame, unmaps
[19:43:24 CEST] <BtbN> but I changed it to map, set frame->data/linesize to mapped pointer, and add a buffer_ref to unmap on frame unref
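The map-and-wrap scheme BtbN describes (in the real C code, an AVBufferRef created with av_buffer_create whose free callback unmaps the CUVID surface) can be sketched language-neutrally. All class names here are illustrative stand-ins, not FFmpeg API:

```python
class MappedSurface:
    """Stand-in for a CUVID surface mapped into CUDA address space."""
    def __init__(self):
        self.mapped = True
    def unmap(self):
        self.mapped = False

class FrameBufferRef:
    """Stand-in for the AVBufferRef placed in frame->buf[0]: its release
    callback unmaps the surface instead of freeing memory, so no copy is
    ever made."""
    def __init__(self, surface):
        self.surface = surface
        self.refcount = 1
    def ref(self):
        self.refcount += 1
        return self
    def unref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.surface.unmap()   # last reference gone: unmap, don't free

surface = MappedSurface()
buf = FrameBufferRef(surface)
held = buf.ref()          # e.g. a downstream consumer keeps a reference
buf.unref()
print(surface.mapped)     # True: still held elsewhere
held.unref()
print(surface.mapped)     # False: the last unref unmapped the surface
```

The catch, raised later in the discussion, is that the surface stays unavailable to the decoder until the last reference is dropped, so the pool must be sized to cover however many frames a consumer holds.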
[19:47:47 CEST] <BtbN> every solution I can think of for this is massively ugly
[19:48:20 CEST] <wm4> I still don't really get the problem
[19:48:33 CEST] <BtbN> nvdec is massively slower than cuviddec
[19:48:37 CEST] <BtbN> because of that pointless copy
[19:51:28 CEST] <wm4> I mean with not doing the copy
[19:51:47 CEST] <BtbN> You still need the hw_frames_ctx, because the hwaccel infrastructure demands it to exist
[19:52:06 CEST] <BtbN> and it has a bunch of frames in it, which can waste several GB of video memory
[19:52:58 CEST] <wm4> don't you need to allocate frames for cuda anyway?
[19:53:10 CEST] <wm4> or does cuda (the nvidia API) allocate frames for you
[19:53:22 CEST] <BtbN> the hw_frames_ctx allocates them for me
[19:53:34 CEST] <BtbN> but I don't want that, as I'm writing the mapped frame from cuvid in there
[19:54:07 CEST] <wm4> ...
[19:54:08 CEST] <BtbN> it's just never truly allocated, but mapped from the cuvid frame
[19:54:22 CEST] <wm4> so cuda allocates some sort of frame
[19:54:26 CEST] <BtbN> cuvid
[19:54:28 CEST] <BtbN> not cuda
[19:54:37 CEST] <wm4> uh they're the same?
[19:54:44 CEST] <wm4> cuvid is a subset of cuda, sure
[19:54:56 CEST] <wm4> or are you talking about the ffmpeg things
[19:55:21 CEST] <BtbN> cuvid is in its own whole library
[19:55:22 CEST] <wm4> cuda = nvidia's API, cuvid = the part of cuda that deals with video
[19:55:22 CEST] <philipl> cuvid is functionally unrelated to cuda. it was branded part of cuda but that's now dropped.
[19:55:29 CEST] <JEEB> yea
[19:55:33 CEST] <philipl> Hence nvdec
[19:55:49 CEST] <wm4> fine
[19:56:05 CEST] <wm4> so cuvid (nvidia's API) allocates frames automatically?
[19:56:12 CEST] <philipl> Yes.
[19:56:16 CEST] <BtbN> it doesn't allocate them normally
[19:56:21 CEST] <BtbN> you map them to a pointer
[19:56:23 CEST] <BtbN> and unmap after use
[19:59:43 CEST] <philipl> BtbN: Could you add a device_ctx flag?
[19:59:55 CEST] <BtbN> no
[19:59:59 CEST] <BtbN> it would poison the device
[20:00:17 CEST] <BtbN> filters can easily allocate another frames_ctx on it, that _does_ have to create frames
[20:00:25 CEST] <philipl> lovely
[20:00:42 CEST] <BtbN> the only solution I see is magic numbers, but that's absolutely not going to happen
[20:00:55 CEST] <BtbN> Or changing the API, to add creation flags
[20:00:58 CEST] <philipl> Can we have two frames_ctx for cuda?
[20:01:14 CEST] <philipl> can nvdec chose the implementation to use?
[20:01:19 CEST] <BtbN> so duplicate the whole thing, one with a dummy allocator?
[20:01:22 CEST] <philipl> yeah
[20:01:30 CEST] <BtbN> that's possible, but ugly
[20:01:40 CEST] <philipl> of course that brings us back to might-as-well-have-dummy-allocator-inside-nvdec
[20:01:49 CEST] <BtbN> that's not possible
[20:02:21 CEST] <BtbN> The HWContextType which holds the hw_frames/device_ctx function pointers is avutil internal.
[20:02:31 CEST] <philipl> sad panda.
[20:02:48 CEST] <philipl> so duplicate frames_ctx impl it is! :-)
[20:05:03 CEST] <wm4> you can make a different pixfmt, or add a frames_ctx creation parameter to control it
[20:05:25 CEST] <BtbN> I don't even have access to the frames_ctx creation, as decode.c is doing most of that
[20:05:35 CEST] <wm4> the more important question is, would it make sense to represent mapped frames as AVFrame, or is that just a hack to get the copy avoidance
[20:05:36 CEST] <BtbN> I guess I could put it into the struct
[20:05:50 CEST] <wm4> decoders can suggest creation parameters
[20:05:54 CEST] <BtbN> wm4, they are 100% identical in format. They just need to be unmapped after use.
[20:06:06 CEST] <BtbN> Which a buffer_ref in buf[0] can do
[20:06:08 CEST] <wm4> ff_nvdec_frame_params
[20:06:18 CEST] <philipl> Externally owned buffers for AVFrames are a thing elsewhere aren't they?
[20:06:29 CEST] <wm4> does not unmapping them stall decoding or something stupid?
[20:06:29 CEST] <BtbN> that's how most other hwaccels work
[20:06:45 CEST] <BtbN> Not unmapping them will kill decoding
[20:06:52 CEST] <wm4> videotoolbox in particular has the VT API allocate frames
[20:07:03 CEST] <wm4> BtbN: that seems useless for most cases then
[20:07:23 CEST] <BtbN> Why? You don't usually hoard frames
[20:07:45 CEST] <wm4> I mean you will reference a bunch of frames (usually a low number, maybe 1-4)
[20:07:56 CEST] <wm4> and unref them after display or whatever
[20:07:59 CEST] <BtbN> That's fine, as long as you don't exhaust the pool
[20:08:11 CEST] <wm4> so can you still control the pool size?
[20:08:30 CEST] <BtbN> the user can't, but yes
[20:08:58 CEST] <BtbN> it ensures like 4 + dpb_size frames are available at all times
[20:09:20 CEST] <wm4> unless the user can control it the API isn't compatible
[20:09:30 CEST] <wm4> does nvidia document that 4?
[20:09:43 CEST] <BtbN> Why would nvidia document ffmpeg internals?
[20:09:54 CEST] <llogan> what's the minimum nasm version supported?
[20:10:04 CEST] <wm4> oh, so if it's an ffmpeg internal, the user _can_ control it
[20:10:16 CEST] <wm4> you would need to wire it up to the pool size
[20:10:39 CEST] <wm4> (initial_pool_size)
[20:10:58 CEST] <BtbN> oh indeed
[20:11:04 CEST] <BtbN> avctx->extra_hw_frames already exists for exactly that
[20:11:23 CEST] <BtbN> line 1234 in decode.c
[20:11:53 CEST] <wm4> that's orthogonal but yes
[20:12:18 CEST] <BtbN> no, the initial_pool_size is exactly the maximum number of wrapped frames at a time
[20:12:57 CEST] <wm4> I mean an API user can use that to control how many extra frame he gets when allocating the hwctx manually
[20:13:25 CEST] <BtbN> no, he can use that for the internal hw_frames_ctx just fine
[20:13:31 CEST] <wm4> mpv allocates the hwcontext manually with decoder suggested parameters, only to "cache" hwframe pools across seeks and to add additional frames
[20:13:41 CEST] <BtbN> Users can't even set the hw_frames_ctx of decoder themselves
[20:14:05 CEST] <wm4> user as in API user
[20:14:10 CEST] <BtbN> yes, those users
[20:14:18 CEST] <wm4> uh yes, they can
[20:14:20 CEST] <BtbN> there's a hard alloc for it in the code somewhere
[20:16:02 CEST] <BtbN> I hope AVHWFramesContext size is not part of the ABI?
[20:16:16 CEST] <wm4> no
[20:16:22 CEST] <wm4> it's not
[20:16:44 CEST] <BtbN> good. So I'm just gonna add a flags field to it, and then introduce a flag for the CUDA allocator to put it in "mapped mode, do not actually allocate"
[20:17:04 CEST] <wm4> yeah
[20:17:17 CEST] <wm4> does anything still need actually allocated frames? filters maybe?
[20:17:38 CEST] <BtbN> Yes, anything but the decoder keeps working as normal
[20:17:52 CEST] <BtbN> hw_frames_ctx are not safe for re-use anyway
[20:18:06 CEST] <BtbN> every producer of frames has to allocate their own one
[20:19:02 CEST] <wm4> what do you mean by this?
[20:19:29 CEST] <BtbN> A filter can't re-use the hw_frames_ctx of the source of its frames
[20:19:52 CEST] <wm4> well it could, if the parameters are the same
[20:19:58 CEST] <wm4> but in practice won't
[20:20:02 CEST] <BtbN> There is no way for it to have a guarantee they won't change
[20:20:34 CEST] <BtbN> There was some discussion about this a while ago, and the conclusion was that it's not safe to use a hw_frames_ctx if you're not the one who owns/controls it
[21:26:59 CEST] <BtbN> philipl, wm4 yeah, this is working fine now. And it brings nvdec and cuvid to the same speed.
[21:27:41 CEST] <philipl> nice.
[21:27:55 CEST] <philipl> You could add this to cuvid too right?
[21:28:15 CEST] <BtbN> cuvid doesn't have that extra copy in the first place
[21:28:36 CEST] <philipl> But cuvid does an immediate copy of the mapped frame too doesn't it?
[21:29:41 CEST] <philipl> That is what I see in the code.
[21:30:55 CEST] <BtbN> I'm only testing sw download performance
[21:31:40 CEST] <philipl> Ah. So what about full hardware pipelines?
[21:31:56 CEST] <philipl> Presumably this is beneficial there too.
[21:36:42 CEST] <BtbN> no difference at all
[21:37:57 CEST] <philipl> no measurable difference, you mean?
[21:45:36 CEST] <BtbN> yeah, not a single fps even, at 350 fps
[21:46:41 CEST] <philipl> Well, can't complain about that.
[21:47:01 CEST] <philipl> Are you making this nvdec change conditional on sw download or always on?
[21:52:10 CEST] <BtbN> nvdec has no idea about swdownload or not
[21:52:13 CEST] <BtbN> that's the whole problem
[21:52:22 CEST] <philipl> Right, right. I remember now.
[21:52:42 CEST] <BtbN> -hwaccel nvdec -hwaccel_output_format cuda is subtly broken though
[21:52:55 CEST] <BtbN> there's one macroblock of broken U/V data at the top
[21:53:00 CEST] <philipl> weird.
[21:53:34 CEST] <philipl> not an offset problem? the rest of the u/v is correct?
[21:53:41 CEST] <BtbN> yep, rest seems correct
[21:53:50 CEST] <BtbN> and the copy to sw path works, with the same offset
[21:54:47 CEST] <BtbN> https://btbn.de/public/scrn/1525722858.png I mean, it's pretty obvious that the line belongs to the bottom of the screen
[21:55:49 CEST] <BtbN> I'm mildly confused as to why this only happens in the no-sw-copy path
[21:55:58 CEST] <BtbN> but copying the same pointers to sw is just fine??
[21:56:33 CEST] <BtbN> Unless cuvid is unhappy about the frames staying used for longer, while in the swdownload case they are pretty much freed immediately
[21:57:42 CEST] <philipl> Huh. Weird.
[21:58:45 CEST] <philipl> It shouldn't be unhappy if we believe their docs. The frame is mapped and there are more in the pool. Should be safe.
[21:59:16 CEST] <philipl> You could try dumping some bytes early and late in the pipeline and see if they actuall change.
[21:59:23 CEST] <BtbN> I'm gonna add trace logs to the map/unmap functions which logs the idx
[22:02:59 CEST] <BtbN> hm, it's obviously more all over the place, but isn't even close to running out.
[22:15:13 CEST] <BtbN> ok, this is weird. Even if I hilariously mess up the UV plane intentionally, the wrong bar stays in place
[22:15:24 CEST] <BtbN> and the image below looks as messed up as you'd expect
[22:16:16 CEST] <BtbN> https://btbn.de/public/scrn/1525724157.png like this
[22:18:25 CEST] <BtbN> almost like the Y plane is wrong. But how on earth would it be?
[22:19:03 CEST] <atomnuker> driver issues? there's still 1-2 frame chroma lag on my machine
[22:20:03 CEST] <BtbN> Why would this only happen in this configuration?
[22:20:15 CEST] <BtbN> also, I need a better test pattern than Game of Thrones
[22:21:49 CEST] <BtbN> ok _now_ things are getting weird. I added complete bullshit to the data pointer
[22:21:52 CEST] <BtbN> the image is unchanged
[22:22:09 CEST] <BtbN> there must be some rarely used internal magic nvdec->nvdec kicking in, being broken
[22:22:15 CEST] <BtbN> *nvdec->nvenc
[22:23:52 CEST] <BtbN> oh I have a suspicion what's going on
[22:31:33 CEST] <BtbN> Yeah, found it. Now to find out how to fix it
[22:36:31 CEST] <cone-799> ffmpeg 03Zhong Li 07master:06344f705e66: lavc/qsvenc: set corret maximum value of look_ahead_downsampling
[22:36:31 CEST] <cone-799> ffmpeg 03Haihao Xiang 07master:65be65da37eb: cbs_h264: Need [] in the name when subscript is required
[22:36:31 CEST] <cone-799> ffmpeg 03Haihao Xiang 07master:1b0e0578c2ed: vaapi_encode_vp8: memset the the structure to 0
[22:37:54 CEST] <BtbN> Can avctx->coded_width ever be unset?
[22:38:12 CEST] <BtbN> in an encoder?
[22:38:16 CEST] <BtbN> It somehow is inaccurate.
[22:38:59 CEST] <BtbN> Anyway, I never intended to push the patch that broke it. So reverting it it is.
[22:39:40 CEST] <jkqxz> "encoding: unused". So yeah, you shouldn't use it at all.
[22:46:35 CEST] <BtbN> Yeah, it's a simple revert of the patch that broke it.
[23:18:18 CEST] <philipl_> BtbN: what was the problem?
[23:18:49 CEST] <BtbN> I had that patch to make nvenc use the hw_frames_ctx in one of my last batches of patches, but I just forgot it in there, and I never tested it.
[23:18:55 CEST] <philipl_> ah
[23:19:09 CEST] <BtbN> And it uses the actual frame width/height to register the hwframe
[23:19:26 CEST] <BtbN> instead of the width/height from the hw_frames_ctx, which is the coded_width/height
[23:21:54 CEST] <BtbN> next up is a battle with configure
[23:22:04 CEST] <BtbN> Because I have complex version requirements for ffnvcodec
[23:24:58 CEST] <thardin> whew, that's a lot of mxf patches
[23:30:15 CEST] <ubitux> durandal_1707: found a way to help compilers, making nlmeans slightly faster again
[23:30:22 CEST] <ubitux> you may want to try nlmeans-dsp on my github
[23:35:40 CEST] <BtbN> jkqxz, hm, "May be set by the user before calling av_hwframe_ctx_init().", or "To be set by the user..." or just "Can be set..."
[23:35:48 CEST] <durandal_1707> ubitux: what you did this time?
[23:35:54 CEST] <BtbN> no idea what most clearly describes its intent
[23:36:06 CEST] <ubitux> inline the patch value computation
[23:36:08 CEST] <BtbN> also, in this case, it's not even true, theoretically this flag can be changed at any time
[23:36:23 CEST] <ubitux> so that it doesn't compute the position in the inner loop
[23:36:31 CEST] <ubitux> also, i just experimented a branchless version
[23:36:34 CEST] <ubitux> and it looks faster
[23:36:46 CEST] <ubitux> i'll commit in a moment
[23:41:24 CEST] <ubitux> that one needs some thinking, i'll do it tomorrow
[23:41:35 CEST] <ubitux> but current state of the branch should be generally faster
[23:51:28 CEST] <ubitux> yeah nah, branchless isn't faster
[00:00:00 CEST] --- Tue May 8 2018