[Ffmpeg-devel-irc] ffmpeg-devel.log.20190324

burek burek021 at gmail.com
Mon Mar 25 03:05:04 EET 2019


[10:37:27 CET] <cone-319> ffmpeg 03Michael Niedermayer 07master:013f71497ba5: avcodec/tiff: do not allow bpp 40 with undefined pixel formats
[15:15:29 CET] <BtbN> Vulkan extension for video de/encode. Now that sounds interesting.
[15:16:18 CET] <BtbN> Might be just the thing especially Linux needs to clean up the mess a bit.
[15:26:13 CET] Action: bencoh mutters something about openmax
[15:29:10 CET] <BtbN> Except no normal GPU driver implements that, while they all implement Vulkan
[15:50:52 CET] <philipl> BtbN: I'm sure they'd find a way to implement different video extensions which are all part of the spec, just like the memory sharing ones...
[15:52:20 CET] <philipl> jkqxz, BtbN: Is that crop_cuda the right way to do the cropping? I saw jkqxz posted his change to the crop filter to propagate the crop values - should we be doing cuda cropping in the scale filter too?
[15:52:39 CET] <philipl> and save a copy
[15:58:11 CET] <BtbN> Thing is, with the "save a copy" argument you would end up with one huge catch-all filter
[16:09:05 CET] <BtbN> And if you put it into the scale filter, won't it have to internally copy either way?
[16:16:59 CET] <tmm1> can anyone see a problem with this approach: https://gist.github.com/tmm1/002096d61886fb7f58830455870104ad
[16:26:18 CET] <jkqxz> For VAAPI, it's straightforward to put it in every filter so you don't have to use scale (right now only scale has it, but it can easily be pushed in the generic VAAPI filter code).  It can't be set on an encoder, though.
[16:27:09 CET] <jkqxz> Other APIs are all a bit uglier though.  OpenCL would be trivial if it had subimages, but it doesn't so you would need a bit of code to support it in every kernel.
[16:29:13 CET] <jkqxz> I was thinking of adding a hwframe transfer flag to indicate whether cropping is desired.  It would be easy on some APIs and impossible on others, though, so the exact semantics of how it should behave are unclear.
[16:29:46 CET] <jkqxz> (And add an option to hwdownload (maybe set by default?) to control it.)
[16:55:39 CET] <BtbN> jkqxz, the problem with CUDA are the strict alignment requirements
[16:55:53 CET] <BtbN> cropping top/right/bottom works without copy, but left...
[17:04:16 CET] <jkqxz> Isn't the cuda code itself (in the scale filter) just treating them as arrays of bytes, though?
[17:05:43 CET] <BtbN> They are arrays of bytes
[17:05:57 CET] <BtbN> but each line has to be at least 256 byte aligned
[17:06:39 CET] <BtbN> so you could crop the left side. In increments of 256
[17:07:09 CET] <BtbN> Or 512 even, depending on format
[17:07:24 CET] <jkqxz> But doesn't that mean that your scale filter just ignores however many bytes on the left edge to apply the cropping?  Or is there some additional constraint on how that code works which I'm not aware of?
[17:07:46 CET] <BtbN> The scale filter doesn't do cropping. It's a dedicated crop filter that's on the ML.
[17:08:25 CET] <jkqxz> Generic crop followed by something like scale, I mean.
[17:08:54 CET] <jkqxz> And supporting cropping from the decoder, too.
[17:09:07 CET] <BtbN> cuvid supports cropping. nvdec can't.
[17:09:35 CET] <jkqxz> NVDEC correctly sets the crop fields, so if anything read them, it would work.  (Like scale_vaapi does.)
[17:09:52 CET] <BtbN> And when you crop in a generic fashion, it'll fail to encode due to poor alignment
[17:10:10 CET] <BtbN> unless you happen to hit proper alignment with your crop
[17:11:55 CET] <jkqxz> The encoder just ignores the cropping information, yeah.
[17:12:42 CET] <BtbN> I don't follow. The Cuda Pointers are pretty much normal pointers. And they need to be properly aligned. So if you do naive cropping without copying to another aligned buffer, it will fail at some point.
[17:12:46 CET] <jkqxz> Not sure any hardware encoder supports that, so you always have to put something in front of the encoder to deal with the cropping.
[17:13:43 CET] <jkqxz> Sorry, maybe that wasn't clear.  By generic cropping I mean setting the AVFrame cropping fields (like the decoder does).  Other filters can make use of them (like scale_vaapi does).
[17:14:08 CET] <jkqxz> It doesn't change any pointers, so the filters need to have enough awareness to apply offsets themselves.
[17:14:55 CET] <BtbN> Hm, so that'd mean rejecting the crop_cuda filter, and implementing its functionality in the scale_cuda filter instead?
[17:16:26 CET] <jkqxz> Yes, that's my thought.
[17:18:53 CET] <BtbN> The problem is, we'd then end up with pretty much only one everything_cuda filter, to avoid copies
[17:19:05 CET] <jkqxz> Maybe we need another comparison point.  How will cropping work with vulkan, say?
[17:19:38 CET] <BtbN> Cause the argument of avoiding a copy works with pretty much every CUDA filter.
[17:20:20 CET] <jkqxz> What other components are you thinking everything has there?  The crop filter only works in this case because we already support it from the decoder and have the AVFrame fields to do so.
[17:20:47 CET] <BtbN> Pretty much every filter implemented in CUDA
[17:21:15 CET] <BtbN> if you implement them as individual avfilter filters, most of them will need a copy or call make_writable
[17:21:30 CET] <BtbN> If you stuff all of them into one filter, with a common kernel, you can do all of them with only one single copy
[17:24:30 CET] <jkqxz> Give a specific example?  I'm not sure I see any other separate filters we could deal with like this.  (Colour conversion can since the AVFrame has those fields, but scale already handles that in the software case too.)
[17:25:07 CET] <BtbN> transpose, scale, crop, color correction, ... literally anything you can write into a CUDA Kernel
[17:28:48 CET] <jkqxz> I don't see.  How do you implement transpose in this form?  The generic transpose filter on software frames does a copy, and there isn't any pointer trickery or "also transpose this" field you can mark to indicate that a transpose is needed.
[17:29:35 CET] <BtbN> You just write one big kernel which does all the things, and run it once.
[17:30:04 CET] <BtbN> Instead of chaining 5 different Kernels, with a copy in between each
[17:30:55 CET] <jkqxz> But you would then need to make your filter -vf allthethings_cuda.  Not -vf crop,transpose,scale_cuda in series, which is what the proposal is.
[17:31:10 CET] <BtbN> Which is exactly what I'm saying the issue is
[17:32:32 CET] <BtbN> Putting all the functionality into one big kernel will vastly outperform the other approach, but it's clearly very ugly
[17:35:25 CET] <jkqxz> I don't think it follows that you want to do that.  My slippery slope has a fence on it at the fields already existing which should be supported anyway.
[17:35:37 CET] <jkqxz> (Such as the cropping fields in AVFrame.)
[17:45:20 CET] <philipl> BtbN: it seems like it would be reasonable to apply crop in the scale filter. It would not double the amount of code in there.
[18:04:03 CET] <BtbN> jkqxz, it's entirely unrelated to any fields anywhere. This applies to pretty much each and every cuda based filter.
[18:04:17 CET] <BtbN> philipl, yeah, crop can go in there. And transpose and rotate probably as well.
[19:10:10 CET] <philipl> BtbN: I can't tell if you're being sarcastic when you say that :-P
[19:10:57 CET] <BtbN> No, I'm fine with a "all the video transformations" filter
[19:11:47 CET] <BtbN> combining scale and crop should be simple enough
[19:12:32 CET] <philipl> Ok. So that's a plan at least. I wonder if the crop filter guy is willing to do the work.
[19:12:54 CET] <philipl> Do you know who these people are? Looks like a chinese company.
[19:13:20 CET] <BtbN> No idea, I thought it was an nvidia guy first, some of them use their gmail address to avoid the company footer, but it doesn't seem to be.
[19:13:40 CET] <BtbN> Pretty sure it's just one person, sometimes using a pseudonym, sometimes the real name?
[19:13:59 CET] <philipl> I thought I saw one author and two other guys showing up for the first time doing partial reviews.
[19:14:14 CET] <jkqxz> You are thinking transpose/rotate by making the generic one set displaymatrix which can be read by scale?
[19:14:27 CET] <philipl> Not terribly important, but always interesting when the author isn't nvidia.
[19:14:55 CET] <BtbN> Hm? I'm thinking by writing all of them into one kernel.
[19:15:06 CET] <BtbN> And activating the necessary parts via parameters.
[19:16:59 CET] <jkqxz> Oh.  That's not what I was imagining, nor how the generic crop is working.
[19:18:00 CET] <BtbN> But that's what the whole discussion was about, putting all the functionalities into one filter, to avoid copying the frame around more than absolutely necessary.
[19:20:55 CET] <jkqxz> Er, no?  From my point of view the intent was to put the functionalities of filters which we already have to handle in non-filter form (like cropping information) into the scale filter, and making the generic filters set those parameters for hardware frames so you can use them identically.
[19:25:41 CET] <atomnuker> the thing I'm worried about is that if they have a vulkan decoding api they'll probably jam the post-processing side of things (deinterlacing, eq) in the swapchain like they did with hdr
[19:28:36 CET] <BtbN> So you want to auto-insert the "do everything scaling/cropping" filter before the encoder? But then again, that does not change a thing about the issue at hand.
[19:28:46 CET] <BtbN> A monolithic filter would even make that easier.
[19:30:28 CET] <jkqxz> I wasn't particularly intending to auto-insert it, though I guess it could be.
[19:31:14 CET] <BtbN> With no metadata attached it's just a plain passthrough
[19:41:14 CET] <j-b> BtbN: philipl: I have a call scheduled next week with nVidia
[19:41:55 CET] <BtbN> sounds good
[19:42:26 CET] <j-b> for nvcc, yes
[19:42:32 CET] <j-b> for npp, not really
[19:42:45 CET] <j-b> but anyway, I'll try my best
[19:43:05 CET] <BtbN> The thing is, the compiler isn't even the hard part. clang can do that. It's all the libs it links against.
[19:43:19 CET] <nevcairiel> i thought it doesnt actually do that
[19:43:32 CET] <nevcairiel> just generate cuda-asm
[19:43:34 CET] <j-b> my understanding was similar to nevcairiel 
[19:43:45 CET] <BtbN> clang can replace nvcc, but still needs the SDK for headers and libs
[19:44:11 CET] <BtbN> Which makes the whole matter a whole lot more complicated.
[19:45:52 CET] <BtbN> https://developer.nvidia.com/cuda-llvm-compiler even suggests nvidia themselves uses that for nvcc now
[19:51:26 CET] <BtbN> Basically, we need _any_ way to build .cu to .ptx that does not require accepting EULAs and registering with Nvidia.
[19:57:02 CET] <philipl> j-b: were you able to ask Bradley about the compiler thing?
[19:57:13 CET] <philipl> whether there's actually a licence issue here?
[19:59:24 CET] <BtbN> I mean, we can always implement the idea of adding the .ptx files to the tree...
[19:59:55 CET] <BtbN> But I feel like that idea will receive general disagreement when sending it for review
[20:01:51 CET] <philipl> I don't see what licensing problem that fixes. If you consider the ptx files to still be free then what is checking them in solving? Sounds like the argument is that a EULA encumbered compiler is inconvenient. Yes it is, but that is not licensing.
[20:03:52 CET] <philipl> BtbN: and you are right about the compiler. the actual compiler binary is probably free or almost free if it's not identical to clang, but you need to give it sdk components to actually do the compilation to resolve symbols and so on.
[21:53:17 CET] <jkqxz> j-b:  I don't know about the others, but I'm not sure the "glorified ioctl" comment is really true for Intel.  On Linux at least, the userland parts (libmfx / any of the VAAPI drivers) are pretty huge.
[21:53:40 CET] <jkqxz> The ioctl() interface being used is pretty much entirely "enqueue this operation to some command ring", where that operation is something pretty complex constructed in userland.
[21:53:41 CET] <j-b> VAAPI drivers are drivers
[21:54:03 CET] <j-b> and are part of the "major components" of the OS
[21:54:41 CET] <jkqxz> So the kernel-land comment was meaning something more than the kernelspace/userspace distinction?  I guess that's fair.
[21:54:57 CET] <j-b> the issue is that the GPL is unclear
[21:55:04 CET] <j-b> and was written in 1991
[21:55:29 CET] <j-b> >  However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
[21:55:35 CET] <j-b> This is what it says
[21:55:45 CET] <j-b> It does not speak about libraries, at all.
[21:55:56 CET] <j-b> It speaks about 'major components'
[21:56:15 CET] <j-b> and it speaks about 'so on'
[21:57:01 CET] <j-b> It mentions compiler, which is what philipl was talking about, but it was meant as C compiler, at that time.
[21:59:42 CET] <jkqxz> A Linux distribution is inconveniently huge nowadays - you could produce a distribution which "normally distributes" basically any binary thing you want (NDIux, say).  Reading that as kernelspace/userspace would at least apply a clean delineation.
[22:00:21 CET] <j-b> that is why it says "major components"
[22:00:26 CET] <j-b> not "any components"
[22:02:19 CET] <jkqxz> Well, libNDI is a critical and required component of NDIux.
[22:05:07 CET] <j-b> Which is not an OS, and not a way to access hardware
[22:27:38 CET] <cone-081> ffmpeg 03James Almer 07master:699d0c2a30d5: avcodec/av1_parser: don't abort parsing the first frame if extradata parsing fails
[23:23:11 CET] <BBB> this is honestly too much legalese for me. As I've said in my email, my concern is philosophical as much as it is legal
[23:23:16 CET] <BBB> the legal case is murky at best
[23:23:25 CET] <BBB> the philosophical case is just outright dubious
[23:23:26 CET] <BBB> imo
[00:00:00 CET] --- Mon Mar 25 2019

