[Ffmpeg-devel-irc] ffmpeg-devel.log.20190731

Fri Sep 13 14:29:11 EEST 2019

[05:41:53 CEST] <tmm1> started adding A/53 CC SEI to h264_vaapi, but it's not working yet: https://gist.github.com/tmm1/cf7d047f8f2ad591f865647cc7d4921c
[08:39:48 CEST] <rcombs> BtbN: philipl: so, I'm looking at how to do stuff currently done in OpenCL in CUDA, and I see two plausible routes
[08:40:14 CEST] <rcombs> either convert the CL to CUDA in a preprocessor pass similar to this: https://github.com/Guilhermeslucas/OpenCL2CUDA/blob/master/createCUDAapp.py
[08:41:35 CEST] <rcombs> or make some changes to the CL code that use macros to abstract away the differences (tbh the biggest thing is probably vector literals), and then build the same code under both
[08:44:08 CEST] <rcombs> like, #define READ_TEX1(src, x, y) read_imagef(src, sampler, (int2)(x, y)).x in CL and tex2D<float>(src, x, y) in CUDA
[08:46:36 CEST] <rcombs> (the obvious third option is to just duplicate all the code in CUDA but I feel like these languages are similar enough that it's worth avoiding duplication if we can)
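
A minimal sketch of the macro approach rcombs describes above, assuming a hypothetical compat header. Only READ_TEX1 and its two expansions come from the discussion; TEX_T, DEVICE_FN, HostTex, avg_right and the sampler flags are illustrative assumptions. The plain C branch exists only so the shared code can be exercised on a host; the OpenCL and CUDA branches are untested here.

```c
/* Hypothetical compat header: one source, several build targets.
 * READ_TEX1 is taken from the discussion; all other names are assumed. */
#if defined(__OPENCL_VERSION__)
/* OpenCL C: src is an image2d_t read through a program-scope sampler. */
__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP_TO_EDGE   |
                               CLK_FILTER_NEAREST;
#define TEX_T     __read_only image2d_t
#define DEVICE_FN
#define READ_TEX1(src, x, y) read_imagef(src, sampler, (int2)(x, y)).x
#elif defined(__CUDACC__)
/* CUDA: src is a texture object. */
#define TEX_T     cudaTextureObject_t
#define DEVICE_FN __device__
#define READ_TEX1(src, x, y) tex2D<float>(src, x, y)
#else
/* Plain C fallback for host-side testing: a row-major float plane. */
typedef struct HostTex {
    const float *data;
    int width, height;
} HostTex;
#define TEX_T     HostTex
#define DEVICE_FN static
#define READ_TEX1(src, x, y) ((src).data[(y) * (src).width + (x)])
#endif

/* Shared code written once against the macros:
 * the average of a pixel and its right-hand neighbour. */
DEVICE_FN float avg_right(TEX_T src, int x, int y)
{
    return 0.5f * (READ_TEX1(src, x, y) + READ_TEX1(src, x + 1, y));
}
```

The shared function never names a target-specific type or intrinsic, which is the property the macro route relies on. Vector literals, which rcombs calls probably the biggest difference, are not covered by this sketch.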
[10:14:53 CEST] <black> Hello ! Does somebody know where the (h264)parser is called in ffmpeg.c ?
[10:21:46 CEST] <black> And is it possible to modify ffmpeg.c's pipeline so that bitstream filters could be applied before decoding ?
[10:27:56 CEST] <JEEB> it is possible of course
[10:28:24 CEST] <JEEB> parsers are called from demuxer usually
[10:28:49 CEST] <JEEB> so when you get AVPackets from an AVFormatContext you usually get them pre-parsed
[10:29:07 CEST] <JEEB> parsing mostly meaning stuff like parsing NAL units from a byte buffer f.ex. :P
[10:30:08 CEST] <JEEB> you'd have to make the bsf option work for inputs as well, and then add the code to apply the bit stream filtering on the correct side
[10:36:32 CEST] <JEEB> although depending on your use case you could do the same thing as what we do with the HDR SEIs f.ex.
[10:36:41 CEST] <JEEB> you set side data to the AVPackets and finally AVFrames
[10:36:48 CEST] <JEEB> and apply those wherever you need
[10:37:04 CEST] <JEEB> also please do not privately message me
[10:39:10 CEST] <black> The issue I am facing is that all the parameters I need to apply a bsf filter are not accessible at the beginning of the pipeline. But I am interested in what you do with HDR SEI. Is there already some code implemented ?
[10:40:22 CEST] <black> Getting SEI information into the AVPacket and then the AVFrame is exactly what I want to do.
[10:40:43 CEST] <JEEB> see AV_PKT_DATA_CONTENT_LIGHT_LEVEL
[10:40:50 CEST] <JEEB> as an example
[10:42:08 CEST] <JEEB> and AV_FRAME_DATA_CONTENT_LIGHT_LEVEL
[10:42:40 CEST] <JEEB> then for example the colorspace filter is reading that AVFrame side data
[10:50:54 CEST] <black> I am sorry but I am not able to find where it reads SEI NAL unit and adds new packet side data
[10:51:34 CEST] <JEEB> most likely inside libavcodec in cbs or so
[10:51:43 CEST] <JEEB> I tried to find it but it's all pretty templated
[11:03:35 CEST] <black> What did you mean by getting pre parsed NAL units in AVFormatContext ?
[11:09:35 CEST] <JEEB> that's the higher-level parsing stuff which is used with containers that don't have packets already pre-available for you one after another
[11:09:39 CEST] <JEEB> like MPEG-TS
[11:12:01 CEST] <durandal_1707> i profiled ffmpeg with Very Sleepy CS and it spends most of its time in WaitForSingleObject, around 50%, that's where the 2x slowdown comes from
[11:15:02 CEST] <nevcairiel> ie. threading overhead
[11:15:51 CEST] <vel0city> I don't think 50% would come from overhead, sounds like inefficient multithreading
[11:16:18 CEST] <nevcairiel> thats just another way to describe too much overhead
[11:16:20 CEST] <vel0city> some threads finishing much earlier than all of them
[11:16:31 CEST] <durandal_1707> yea, looks like
[11:16:51 CEST] <vel0city> I don't think that's what overhead means but whatever, semantics
[11:17:09 CEST] <nevcairiel> also that wouldnt explain actually slowing down compared to single threading
[11:17:51 CEST] <durandal_1707> shouldn't it wait more smartly, by not consuming CPU like crazy
[11:18:02 CEST] <vel0city> @nevcairiel: oh, weird
[11:18:15 CEST] <nevcairiel> WaitForSingleObject is usually a non-busy wait
[11:18:39 CEST] <nevcairiel> waits for the signal to be set
[11:20:03 CEST] <vel0city> @durandal_1707: very sleepy CS isn't very useful for profiling multithreaded issues btw
[11:20:03 CEST] <durandal_1707> when inspecting how workers are getting spread across CPUs, i noticed that sometimes it gets a 0 0 0 0 1 2 spread instead of 0 0 1 1 2 2
[11:20:31 CEST] <durandal_1707> vel0city: that was simplest thing to setup
[11:20:45 CEST] <vel0city> yeah that's the upside
[11:21:06 CEST] <durandal_1707> eg. there are 3 threads and 6 jobs -- channels
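
A minimal sketch of what the two spreads mentioned above would cost, assuming every job takes the same time and each index in the spread names the worker a job lands on. That reading of the numbers, and the function name makespan, are assumptions for illustration; this is not how libavutil/slicethread.c hands out jobs.

```c
#include <stddef.h>

/* Wall time, in job units, when job i runs on worker assign[i] and every
 * job costs one unit: the busiest worker decides when the batch is done.
 * Returns -1 on invalid input. Sketch limit: at most 16 workers. */
static int makespan(const int *assign, size_t nb_jobs, int nb_workers)
{
    int load[16] = { 0 };
    int worst = 0;

    if (nb_workers < 1 || nb_workers > 16)
        return -1;
    for (size_t i = 0; i < nb_jobs; i++) {
        int w = assign[i];
        if (w < 0 || w >= nb_workers)
            return -1;
        if (++load[w] > worst)
            worst = load[w];
    }
    return worst;
}
```

Under these assumptions the 0 0 1 1 2 2 spread finishes in 2 units and the 0 0 0 0 1 2 spread in 4, so the skewed case takes twice as long as the even one but is still shorter than the 6 units of a serial run.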
[11:23:00 CEST] <vel0city> is that left up to the OS or does ffmpeg configure it via affinity?
[11:26:55 CEST] <vel0city> I don't see a SetThreadAffinityMask or pthread_setaffinity_np anywhere 
[11:27:53 CEST] <durandal_1707> dunno, http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavutil/slicethread.c;h=dfbe551ef2062e87286d0d37abcb12e5b975f318;hb=HEAD#l53
[11:28:32 CEST] <vel0city> yea seems like it's left up to the OS, which would explain it being inconsistent
[11:28:38 CEST] <BtbN> rcombs, if we can just copy that converter-script to the ffmpeg compat tree, I think I'd actually prefer that.
[11:29:12 CEST] <vel0city> I was part of a project that micromanaged these affinities, if you do it correctly you can get lots of speedup (depending on the application of course)
[11:29:20 CEST] <nevcairiel> affinity problems should perhaps result in not-ideal performance, not a 50% slowdown
[11:29:55 CEST] <durandal_1707> the dsd2pcm code basically does convolution, with floats
[11:29:55 CEST] <vel0city> does this issue only happen on Windows? it's been known to have a worse scheduler overally than linux
[11:30:17 CEST] <vel0city> wouldn't explain it being 50% worse but I'm still curious
[11:30:32 CEST] <vel0city> overall*
[11:31:48 CEST] <nevcairiel> also slower on linux
[11:32:18 CEST] <nevcairiel> carl tested it yesterday iirc, and his numbers were quite similar
[11:32:22 CEST] <durandal_1707> on 4 cpus with hyperthreading it was also 50% slower
[11:33:50 CEST] <jkqxz> Have you fixed the CPU frequency?  If you end up doing a little bit of stuff on each core sporadically then the dynamic frequency scaling stuff may screw you over compared to just using one core all the time.
[11:34:06 CEST] <rcombs> BtbN: there are definitely features that ffmpeg uses that that script doesn't handle
[11:34:16 CEST] <BtbN> hm
[11:34:47 CEST] <BtbN> How much of a mess would a CL/CUDA compat header be? And what if someone comes up with a new filter that uses significantly more features?
[11:35:16 CEST] <BtbN> Especially as the majority of CL authors probably don't have nvidia hardware
[11:35:16 CEST] <rcombs> worst-case, that filter would only support one or the other ¯\_(ツ)_/¯
[11:35:24 CEST] <rcombs> no worse than now
[11:35:37 CEST] <rcombs> like, it wouldn't preclude writing CL-only code
[11:36:14 CEST] <vel0city> @durandal_1707: you could try VTune 
[11:36:43 CEST] <rcombs> I don't think the header would be too bad, but I haven't tried to do it yet, so it's hard to be sure
[11:37:26 CEST] <vel0city> I think you'd need to build via VS though
[11:38:45 CEST] <rcombs> durandal_1707: have you profiled yet
[11:39:19 CEST] <rcombs> oh, scrolled up, I see you did
[11:39:46 CEST] <nevcairiel> well, benchmarking MT issues with such simple tools gives you hard to interpret data
[11:39:47 CEST] <BtbN> rcombs, I got my Nvidia-Box revived, via external ATX power supply. So I'll be able to test stuff once I got it up to date again.
[11:39:56 CEST] <rcombs> it's pretty common for short condition waits to spin
[11:40:43 CEST] <nevcairiel> perfect MT profiling probably needs instrumentation
[11:40:46 CEST] <rcombs> I'm not sure if profiling tools count context switch time towards time spent in a function
[11:42:34 CEST] <BtbN> rcombs, are you planning on submitting a version of the clang patch with different configure behaviour, or should I change it and post the new version to the ML?
[11:44:37 CEST] <durandal_1707> i downloaded VS 2019 on Windows 7, is that sane? perhaps i need an older version?
[11:44:42 CEST] <rcombs> BtbN: which different behavior? like, making it autodetect?
[11:46:15 CEST] <BtbN> rcombs, auto-detect, and nvcc non-nonfree
[11:46:43 CEST] <rcombs> non-nonfree feels like a separate commit
[11:59:05 CEST] <durandal_1707> how do i set the paths, libs, etc. for MSVC inside a mingw32 tty?
[12:01:16 CEST] <BtbN> iirc it also takes -L
[12:02:01 CEST] <BtbN> rcombs, fair. But imo clang should be autodetected. And that also implies a configure-time feature-check if the clang binary even can produce nvptx.
[12:02:23 CEST] <rcombs> there's already one of those
[12:03:31 CEST] <nevcairiel> the best way to build with MSVC is to start an MSVC build prompt (it should've made shortcuts for you for that in the start menu, I think), which will set up the environment so that MSVC works, and from that windows shell then run mingw bash, so it inherits the fully set-up MSVC environment
[12:03:35 CEST] <BtbN> oh yeah, check_nvcc should cover that
[13:39:26 CEST] <durandal_1707> what does this mean: Unable to obtain debug information. Link with the /PROFILE linker switch.
[13:41:17 CEST] <kierank> durandal_1707: why don't you install basic Linux distro
[13:41:35 CEST] <kierank> Or sign up to free Amazon trial for server
[13:42:24 CEST] <durandal_1707> i can not, sir, please help me!
[14:02:51 CEST] <durandal_1707> i figured it!
[14:15:08 CEST] <nevcairiel> or get windows 10 and WSL as a last resort =p
[14:56:11 CEST] <Lynne> who at intel thought it would be a good idea to have the src of vpermps accept memory and not the mask?
[14:59:00 CEST] <Lynne> and yet vpermilps accepts memory as mask
[14:59:17 CEST] <jamrial> one mask for a long array of source bytes in memory is probably the most common usage for it
[14:59:44 CEST] <J_Darnley> too few bits to allow memory for both?  created for a specific/narrow purpose?
[15:12:10 CEST] <durandal_1707> http://0x0.st/zfwW.png
[15:13:41 CEST] <nevcairiel> if it's spending so much time sleeping, maybe the threading isn't actually working, and you need to somehow allow it to process in parallel
[16:24:20 CEST] <Lynne> durandal_1707: do you think av_tx could be used in showcqt?
[16:25:28 CEST] <durandal_1707> Lynne: yes, with SIMD, why not?
[16:26:06 CEST] <Lynne> yes, unchanged, but I'm asking if it can be optimized more if it used a non-power-of-two transform
[16:26:32 CEST] <Lynne> since cqt_len is resolution dependent which is usually 1920, 1280, etc all of which can be done by av_tx
[16:26:41 CEST] <durandal_1707> ask original author
[16:27:11 CEST] <durandal_1707> i'm not much into its log transform
[16:56:25 CEST] <Lynne> looked into it, would be able to optimize non-default timeclamp values
[16:56:54 CEST] <Lynne> currently it uses a transform sized to the next power of two above samplerate * timeclamp
[16:57:39 CEST] <Lynne> so for 48000 * 0.09 = ~4320, but has to use 8192, doing a lot more work
[17:00:30 CEST] <Lynne> but won't be any faster without simd
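
A minimal sketch of the rounding Lynne describes above, assuming the transform length is simply the smallest power of two that is not below samplerate * timeclamp. The helper name next_pow2 is hypothetical and this is not the showcqt code.

```c
#include <stdint.h>

/* Smallest power of two that is >= n (returns 1 for n == 0).
 * Saturates at 2^31 for inputs above that, to avoid overflow. */
static uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1;

    while (p < n && p < (UINT32_C(1) << 31))
        p <<= 1;
    return p;
}
```

With 48000 * 0.09 = 4320 samples this gives 8192, since 4096 is too small. That is roughly 1.9 times as many points as a transform of length 4320 would need.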
[18:27:46 CEST] <jamrial> have we ever changed what kind of extradata a given codec propagates between lavc and lavf before?
[18:28:04 CEST] <jamrial> would it require anything other than just a major bump? is it considered an api change?
[18:30:36 CEST] <Lynne> some game codecs if at all, I don't think anything else
[19:29:44 CEST] <durandal11707> simple av_usleep() shows that issue is not in slice threading implementation
[20:07:37 CEST] <mkver> jamrial: I don't know if someone else already mentioned it, but your patch 7 also contains a typo: "writting"
[20:08:12 CEST] <jamrial> mkver: thanks, changed locally
[20:11:53 CEST] <durandal11707> could slowdown be caused by hyperthreading?
[20:38:22 CEST] <cone-180> ffmpeg Michael Niedermayer master:da5039415c2b: avformat/mpc: deallocate frames array on errors
[20:38:22 CEST] <cone-180> ffmpeg Michael Niedermayer master:267eb2ab7f87: avcodec/apedec: Fix multiple integer overflows and undefined behavior in filter_3800()
[20:38:22 CEST] <cone-180> ffmpeg Michael Niedermayer master:bf778af1493b: avcodec/apedec: make left/right unsigned to avoid undefined behavior
[20:38:22 CEST] <cone-180> ffmpeg Michael Niedermayer master:1aad8937f73f: avcodec/apedec: Make coeffsA/B uint32_t, this avoids several cases of undefined behavior
[20:38:22 CEST] <cone-180> ffmpeg Michael Niedermayer master:9a353ea87662: avcodec/truemotion2: Fix several integer overflows in tm2_motion_block()
[20:38:22 CEST] <cone-180> ffmpeg Michael Niedermayer master:f31ed8f3b00e: avcodec/vc1_block: Fix integer overflow in ff_vc1_pred_dc()
[20:38:22 CEST] <cone-180> ffmpeg Michael Niedermayer master:6dfda35dd29d: avcodec/vc1_pred: Fix invalid shift in scaleforsame()
[20:38:23 CEST] <cone-180> ffmpeg Michael Niedermayer master:42a2edcc1d77: tools/target_dec_fuzzer: fix memleak of extradata