[Ffmpeg-devel-irc] ffmpeg-devel.log.20170826

Sun Aug 27 03:05:03 EEST 2017

[11:32:39 CEST] <cone-766> ffmpeg Paul B Mahol master:9d6aab6fa160: avfilter/af_surround: make volume configurable for front center and lfe channel
[11:43:39 CEST] <cone-766> ffmpeg Carl Eugen Hoyos master:9d494c5e553f: lavf/rawenc: Add little- and big-endian G.726 muxers.
[12:00:28 CEST] <cone-766> ffmpeg Carl Eugen Hoyos master:1c56becb9b04: lavc/utils: Calculate frame duration for little-endian G.726.
[12:50:35 CEST] <cone-766> ffmpeg Carl Eugen Hoyos master:094d4d8691ab: lavc/sinewin_tablegen: Fix compilation with --enable-hardcoded-tables.
[15:19:59 CEST] <cone-766> ffmpeg Paul B Mahol master:473e18fdbabb: doc/filters: improve pseudocolor example
[16:59:13 CEST] <BBB> j-b: ping
[18:10:55 CEST] <durandal_1707> Compn: why not announce that mplayer is officially dead
[18:12:20 CEST] <JEEB> it's a hobby project, it will never die
[18:12:27 CEST] <JEEB> as long as it's someone's hobby
[18:13:03 CEST] <Compn> it still works
[18:13:29 CEST] <Compn> and this way it never gets more bloated, unlike vlc :)
[18:13:31 CEST] <durandal_1707> it doesnt it stinks
[18:13:40 CEST] <Compn> what problems you having now ? :D
[18:14:21 CEST] <durandal_1707> Compn: problems is people still use mplayer crap
[18:14:57 CEST] <Compn> hmm yes
[18:15:01 CEST] <Compn> i dont want new users using it
[21:26:24 CEST] <cone-247> ffmpeg Paul B Mahol master:15e9c4afdc8e: avfilter/af_amix: switch to activate
[21:26:24 CEST] <cone-247> ffmpeg Paul B Mahol master:7f5c655833c5: avfilter/af_amix: simplify const entries for duration in amix_options[]
[22:08:11 CEST] <graphitemaster> How exactly do things like mpv/mplayer handle software only decoding of videos so well with libavcodec, yet when I try and do all the same things (avoid software scaling/filter, multi thread processing of frames, etc) performance is abysmal. I don't understand how one uses libavcodec well enough to get good performance out of it. We can only really use the software side of it since we're manipulating the raw YCbCr stuff,
[22:08:11 CEST] <graphitemaster> and using that as a texture in a 3D scene, plus the hwaccel stuff is very much a complete utter mess of conflated choices about OS/hardware
[22:10:55 CEST] <graphitemaster> I can launch four instances of mpv with --hwdec=no playing 4k h.264 videos and my CPU utilization is like 20%, yet when I do the same thing with my code I'm hitting 400% cpu usage and it's literally all spent in avcodec_decode_video2
[22:28:25 CEST] <JEEB> graphitemaster: did you happen to do something stupid like disabling yasm?
[22:28:57 CEST] <JEEB> graphitemaster: unless you're doing something really dumb you shouldn't be much slower than `ffmpeg -i stuff -f null -`
[22:29:43 CEST] <graphitemaster> JEEB, we build ffmpeg from source on Linux (for which we use the default configure flags), on Windows we use Zeranoe's builds (which presumably are optimized)
[22:29:49 CEST] <graphitemaster> both are slow
[22:30:11 CEST] <graphitemaster> yasm is installed on my development rig, so I would assume ffmpeg configure would use it
[22:30:16 CEST] <JEEB> it should
[22:30:56 CEST] <JEEB> test with the ffmpeg command line app, and compare speeds. I have written some really dumb code and I'm seeing similar performance
[22:33:42 CEST] <graphitemaster> yeah, the command line is running faster :(
[22:37:05 CEST] <graphitemaster> am I just using avcodec wrong, I search for suitable decoders for all streams with avcodec_find_decoder, copy the codec context, and call avcodec_open2, set up some temporary scratch storage space for frames for each stream and on a separate thread I consume frames with av_read_frame, shoving them into a buffer, which is then processed +- some synchronization code and switches on the frame types and does what needs to be
[22:37:05 CEST] <graphitemaster> done
[22:38:09 CEST] <graphitemaster> and all the time is spent in avcodec_decode_video2 (well a good 80 to 90% is for 4k video)
[22:39:06 CEST] <graphitemaster> I do sit in a while loop for when the packet size is > 0, doing decode_video2, subtracting off that length since that's a thing *shrug*
[22:39:14 CEST] <graphitemaster> and yeah if the length < 0 I just break from that loop
[22:39:20 CEST] <graphitemaster> so no weird contention there
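The consume loop graphitemaster describes can be sketched without libavcodec. In the sketch below, `DecodeFn` and `drain_packet` are hypothetical names, and the callback only imitates the return convention of avcodec_decode_video2 (bytes consumed, or a negative value on error). It is not the code from the pastebin. The zero-progress guard is an addition that the description above does not mention.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a decode call: returns the number of bytes
// consumed (negative on error) and sets *got_frame when a frame came out.
using DecodeFn = std::function<int(const std::uint8_t* data, int size, bool* got_frame)>;

// Feeds one packet's bytes to the decoder until they are used up.
// Returns the number of frames produced before the loop ended.
int drain_packet(const std::uint8_t* data, int size, const DecodeFn& decode) {
    int frames = 0;
    while (size > 0) {
        bool got_frame = false;
        int used = decode(data, size, &got_frame);
        if (used < 0) break;            // decoder error: stop, as described in the log
        if (got_frame) ++frames;
        if (used == 0) break;           // added guard: no progress would loop forever
        if (used > size) used = size;   // never step past the end of the packet
        data += used;                   // skip the bytes the decoder consumed
        size -= used;
    }
    return frames;
}
```

A loop of this shape does no waiting of its own, so by itself it would not account for a large gap against the ffmpeg command line tool.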
[22:39:38 CEST] <JEEB> I recommend you take a look at the decoding examples and utilize the new API with reference counting enabled
[22:40:07 CEST] <graphitemaster> this would be the _third_ time I've had to rewrite our code because yall apis keep changing and deprecating :P
[22:40:08 CEST] <JEEB> but in general if you are doing just decoding, it shouldn't be much slower than ffmpeg.c
[22:40:18 CEST] <wm4> graphitemaster: what frame scratch space?
[22:40:26 CEST] <JEEB> the push/pull thing is not exactly new
[22:40:47 CEST] <wm4> also copying codec contexts hints you're using other deprecated APIs
[22:40:51 CEST] <graphitemaster> wm4, space for the second argument to avcodec_decode_video2, the AVFrame
[22:41:22 CEST] <JEEB> anyways, the only reason I commented is because mpv et al utilize the push/pull APIs and refcounted AVFrames
[22:41:31 CEST] <wm4> also did you set threads
[22:41:36 CEST] <graphitemaster> set threads?
[22:41:47 CEST] <JEEB> isn't that enabled by default for many decoders that support threading?
[22:41:51 CEST] <wm4> AVCodecContext.thread_count or something
[22:42:02 CEST] <wm4> JEEB: not in the API, AFAIK
[22:42:04 CEST] <JEEB> oh
[22:42:17 CEST] <graphitemaster> the meat of my video packet handling code: https://pastebin.com/raw/ih4dqeYQ
[22:42:17 CEST] <JEEB> then I'm even more impressed my code was as fast as it was
[22:42:50 CEST] <wm4> you call swscale...
[22:42:56 CEST] <wm4> speed -> ruined
[22:42:59 CEST] <graphitemaster> only if video_resampling
[22:43:03 CEST] <graphitemaster> which I have disabled by default
[22:43:20 CEST] <graphitemaster> that is only used when the video is not already in ycbcr format
[22:43:25 CEST] <wm4> then maybe your funny cMediaVideoPacket stuff
[22:43:53 CEST] <graphitemaster> going to try and set the thread_count thing
[22:43:56 CEST] <graphitemaster> if I can find that field lol
[22:43:59 CEST] <wm4> do you use AVPacket correctly?
[22:45:44 CEST] <graphitemaster> I hope I do
[22:45:56 CEST] <graphitemaster> the actual decode thread stuff https://pastebin.com/raw/kpM9Bbpb
[22:47:53 CEST] <wm4> why the fuck is there a sleep call
[22:48:17 CEST] <graphitemaster> I've commented those out, does not affect performance in any way shape or form
[22:48:29 CEST] <graphitemaster> prolly because those functions don't even do anything if the sleep is < 100ms so
[22:48:35 CEST] <graphitemaster> don't know who put those in there
[22:49:29 CEST] <wm4> let me guess, it's game code
[22:56:52 CEST] <graphitemaster> wm4, nope actually
[22:56:56 CEST] <graphitemaster> engine code though :P
[22:57:24 CEST] <BBB> and the profile?
[22:57:37 CEST] <BBB> like, you're saying it's slow and all time is spent in avcodec_decode_video2
[22:57:43 CEST] <BBB> where's the profile?
[22:58:22 CEST] <graphitemaster> we use a sampler profiler called Remotery, sadly I can't get you much out of that other than a screenshot of its output in a web browser
[22:58:35 CEST] <BBB> go for it
[23:00:35 CEST] <graphitemaster> http://i.imgur.com/cTWGSAJ.png
[23:01:11 CEST] <BBB> I mean a profile, like % runtime spent in each function
[23:01:14 CEST] <graphitemaster> that decode, if you look at the code is a scoped profile region around the call
[23:01:51 CEST] <BBB> like http://blog.fellstat.com/wp-content/uploads/2013/04/InstrumentsScreenSnapz001.png
[23:02:21 CEST] <graphitemaster> those values are in microseconds per frame
[23:02:32 CEST] <graphitemaster> so it's taking 17.394 milliseconds in that call
[23:02:47 CEST] <BBB> you have 5 ms in a Create function
[23:02:56 CEST] <BBB> are you calling mallocs and creating objects in there?
[23:03:14 CEST] <graphitemaster> yes :|
[23:03:19 CEST] <BBB> anyway, a profile needs more granularity than this
[23:03:19 CEST] <graphitemaster> cutting off that 5ms would help a lot
[23:03:33 CEST] <BBB> I can't say whether the time spent in Handle() is spent in avcodec_decode_video2() or not
[23:03:40 CEST] <BBB> a good profiler tells you that
[23:03:45 CEST] <graphitemaster> the "[DECODE]" is all avcodec
[23:03:51 CEST] <graphitemaster> ProfileCPU( "cMediaVideoPacket::Handle [Decode]" )
[23:03:51 CEST] <graphitemaster> {
[23:03:51 CEST] <graphitemaster> 	nLength = avcodec_decode_video2( pVideoCodecContext,
[23:03:51 CEST] <graphitemaster> 									 (AVFrame *)pMedia->m_pScratchVideoFrame,
[23:03:51 CEST] <graphitemaster> 									 &nFrameFinished,
[23:03:53 CEST] <graphitemaster> 									 pPacket );
[23:03:54 CEST] <graphitemaster> }
[23:04:05 CEST] <graphitemaster> that ProfileCPU region counts that call and that call only
[23:04:16 CEST] <graphitemaster> it's a C++ object that goes out of scope at the }
[23:04:28 CEST] <graphitemaster> so the ctor/dtor times that region of code
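The scoped-region timing described above can be sketched as a small RAII class. `ScopedTimer` and `time_busy_loop` are hypothetical names chosen for illustration. This is not Remotery's API and not the ProfileCPU macro from the log, only a minimal sketch of the same idea: the constructor records a start time and the destructor adds the elapsed microseconds to a named total.

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a ProfileCPU-style region.
class ScopedTimer {
public:
    using Clock = std::chrono::steady_clock;

    ScopedTimer(std::string name, std::map<std::string, std::int64_t>& totals)
        : name_(std::move(name)), totals_(totals), start_(Clock::now()) {}

    // Runs at the closing brace of the enclosing block and records the
    // elapsed time in microseconds under the region's name.
    ~ScopedTimer() {
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
            Clock::now() - start_);
        totals_[name_] += elapsed.count();
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string name_;
    std::map<std::string, std::int64_t>& totals_;
    Clock::time_point start_;
};

// Times a single annotated region and returns its accumulated microseconds.
std::int64_t time_busy_loop(std::map<std::string, std::int64_t>& totals) {
    volatile std::uint64_t sink = 0;
    {
        ScopedTimer timer("busy_loop", totals);
        for (std::uint64_t i = 0; i < 100000; ++i) sink = sink + i;
    }  // timer is destroyed here, so only the loop above is counted
    return totals.at("busy_loop");
}
```

A timer like this only reports the regions someone has wrapped by hand. It cannot say which function inside the region is responsible for the time.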
[23:05:05 CEST] <BBB> ok
[23:05:06 CEST] <BBB> so
[23:05:08 CEST] <graphitemaster> 17349 microseconds are spent on avcodec_decode_video2, 5350 microseconds in that create, for a total of 22704 microseconds (or 22 milliseconds for that frame in total)
[23:05:25 CEST] <BBB> I think you need to learn about profiling a little bit more, sorry
[23:05:40 CEST] <BBB> I don't know how this ProfileCPU thing works
[23:05:54 CEST] <BBB> but a good profiler tells me total runtime, time spent in _all_ (not some, annotated) functions
[23:05:56 CEST] <BBB> etc.
[23:06:02 CEST] <BBB> without that, I can't tell you why it's slow
[23:06:08 CEST] <graphitemaster> yeah, this doesn't go that deep :|
[23:06:13 CEST] <graphitemaster> it can only time stuff I annotate
[23:06:14 CEST] <BBB> it could be the locking, the Create, the Write
[23:06:22 CEST] <BBB> on a mac, use isntruments
[23:06:25 CEST] <BBB> instruments*
[23:06:30 CEST] <graphitemaster> I would have to actually dig in deeper with gprof
[23:06:41 CEST] <graphitemaster> or valgrind's cachegrind *shudder*
[23:06:42 CEST] <BBB> on linux, use perf
[23:06:52 CEST] <BBB> perf should be ok, try perf
[23:06:59 CEST] <BBB> I've used it, it's good enough
[23:07:38 CEST] <graphitemaster> I'd have to recompile ffmpeg with debug/perf instrumentation first, and set up a test scenario that launches directly into video decoding first just to reduce the noise and nonsense to get there
[23:07:42 CEST] <graphitemaster> but yeah I should.
[23:08:15 CEST] <BBB> how do you think we got ffmpeg so far? it's not by annotating functions one-by-one ;)
[23:08:25 CEST] <BBB> it's by using great tools that tell us exactly what to optimize etc.
[23:08:42 CEST] <BBB> (instead of us asking one-by-one should I optimize this? should I optimize that?)
[23:09:41 CEST] <graphitemaster> sadly, in my industry, the common ways of timing stuff (since everything is in the context of a frame) is to manually put these annotated blocks of shit around shit and depending on how much annotation you do, the more coarse it can get
[23:10:16 CEST] <BBB> Make SoftwareEngineering Great Again!
[23:10:52 CEST] <graphitemaster> it's not without its own set of problems, but it works great for getting an overall picture of stuff, plus most instrumentation based profilers (like perf) ruin the performance of things so much that it takes hours to even get to the part we want to test that is slow (things like cachegrind for instance, run our stuff 400 to 2000x slower than native)
[23:11:16 CEST] <graphitemaster> you could probably simulate a cache or cpu in your head faster heh
[23:13:09 CEST] <wm4> perf doesn't need instrumentation
[23:16:37 CEST] <graphitemaster> thanks arch
[23:16:38 CEST] <graphitemaster> CONFIG_AUDIT is disabled in the Arch kernel packages so a custom kernel
[23:16:38 CEST] <graphitemaster> is required for most components of this package. However, some features
[23:16:39 CEST] <graphitemaster> like the utility methods in libaudit work without kernel support.
[23:16:49 CEST] <graphitemaster> perf is gimped as hell with Arch kernels apparently
[23:16:57 CEST] <graphitemaster> don't feel like building a kernel right now
[23:19:03 CEST] <BBB> is 22 ms the first frame?
[23:19:11 CEST] <BBB> or average over all frames?
[23:19:28 CEST] <wm4> anyway, as long as you set the thread count, and your AVPacket data is aligned and padded with 0s, all should be fine
[23:19:40 CEST] <BBB> set thread count to 0
[23:19:43 CEST] <BBB> so it auto-detects threading
[23:19:57 CEST] <BBB> is that not the default yet?
[23:20:04 CEST] <wm4> no idea
[23:20:13 CEST] <wm4> I think it's not
[23:20:30 CEST] <wm4> and do it _before_ opening the codec
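The padding requirement wm4 mentions can be sketched without linking libavcodec. In the sketch below, `kInputPadding` and `make_padded_buffer` are hypothetical names. The constant stands in for libavcodec's AV_INPUT_BUFFER_PADDING_SIZE, and the value 64 is an assumption, so real code should use the macro from the headers it is built against. Alignment is left to the allocator here, so the sketch covers only the zeroed tail.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: stand-in for AV_INPUT_BUFFER_PADDING_SIZE. Use the real macro
// from the libavcodec headers instead of this value in real code.
constexpr std::size_t kInputPadding = 64;

// Copies `size` payload bytes into a new buffer that has kInputPadding zero
// bytes after the payload, so a reader that overreads slightly sees zeros.
std::vector<std::uint8_t> make_padded_buffer(const std::uint8_t* data, std::size_t size) {
    std::vector<std::uint8_t> buf(size + kInputPadding, 0);  // every byte starts as zero
    if (size > 0) std::memcpy(buf.data(), data, size);
    return buf;
}
```

Packets that come from av_read_frame or are allocated with av_new_packet already carry this padding, so a helper of this kind would only matter for buffers filled by hand.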
[23:21:00 CEST] <graphitemaster> maybe I should just use mpv's libmpv. it seems to have an opengl api that gives me a GLuint handle and supports hwacceleration too
[23:21:05 CEST] <graphitemaster> and scrap all this garbage
[23:21:25 CEST] <graphitemaster> wm4, have an example of how to use libmpv ?
[23:21:44 CEST] <wm4> graphitemaster: there's a mpv-examples repo
[23:26:12 CEST] <graphitemaster> wm4, OpenGL ES support too GLES3 similar to desktop GL 3.x core profile (supports PBOs and what not, very little changes of code needed)
[23:26:23 CEST] <graphitemaster> if not I'll add it myself if need be
[23:26:39 CEST] <graphitemaster> if this turns out to be way better for performance that is ;-)
[23:27:16 CEST] <wm4> not sure what you're asking... it supports OpenGL desktop 2.1 and ES 2.0
[23:28:22 CEST] <graphitemaster> yeah but will it work with a higher context, i.e some functionality is deprecated and if you use it with my non compatible context, it'll fail
[23:28:44 CEST] <graphitemaster> since we don't create compatibility contexts for performance reasons, only core profile contexts
[23:31:11 CEST] <wm4> mpv CLI normally creates a GL 4.3 core context
[23:31:21 CEST] <wm4> there shouldn't be any compatibility crap in use
[23:31:39 CEST] <wm4> btw. we're also working on Vulkan support, and our worst problem is finding a GLSL compiler
[23:35:27 CEST] <graphitemaster> glsl-lang is fine
[23:35:40 CEST] <graphitemaster> most people just be integrating that into the thing
[23:35:49 CEST] <graphitemaster> or offline compiling the shaders and plopping their bytecode into stuff
[23:37:13 CEST] <wm4> you mean the khronos lib? it has global mutable state, a C++ only API (and a broken and unmaintained C one), and apparently leaks memory
[23:37:24 CEST] <wm4> we need a lib
[23:37:31 CEST] <wm4> because we compile shaders on the fly
[23:58:11 CEST] <J_Darnley> graphitemaster: perf should work okay on Arch for your user space code, it does for me using their kernel.
[23:58:36 CEST] <graphitemaster> wm4, that is an unsolved problem right now
[23:58:49 CEST] <graphitemaster> wm4, apparently LunarG has such a compiler but won't make it open source
[23:59:09 CEST] <J_Darnley> I know nothing of the kernel but I assume CONFIG_AUDIT lets you profile kernel code.
[23:59:40 CEST] <graphitemaster> J_Darnley, I tried it, i get some perf data but when I go to annotate it it says no samples :|
[23:59:45 CEST] <graphitemaster> so I assume that I do need that?
[00:00:00 CEST] --- Sun Aug 27 2017