[Ffmpeg-devel-irc] ffmpeg-devel.log.20140527

Wed May 28 02:05:02 CEST 2014

[01:04] <BBB> I notice my english in that commit msg wasn't quite wonderful either
[01:04] <BBB> ohwell
[01:14] <BBB> so that other loop (the final else statement in that function) is only ever called once, right?
[01:14] <BBB> so why do we optimize that?
[01:17] <BBB> michaelni: in what situation is that loop that includes LINEAR_CORE performance-sensitive? do we have a command line that tests that?
[01:19] <BBB> (and yes I'm trying to get to a situation where the function itself can be moved outside optimizable templates, and the speed-sensitive things within are just regular function pointers; the same cleanup did wonderful things to sws and it's no different here)
[01:21] <michaelni> "make fate-list | grep fate-swr-resample_lin" should list the tests that test the linear interpolation code
[01:24] <BBB> why isn't there just a special case for linear (calling LINEAR_CORE) in the same way that there is a special loop for COMMON_CORE above?
[01:24] <BBB> and then the final call for all of them can be unoptimized c code
[01:24] <BBB> (the else)
[01:29] <michaelni> i dont think theres a real reason, just the thinking that the non default settings are less important to optimize for as few people would use them
[01:31] <michaelni> it sure could get its own if() case
[01:31] <BBB> if you make linear its own if case, the else just happens once for any resampling session
[01:31] <BBB> so you can make it plain c
[01:32] <BBB> then you can get rid of the templating and the two relevant if statements can get their own yasm function
[01:32] <BBB> saves lots of binary code, the yasm will likely be faster because, well, inline sucks, we can get rid of avx inline support, and it will support msvc
[01:32] <BBB> I see only positives
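A rough sketch of the restructuring BBB describes above: the inner loop is picked once through a function pointer, so each hot loop becomes a plain function (a candidate for standalone assembly) and the rarely taken tail can stay in ordinary C. Every name here (`resample_fn`, `core_common`, `core_linear`, `pick_core`) and the back-to-back filter layout are hypothetical illustrations, not the actual swresample code.

```c
#include <stddef.h>

/* Hypothetical inner-loop signature: produce n output samples. */
typedef void (*resample_fn)(float *dst, const float *src,
                            const float *filter, int taps,
                            size_t n, float frac);

/* Plain FIR core: one dot product per output sample, frac is unused. */
static void core_common(float *dst, const float *src,
                        const float *filter, int taps,
                        size_t n, float frac)
{
    (void)frac;
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int k = 0; k < taps; k++)
            acc += src[i + k] * filter[k];
        dst[i] = acc;
    }
}

/* Linear core: blend two filter phases by frac.
 * Assumes (for this sketch only) that filter holds 2 * taps
 * coefficients, the second phase stored right after the first. */
static void core_linear(float *dst, const float *src,
                        const float *filter, int taps,
                        size_t n, float frac)
{
    for (size_t i = 0; i < n; i++) {
        float a = 0.0f, b = 0.0f;
        for (int k = 0; k < taps; k++) {
            a += src[i + k] * filter[k];
            b += src[i + k] * filter[taps + k];
        }
        dst[i] = a + (b - a) * frac;
    }
}

/* Chosen once per resampling call, not once per sample. */
static resample_fn pick_core(int linear)
{
    return linear ? core_linear : core_common;
}
```

The caller would do something like `pick_core(linear)(dst, src, filter, taps, n, frac)` once per frame, which keeps the branch out of the per-sample loop.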
[01:39] <michaelni> compensation_distance should be non zero for -async > 1 and "jittery" input timestamps 
[02:27] <BBB> michaelni: is there a test for that?
[02:29] <michaelni> possibly not, unless fate-filter-aresample covers it
[02:29] <BBB> can you handle this outside the call to swri_resample?
[02:30] <BBB> i.e. just do two calls to swri_resample on a dst_incr switchpoint
[02:30] <BBB> in any other case I don't think (from the inside of this function) there's anything special going on
[02:34] <michaelni> might be possible, yes
[02:36] <cone-843> ffmpeg.git Carl Eugen Hoyos master:4d8c28deab24: imgutils: make systematic palette opaque.
[02:36] <cone-843> ffmpeg.git Michael Niedermayer master:e222cfefcdcb: Merge commit '4d8c28deab2488579f585406110b1be790896e59'
[02:38] <BBB> ok better patch sent
[02:39] <BBB> I'll see what I can do about the index < 0 and compensation_distance != 0 pieces, I won't have much time on weekdays but it would make this code significantly faster
[02:39] <BBB> and yasmifiable (which likely will make it faster also)
[02:45] <BBB> michaelni: since I can't seem to find good tests for compensation_distance, I'll leave that alone and hope you can do that (or write a test)
[02:45] <BBB> but index < 0 is triggered in all tests so I can do that
[02:52] <cone-843> ffmpeg.git Anton Khirnov master:45fc73edfe07: vf_format: rework format list parsing
[02:52] <cone-843> ffmpeg.git Michael Niedermayer master:6e9dbee7c657: Merge commit '45fc73edfe071f9690e8671ed2dc402b1cb02ece'
[02:53] <michaelni> ill try to find something to test compensation_distance unless i forget
[03:01] <cone-843> ffmpeg.git Anton Khirnov master:862f33c10ea3: vf_scale: use the pixfmt descriptor API
[03:01] <cone-843> ffmpeg.git Michael Niedermayer master:a37f2cc5797a: avfilter/vf_format: fix duplicate ;
[03:01] <cone-843> ffmpeg.git Michael Niedermayer master:de5ec0882591: Merge commit '862f33c10ea38ea49fa4188725df5e5246dbd1d8'
[03:09] <cone-843> ffmpeg.git Anton Khirnov master:a7d070acb55c: vf_fieldorder: avoid using AV_PIX_FMT_NB
[03:09] <cone-843> ffmpeg.git Michael Niedermayer master:cdf6a9441ded: Merge commit 'a7d070acb55c3ebbdd5e93e3366f32865732b8a3'
[03:22] <cone-843> ffmpeg.git Anton Khirnov master:b03b2d86aa9d: buffersrc: avoid using AV_PIX_FMT_NB
[03:22] <cone-843> ffmpeg.git Michael Niedermayer master:50ffd8439c93: Merge commit 'b03b2d86aa9d79670825b42d8a8a7c41f59cb444'
[03:28] <cone-843> ffmpeg.git Anton Khirnov master:7cc4c9f32f44: lavfi/formats: avoid using AV_{PIX,SAMPLE}_FMT_NB
[03:28] <cone-843> ffmpeg.git Michael Niedermayer master:a1cb4efd2ff0: Merge commit '7cc4c9f32f446feaec5447e3d097e8147e35f156'
[05:00] <BBB> michaelni: I'm also not quite sure why that only works on planar audio, packed audio should work just as well, you only write one sample per inner loop iteration so it's about as simple as adding a dst_step/stride or so?
[05:00] <BBB> anyway, bed now
[06:13] <cone-843> ffmpeg.git Michael Niedermayer master:a0c5cd3475fd: avcodec/x86/dsputilenc: set the count of SSE registers correctly for get_pixels
[06:13] <cone-843> ffmpeg.git James Almer master:e64e079ece7d: x86/dsputilenc: implement SSE2 version of diff_pixels
[08:07] <kurosu> BBB: thanks for not being lazy and starting the rewrite of those resampling loops
[08:08] <kurosu> probably jamrial is the one appreciating it the most :)
[08:15] <jamrial> once that code is ported to yasm, i'll post the xop and fma versions i had planned back when i tried porting it myself :P
[08:17] <jamrial> didn't do it for the current inline version as they would mean a lot of bloat (both code and final binary size)
[08:27] <kurosu> is that the k10 computer you mentioned that allows you benchmarking it ?
[08:27] <kurosu> s/allows/enables but you should get my meaning
[08:28] <jamrial> no, xop/fma4 was introduced with Bulldozer
[08:28] <jamrial> k10 is sse3 (not even ssse3)
[08:29] <kurosu> oh, right
[08:29] <jamrial> fma3 then with Piledriver, and Haswell on Intel's side
[08:29] <kurosu> the bulldozer probably doesn't even have 3dnow as amd deprecated it years ago
[08:30] <jamrial> yeah, k10 was the last one with 3dnow and 3dnowext
[08:30] <kurosu> have you run all of fate with 3dnow at most ?
[08:31] <kurosu> probably you did, and hearing no complaint from you is a good omen :)
[08:34] <jamrial> did it for the 3dnow port of vector_clipf, yes
[08:34] <jamrial> still have to send that to the list
[13:09] <BBB> yeah if jamrial wants to rewrite the main loop in yasm after this, that'd be awesome
[13:41] <BBB> has anyone ever noticed how identical avresample and swresample are in terms of implementation
[13:42] <BBB> I mean, I'm just throwing an elephant in the room here because what do I know, but why on earth would you make a new lib if the implementation (not just use case) is 100% identical
[13:43] <nevcairiel> because you got paid to create it
[13:55] <compn> and nobody said you couldnt copy the other lgpl code ?
[13:55] <compn> :P
[13:56] <compn> jk maybe theres only one way to do it, because of sws limitations
[13:57] <compn> BBB : so you're saying its not 'better' but the exact same ?
[13:59] <BBB> which one is better than which?
[14:00] <compn> yes
[14:00] <BBB> as for main resampling loop, I'd expect swr to be significantly faster, ave does jamrial's original approach
[14:00] <BBB> ave=avr 
[14:00] <BBB> so there's a call overhead + branches in the main resampler
[14:01] <BBB> that's incredibly slow
[14:01] <BBB> (both in loop)
[14:01] <BBB> swr currently has just a branch (b/c of inline asm), and I've modified it so that there's no more branch so you can do the outside loop in yasm with one single call per frame (instead of one call per sample in a loop)
[14:02] <Daemon404> i kind of doubt audio resampling is anyones bottleneck tbh
[14:02] <BBB> that's true also
[14:02] <BBB> but hey we do this because I CAN HAZ FAZTERDER!
[14:02] <BBB> !!11
[14:02] <BBB> 25% is a reasonable gain
[14:03] <Daemon404> and one of the libraries exists because youtube paid for it
[14:03] <Daemon404> re: above
[14:05] <compn> is that a bad thing or a good thing ?
[14:06] <compn> or just a fact 
[14:06] <compn> anyways
[14:07] Action: compn goes afk
[15:03] <j-b> more lib*sample
[15:04] <ubitux> BBB: https://lists.libav.org/pipermail/libav-devel/2012-April/026066.html
[15:09] <ubitux> Daemon404: and ffmtech paid for another?
[15:20] <cone-554> ffmpeg.git James Almer master:58632070866a: x86/dsputilenc: use HADDD in ff_sse16_sse2
[15:38] <plepere> ubitux, can I ask for your help ? I'm blocked on some really small function
[15:39] <ubitux> maybe
[15:40] <plepere> thanks
[15:40] <plepere> http://pastebin.com/iLDR8NMd
[15:41] <plepere> the code is quite simple. It's basically calculating coeff, and doing dst+=coeff clipped to the bit depth
[15:42] <ubitux> ok
[15:42] <plepere> but I've got incorrect MD5s when using assembly
[15:45] <plepere> and using gdb, the results seem right
[15:47] <ubitux> unrelated but you can probably unroll the 8 lines
[15:47] <plepere> ok
[15:47] <ubitux> and probably do 2 lines at a time
[15:48] <plepere> well normally it's all in a macro to have the 4x4, 8x8, with 8 and 10bit support
[15:48] <plepere> even 16x16 and 32x32
[15:49] <ubitux> %rep & friends
[15:49] <plepere> oh yeah
[15:49] <ubitux> anyway well i don't see anything obviously wrong
[15:50] <plepere> I don't know if that's a good thing. :p
[15:51] <ubitux> btw, you can probably remove the unpack somehow
[15:51] <ubitux> with some half signed byte addition like there is in loop filters
[15:52] <ubitux> plepere: i'd suggest to memcmp in the caller
[15:52] <ubitux> duplicate the input block, call your other function, memcmp the 2
[15:53] <ubitux> a bit overkill but well…
[16:07] <kriegerod> hi all, is there any work in progress, or just useful thoughts or whatever regarding better HTTP MJPEG streams recognition? I see some streams don't open well with ffmpeg without explicit "-f mjpeg -i ..."
[16:08] <plepere> hmm
[16:10] <Daemon404> i dont think any image format gets probed aside from ext
[16:10] <Daemon404> its quite annoying
[16:10] <plepere> ubitux, so you're suggesting that I'd work not on dst, but on dst2, basically ?
[16:11] <plepere> do dst2= memcpy(dst); asm_function; dst=memcpy(dst2)
[16:12] <ubitux> plepere: in the caller, instead of func_c() you do memcpy(tmp,input); func_c(input); func_asm(tmp); memcmp(input,tmp)
[16:12] <ubitux> you add a counter or something to identify the mismatching one
[16:13] <ubitux> then when you have it, you can start debugging
[16:13] <plepere> ok I'll try that. thanks
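The comparison ubitux suggests above can be sketched roughly as follows: copy the input, run the reference on one copy and the optimized version on the other, then `memcmp` the two and report the first mismatch. The names (`BLOCK_SIZE`, `add_fn`, `add_clip_c`, `add_clip_buggy`, `check_against_ref`) are hypothetical and are not taken from plepere's paste; `add_clip_buggy` is a deliberately broken stand-in for an assembly version.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64 /* hypothetical: one 8x8 block of 8-bit pixels */

typedef void (*add_fn)(uint8_t *dst, const int16_t *coeffs);

/* Reference: dst += coeff, clipped to the 8-bit range. */
static void add_clip_c(uint8_t *dst, const int16_t *coeffs)
{
    for (int i = 0; i < BLOCK_SIZE; i++) {
        int v = dst[i] + coeffs[i];
        dst[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}

/* Broken on purpose: treats each coefficient as unsigned, so negative
 * coefficients produce the wrong result. Stands in for the asm. */
static void add_clip_buggy(uint8_t *dst, const int16_t *coeffs)
{
    for (int i = 0; i < BLOCK_SIZE; i++) {
        int v = dst[i] + (uint16_t)coeffs[i];
        dst[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
    }
}

/* Runs ref on dst and opt on a copy of the same input.
 * Returns the index of the first differing byte, or -1 if they match.
 * dst is left holding the reference result. */
static int check_against_ref(add_fn ref, add_fn opt,
                             uint8_t *dst, const int16_t *coeffs)
{
    uint8_t tmp[BLOCK_SIZE];

    memcpy(tmp, dst, BLOCK_SIZE);
    ref(dst, coeffs);
    opt(tmp, coeffs);

    if (memcmp(dst, tmp, BLOCK_SIZE) == 0)
        return -1;
    for (int i = 0; i < BLOCK_SIZE; i++)
        if (dst[i] != tmp[i])
            return i;
    return -1;
}
```

Logging the returned index together with the input values at that position gives a concrete failing case to step through in a debugger.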
[16:20] <michaelni> kriegerod, open a ticket please if theres none yet
[16:22] <michaelni> ok to apply: "[FFmpeg-devel] [PATCH 2/2] x86/vp9: inital AVX2 intra_pred" ?
[16:40] <cone-554> ffmpeg.git Andreas Cadhalpun master:0f17bc644c4a: Improve the detection of architecture x86.
[17:29] <plepere> ubitux, I've found something : the result is wrong when [coeffs] is below 0
[17:29] <ubitux> because of a bug in your signed add then :)
[17:30] <ubitux> look at the 0x80 hacks all over the vp9 lpf code
[17:30] <ubitux> or in vp8
[17:30] <ubitux> to do some weird signed/unsigned additions
[17:31] <plepere> I'll look into it, but I'm quite uncomfortable about this
[17:32] <plepere> I'll manage
[17:35] <plepere> I'm starting by replacing the shr by sar for the shifts
[17:42] <plepere> ok, your 0x80 hack worked wonders. :)
[17:42] <plepere> well, I did it in a more.. barbaric way :     xor            tempq, 0xffffffffffff0000
[17:44] <ubitux> i didn't invent anything, i lamely stole the trick from vp8
[17:44] <ubitux> so it works?
[17:46] <plepere> no
[17:46] <plepere> D:
[17:46] <ubitux> :(
[17:46] <plepere> I'll look more into the printfs
[17:49] <plepere> well I'm an idiot. I broke the case when the coeff is positive.
[17:55] <plepere> SUCCESS !!!!
[17:56] <plepere>     cmp            tempd, 0x8000
[17:56] <plepere>     jle             .positive
[17:56] <plepere>     xor            tempq, 0xffffffffffff0000
[17:56] <plepere> .positive
[17:56] <plepere> thanks ubitux. :)
[17:56] <ubitux> i'm sure you can do better…
[17:57] <ubitux> like, branchless.
[17:57] <plepere> yes, I see what you mean
[17:58] <plepere> like shift left first then shift right arithmetic ?
[17:58] <ubitux> dunno exactly what you're doing and what your code looks like :p
[17:59] <plepere> it's the code from before
[17:59] <plepere> argh, I have to go
[17:59] <plepere> I'll do something nice. 
[17:59] <plepere> without jumps. :)
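What the compare-and-xor snippet above appears to be doing is sign-extending a 16-bit value held in a wider register. A minimal branchless sketch in C, assuming that reading is right, uses the well-known xor/subtract trick; the function name `sext16` is hypothetical. In x86 assembly a single `movsx` from the 16-bit sub-register would likely do the same job.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x to 64 bits without branching.
 * Flipping bit 15 and then subtracting 0x8000 maps
 * 0x0000..0x7FFF to 0..32767 and 0x8000..0xFFFF to -32768..-1. */
static int64_t sext16(uint64_t x)
{
    int64_t low = (int64_t)(x & 0xFFFF); /* discard any upper garbage */
    return (low ^ 0x8000) - 0x8000;
}
```

Unlike the `cmp`/`jle` version, this also treats 0x8000 itself as negative (-32768).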
[18:00] <cone-554> ffmpeg.git Michael Niedermayer master:64fb19cc995f: avfilter/formats: Avoid using non public AV_PIX_FMT_NB
[18:00] <cone-554> ffmpeg.git Michael Niedermayer master:f3fdc32e2f67: avfilter/crop: Avoid using non public AV_PIX_FMT_NB
[18:00] <cone-554> ffmpeg.git Michael Niedermayer master:5feac96fdb37: avfilter/hflip: Avoid using non public AV_PIX_FMT_NB
[18:00] <cone-554> ffmpeg.git Michael Niedermayer master:c0f8801e4793: avfilter/il: Avoid using non public AV_PIX_FMT_NB
[18:00] <cone-554> ffmpeg.git Michael Niedermayer master:1dbc98461b62: avfilter/vf_mergeplanes: Avoid using non public AV_PIX_FMT_NB
[18:00] <cone-554> ffmpeg.git Michael Niedermayer master:cfa0ad6eec96: avfilter/vf_noise: Avoid using non public AV_PIX_FMT_NB
[18:00] <cone-554> ffmpeg.git Michael Niedermayer master:e10ac3a12ea8: avfilter/vf_swapuv: Avoid using non public AV_PIX_FMT_NB
[18:00] <cone-554> ffmpeg.git Michael Niedermayer master:515e8aed0318: avfilter/vf_transpose: Avoid using non public AV_PIX_FMT_NB
[18:00] <cone-554> ffmpeg.git Michael Niedermayer master:ee0c91cc6582: avfilter/vf_telecine: Avoid using non public AV_PIX_FMT_NB
[18:00] <cone-554> ffmpeg.git Michael Niedermayer master:0d26264fb4c8: avfilter/vf_drawtext: Avoid using non public AV_PIX_FMT_NB
[18:01] Action: Daemon404 wonders if michaelni has ever taken a vacation
[18:04] <iive> there was once a power outage, he had to be hibernated to a harddisk. ;)
[18:34] <cone-554> ffmpeg.git Michael Niedermayer master:1853e2cba9b2: avfilter/formats: Avoid using non public AV_SAMPLE_FMT_NB
[18:37] <jamrial> Skyler_: can you check the "[FFmpeg-devel] [PATCH 2/2] x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1}" thread?
[18:37] <jamrial> just noticed that x264 is using HADD* macros in a couple of mmx functions
[18:38] <jamrial> and as is it's expanding to pshufw, which is an mmxext/sse instruction
[18:46] <Skyler_> x264 doesn't support MMX-only CPUs anyways, so that's likely a function naming error
[18:47] <Skyler_> (MMXEXT is the minimum for asm, as x264 uses -inline- MMXEXT)
[18:48] <Daemon404> there is a relevant pengvado quote for this of course
[18:48] <jamrial> alright
[18:48] <Daemon404> (from back when he actually talked)
[18:59] <Skyler_> so I guess it should be fixed, in that the function names should probably be corrected.
[19:31] <kierank> Daemon404: he did once iirc
[20:09] <wm4> hm apparently libavformat/utils.c has weird logic to add a dts if there is no dts?
[20:10] <wm4> because I'm pretty sure mkv doesn't store both pts and dts
[20:10] <nevcairiel> of course it does, it invents all sorts of timestamps
[20:27] <cone-554> ffmpeg.git Lou Logan master:8a64ea768be0: doc/filters: amix only supports float samples
[20:33] <michaelni> anyone who wants to disable timestamp calculation and just want to get whats stored, see: AVFMT_FLAG_NOFILLIN
[20:34] <wm4> michaelni: ah, nice
[20:47] <michaelni> should i "merge" this: https://github.com/FFmpeg/FFmpeg/pull/72 ? (its not a technical issue, rather about README name & syntax)
[20:49] <Daemon404> personally, no
[20:49] <Daemon404> opinions may vary
[20:51] <llogan> might be kind of weird to have one be markdown style and the others not
[20:51] <llogan> INSTALL, etc
[20:52] <Daemon404> pretty sure the author is one of those people who thinks github == git
[20:56] <cone-554> ffmpeg.git Michael Niedermayer master:0674da997abe: avcodec/cavs: use av_mallocz_array()
[20:56] <cone-554> ffmpeg.git Michael Niedermayer master:2870617853b5: avcodec/vp3: use av_mallocz_array()
[20:56] <cone-554> ffmpeg.git Michael Niedermayer master:5be8c276946d: avformat/rtspdec: Use av_mallocz_array()
[20:57] <michaelni> Daemon404 & llogan please reply to him
[20:58] <llogan> i've never really used github...i'll have to remember my account(s).
[21:16] <wm4> <Daemon404> pretty sure the author is one of those people who thinks github == git <- lol
[21:17] <wm4> but it's good to see that people making these "pull requests" are not told to fuck off
[21:17] <wm4> though it's unfortunate, because ffmpeg's review process is mail based
[21:20] <wm4> so this is basically https://github.com/KonradIT/FFmpeg/blob/master/README.md vs. https://github.com/FFmpeg/FFmpeg/blob/master/README
[22:14] <UtUser> Guys, I think the Ut Video decoder is broken again.
[22:14] <Daemon404> the actual decoder
[22:14] <Daemon404> of the lib
[22:14] <Daemon404> or*
[22:15] <UtUser> or?
[22:15] <Daemon404> utvideodec.c or libutvideodec.cpp
[22:15] <Daemon404> the external or internal one
[22:15] <UtUser> I'm using a static build
[22:16] <Daemon404> does it list the decoder as utvideo or libutvideo
[22:16] <UtUser> just a sec...
[22:16] <Daemon404> and in what way is it broken
[22:16] <UtUser> giving the same messages as in this bug report from a while ago: http://trac.ffmpeg.org/ticket/2661
[22:16] <UtUser> and it encoded perfectly
[22:17] <UtUser> to Ut Video I mean
[22:19] <Daemon404> it hasnt been touched since then
[22:19] <Daemon404> so a sample is likely needed
[22:19] <UtUser> Well, it identifies the stream as just "utvideo" but I'm not sure if I'm looking in the right place
[22:19] <Daemon404> how was the file generated
[22:19] <UtUser> 1 sec
[22:20] <UtUser> ffmpeg -i "%1" -c:v utvideo -pix_fmt rgb24 -c:a wavpack "%1-lossless-rgb24.mkv"
[22:21] <UtUser> never got any messages like this with a build from about 7 months ago
[22:25] <Daemon404> we do need some sort of sample to test on
[22:25] <Daemon404> input to the encoder + command line to reproduce
[22:25] <Daemon404> file it on trac
[22:25] <UtUser> the trouble is, I'm not sure if a really short sample would trigger these sorts of messages
[22:26] <UtUser> they are sporadic during transcoding
[22:26] <Daemon404> a single frame in theory could
[22:26] <Daemon404> utvideo is keyframe-only
[22:26] <UtUser> I know
[22:26] <UtUser> but it seems like not all of the frames are bugged
[22:26] <Daemon404> figure out which frame() are bugged -> cut it
[22:26] <Daemon404> ?
[22:26] <Daemon404> er frame(s)
[22:26] <UtUser> oh ok
[22:26] <UtUser> smart
[22:33] <cone-554> ffmpeg.git Diego Biurrun master:f1df0a4c08b5: on2avc: Remove pointless dsputil.h #include
[22:33] <cone-554> ffmpeg.git Michael Niedermayer master:d84286b1ea1a: Merge commit 'f1df0a4c08b54e722e7a2c797d0d31c7f2c531d0'
[22:40] <cone-554> ffmpeg.git Diego Biurrun master:888dcd86755d: h264_picture: Remove pointless dsputil.h #include
[22:40] <cone-554> ffmpeg.git Michael Niedermayer master:43c57dbe1454: Merge commit '888dcd86755d37e55fd74166f6d38ad66d41db58'
[23:14] <UtUser> BOOM! found some bugged frames!
[23:14] <UtUser> 2200-2240
[23:14] <UtUser> I feel like the matrix or something watching a console for half an hour
[23:15] <UtUser> is there an alternative to -ss that I can use to just start at the frame number?
[23:17] <cbsrobot> why not use -ss ?
[23:17] <UtUser> would it be like -ss 0:0:0:2200?
[23:17] <cbsrobot> 2200/fps
[23:18] <UtUser> gotcha
[23:19] <cbsrobot> or for the lazy ones: -ss $(echo "2200/fps" | bc -l)
[23:19] <cbsrobot> just replace fps
[23:19] <llogan> or use select filter maybe
[23:19] <UtUser> awesome functionality
[23:19] <UtUser> I think I'll just stick to elementary arithmetic for now though :)
[23:24] <UtUser> now the damn program has to seek to that point after having taken almost a second to decode each frame
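The frame-to-timestamp arithmetic from cbsrobot's answer, written out as a small sketch. It assumes a constant frame rate given as a rational `fps_num/fps_den`; the helper names `frame_to_seconds` and `format_ss` are hypothetical, and the 25 fps in the example is only an illustrative value since the clip's real rate is not stated in the log.

```c
#include <stdio.h>

/* Seconds at which a given frame starts, for a constant frame rate
 * expressed as fps_num / fps_den (e.g. 25/1 or 30000/1001). */
static double frame_to_seconds(long frame, int fps_num, int fps_den)
{
    return (double)frame * fps_den / fps_num;
}

/* Writes the timestamp as decimal seconds with millisecond precision,
 * a form that can be passed as the value of -ss.
 * Returns the snprintf result. */
static int format_ss(char *buf, size_t size,
                     long frame, int fps_num, int fps_den)
{
    return snprintf(buf, size, "%.3f",
                    frame_to_seconds(frame, fps_num, fps_den));
}
```

For frame 2200 at an assumed 25 fps this yields `88.000`, i.e. `-ss 88.000`.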
[23:24] <UtUser> ...and it's done
[23:27] <UtUser> MPV's decoder, which is based on FFMPEG, reports errors even when playing said 118 MB clip
[23:27] <wm4> wut
[23:27] <UtUser> I'll try and trim it down some more
[23:36] <cone-554> ffmpeg.git Diego Biurrun master:0d439fbede03: dsputil: Split off HuffYUV decoding bits into their own context
[23:36] <cone-554> ffmpeg.git Michael Niedermayer master:e2abc0d5cacc: Merge commit '0d439fbede03854eac8a978cccf21a3425a3c82d'
[00:00] --- Wed May 28 2014