[Ffmpeg-devel-irc] ffmpeg-devel.log.20140621

Sun Jun 22 02:05:02 CEST 2014

[00:33] <J_Darnley> What was that joke RFC?  IP over Carrier Pigeon?
[00:33] <nevcairiel> IP over Avian Carriers
[00:34] <nevcairiel> RFC 1149
[00:34] <J_Darnley> :)
[01:29] <BBB> J_Darnley: well I'm taking advantage of the convention that all 4byte operations zero the upper four bytes (i.e. auto zero extend)
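The convention BBB refers to is the x86-64 rule that writing a 32-bit register clears the upper 32 bits of the full 64-bit register, while 8-bit and 16-bit writes leave the upper bits alone. The following is only a toy model of that behaviour in Python; the helper names write32 and write16 are made up for illustration and do not correspond to anything in the patch.

```python
MASK64 = (1 << 64) - 1

def write32(reg64: int, value: int) -> int:
    """Model a write to a 32-bit register: the 64-bit result is zero-extended."""
    # The old register contents are discarded entirely.
    return value & 0xFFFFFFFF

def write16(reg64: int, value: int) -> int:
    """Model a write to a 16-bit register: bits 16..63 of the old value survive."""
    return (reg64 & ~0xFFFF & MASK64) | (value & 0xFFFF)

old = 0xDEADBEEFCAFEBABE
print(hex(write32(old, 0x12345678)))  # upper 32 bits are cleared
print(hex(write16(old, 0x1234)))      # upper 48 bits are preserved
```

This is why a 32-bit mov or add can stand in for an explicit zero extension of an index or counter on 64-bit builds.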
[01:31] <Daemon404> Timothy_Gu, fyi: in line 109 of libx265.c
[01:31] <Daemon404> ctx->params->frameNumThreads = avctx->thread_count;
[01:31] <Timothy_Gu> yeah?
[01:31] <Daemon404> something to point to rather than benchmarks
[01:31] <Daemon404> is all
[01:35] <Timothy_Gu> you can add a link to that in the ticket. im too lazy to do it myself :)
[01:36] <J_Darnley> BBB: I didn't see anything else when I was looking and it compiled without complaint.
[01:36] <J_Darnley> ... but I couldn't test it (patch didn't apply cleanly).
[01:37] <J_Darnley> If you have a public repository somewhere, I will clone it tomorrow and try actually running it and do some debugging.
[02:05] <BBB> J_Darnley: https://github.com/rbultje/ffmpeg/tree/swr
[03:42] <jamrial> BBB: fixed the win64 crash
[03:42] <BBB> \o/
[03:42] <jamrial> you were doing "mov ctxq, r0mp" before PUSH r3
[03:42] <jamrial> ctx is r3
[03:43] <BBB> ...
[03:43] <BBB> *headbump*
[03:43] <jamrial> :P
[03:44] <BBB> ok pushed that to github then
[03:44] <jamrial> anyway, try doing some benchs on your end. i got some really weird results on my linux x64 vm
[03:44] <BBB> so speed is slower?
[03:44] <BBB> I'll do a few tomorrow
[03:44] <BBB> I haven't done anything on that end, just random instruction sequences that work in terms of md5
[03:44] <jamrial> on my second try (win32 and linux x64 on ubitux's box) it was slightly slower
[03:44] <BBB> well slower isn't good, it should be faster :-p
[03:45] <BBB> I'll have a look, thanks for testing anyway... you're just using a long mp3 -af ... right?
[03:45] <BBB> or is there a long test sequence I can use?
[03:45] <BBB> like testsrc or so
[03:45] <jamrial> a 22 minutes aac 44khz into 48khz
[03:47] <jamrial> a very long Dream Theater song :P
[03:47] <BBB> I see
[03:47] <BBB> ok I'll use something similar'ish
[03:47] <BBB> bbl, test tomorrow
[03:47] <jamrial> ok, later
[04:21] <BBB> jamrial: did you test avx? or just win64?
[04:21] <BBB> (or has anyone tested the avx?)
[04:32] <cone-958> ffmpeg.git 03Michael Niedermayer 07master:1caedf629a06: avcodec/ituh263enc: fix advanced intra coding
[04:42] <jamrial> i tested avx and it fate passes
[11:41] <J_Darnley> BBB: I apologise that this sounds really stupid but, how do I test your resample re-write?
[11:42] <J_Darnley> More than just using -ar and ffmpeg?
[11:55] <cone-471> ffmpeg.git 03Diego Biurrun 07master:97578f5f3b27: build: Add missing object file for Matroska demuxer
[13:00] <BBB> J_Darnley: I think you take a long music sample and you play it, and then resample (using e.g. ffmpeg -i audiofile -af aformat=fltp,aresample=some_value:internal_sample_fmt=fltp -f null -)
[13:02] <BBB> then run that in a debugger or so to ensure it really hits that function, and then put START/STOP_TIMER around the function call in resample.c (*consumed = c->dsp.resample_common[fn_idx](c, dst, src, dst_size, update_ctx);, line 301) or time the whole ffmpeg run, before and after patch
[13:02] <BBB> (the timing is obviously not in a debugger)
[13:03] <J_Darnley> Ah planar, that might help
[13:10] <J_Darnley> Well... It doesn't crash
[13:11] <J_Darnley> gdb reaches the breakpoint I set in ff_resample_common_float_sse
[13:18] <BBB> ok then you're fine
[13:18] <BBB> so go out of debugger and time the function calls or the executable runtime :)
[13:19] <J_Darnley> Yeah, it's running
[13:20] <J_Darnley> with your patch: 11479 decicycles in resample, 698905 runs, 349671 skips
[13:20] <J_Darnley> (that's a lot of skips)
[13:21] <ubitux> (there is no threading involved, right?)
[13:21] <J_Darnley> Almost certainly, ffmpeg uses threads all over the place
[13:22] Action: J_Darnley reconfigures
[13:35] <J_Darnley> ugh, that's not any better
[13:35] <J_Darnley> without patch: runtime 14.98, 10594 decicycles in resample, 698947 runs, 349629 skips
[13:38] <J_Darnley> with patch: runtime 15.89, 11006 decicycles in resample, 698933 runs, 349643 skips
[13:42] <J_Darnley> I'll be right in a short while.
[13:42] <J_Darnley> um
[13:42] <J_Darnley> I'll be back in a short while.
[14:29] <BBB> ok new patch on github
[14:29] <BBB> for me it's a lot faster
[14:29] <BBB> (I added some loop aligns)
[14:53] <BBB> maybe llvm just sucks
[14:53] <cone-471> ffmpeg.git 03Anshul Maheshwari 07master:36393434782b: ffmpeg: fix memleak and corruption of AVSubtitle with multiple outputs
[15:04] <J_Darnley> FYI: you have left timer macros in it.
[15:04] <BBB> in a top patch, to make it easier to reproduce my testing
[15:05] <BBB> you can remove the patch if you don't like it :)
[15:05] <BBB> I wonder if llvm is really that bad
[15:05] <J_Darnley> No I don't mind, I just saw them when git said I had some conflicts to merge
[15:07] <J_Darnley> dammit git!
[15:09] <J_Darnley> uh
[15:09] <J_Darnley> is 520 decicycles right?
[15:11] <J_Darnley> Maybe that's just your TIMER2 macro
[15:13] <J_Darnley> With the usual STOP_TIMER() macro I get the same speed
[15:19] <J_Darnley> without patch: runtime 13.29, 10466 decicycles
[15:19] <J_Darnley> with patch: runtime 14.07, 11011 decicycles
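To put the two measurements just above on a common scale, the relative change can be worked out directly. This is a small sketch using only the numbers J_Darnley posted; the function name relative_change is chosen here for illustration.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from 'before' to 'after' (positive means slower/larger)."""
    return (after - before) / before

# Figures from the 15:19 messages (without patch -> with patch).
runtime = relative_change(13.29, 14.07)
cycles = relative_change(10466, 11011)

print(f"runtime: {runtime:+.1%}, decicycles: {cycles:+.1%}")
```

Both figures come out as a slowdown of roughly 5 to 6 percent, so the per-call timer and the wall-clock runtime agree with each other on this machine.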
[15:38] <BBB> J_Darnley: you had massive numbers of skips, right?
[15:38] <BBB> that's b/c the number of samples per run varies
[15:38] <BBB> so I changed it to measure per-sample speed (that's my custom macro)
[15:39] <BBB> then number of skips is minimal
[15:39] <BBB> and yes per-sample cycle count is expected to be much smaller
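BBB's point about skips can be illustrated with a toy model of a timer that discards outliers. The sketch below is not the START_TIMER/STOP_TIMER code from libavutil; the rule "skip any measurement at least 8 times the running mean", the cost of 50 cycles per sample, and the run sizes are all assumptions picked to show the effect.

```python
def outlier_skipping_timer(measurements, factor=8):
    """Average measurements, skipping any that exceed factor * running mean.

    Returns (mean of accepted measurements, accepted count, skipped count).
    """
    total = 0.0
    count = 0
    skips = 0
    for m in measurements:
        # The first two measurements are always accepted to seed the mean.
        if count < 2 or m < factor * total / count:
            total += m
            count += 1
        else:
            skips += 1
    return total / count, count, skips

CYCLES_PER_SAMPLE = 50                        # hypothetical constant cost
samples_per_run = [16, 16] + [1024, 16] * 4   # hypothetical, uneven run sizes

per_run = [n * CYCLES_PER_SAMPLE for n in samples_per_run]
per_sample = [cost / n for cost, n in zip(per_run, samples_per_run)]

print("per run:   ", outlier_skipping_timer(per_run))
print("per sample:", outlier_skipping_timer(per_sample))
```

In this model the per-run series loses every large run to the outlier rule, while the per-sample series has no skips at all, which matches the direction of the change BBB describes (many skips before, near-zero after normalising by sample count).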
[15:41] <J_Darnley> that makes sense
[15:42] <J_Darnley> I will re-run shortly
[15:50] <BBB> yeah total runtime is also faster for me
[15:50] <BBB> I wonder if it's compiler or something else...
[15:50] <BBB> can I see disassembly pooped out for the inline asm functions for you?
[15:50] <BBB> and which compiler?
[15:55] <ubitux> you can use asetnsamples to control the number of samples going into aresample
[15:56] <ubitux> but i guess the internals split them again
[15:56] <ubitux> swr internals*
[16:07] <BBB> I get near-zero skips with this adaptation, versus about 50% skips before
[16:07] <BBB> so I think this is ok
[16:08] <BBB> 12.9sec -> 12.3sec total runtime (linear)
[16:08] <BBB> or 10.1 -> 9.6 (non-linear)
[16:08] <BBB> on 32bit, but I guess it's the same for 64bit
[16:09] <BBB> commandline: ./ffmpeg -i /tmp/x.wav  -af aformat=fltp,aresample=48000:internal_sample_fmt=fltp[:linear_interp=1] -f null -nostats -v error -
[16:09] <BBB> where the stuff [] is only for linear
[16:09] <BBB> x.wav is a one-hour mp3 file transcoded to wav to decrease runtime spent in mp3 decoding
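For anyone repeating the benchmark, the command line BBB gives can be assembled programmatically so that the linear and non-linear variants differ only in the bracketed option. This is a sketch: the function name resample_bench_cmd and its parameters are invented here, and the option string is copied from the 16:09 message rather than checked against any particular ffmpeg version.

```python
import shlex

def resample_bench_cmd(input_path, rate=48000, linear=False, ffmpeg="./ffmpeg"):
    """Build the argv for the resampling benchmark quoted in the log."""
    af = f"aformat=fltp,aresample={rate}:internal_sample_fmt=fltp"
    if linear:
        # The part shown in [] in the log, used only for the linear case.
        af += ":linear_interp=1"
    return [ffmpeg, "-i", input_path, "-af", af,
            "-f", "null", "-nostats", "-v", "error", "-"]

print(shlex.join(resample_bench_cmd("/tmp/x.wav")))
print(shlex.join(resample_bench_cmd("/tmp/x.wav", linear=True)))
```

The resulting list can be passed to subprocess.run and wrapped in a wall-clock timer to compare total runtime before and after the patch.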
[16:13] <J_Darnley> BBB I'll post those a little later  I'm about to go out.
[16:13] <BBB> ty
[16:13] <J_Darnley> But I am using cygwin's gcc 4.8.2 or .3
[17:05] <cone-471> ffmpeg.git 03Michael Niedermayer 07master:a2de7b1bd504: avcodec/bitstream: document the double volatile
[17:05] <cone-471> ffmpeg.git 03Michael Niedermayer 07master:5ab51f753583: avcodec/libtwolame: fix encoding lsf with defaults
[18:55] <J_Darnley> BBB: I have what you asked for.
[18:56] <J_Darnley> It is gcc 4.8.3
[18:57] <J_Darnley> and here is the output of objdump: http://pastebin.com/xBGpS3zZ
[19:33] <cone-471> ffmpeg.git 03Clément Bœsch 07master:ded3c9fd32af: avfilter: add hqx filter (hq2x, hq3x, hq4x)
[19:39] <wm4> ubitux: :D
[19:40] <ubitux> :)
[19:40] <ubitux> i'm almost done writing about it for the curious
[19:47] <Compnn> ubitux : nice job on hqx stuff :)
[19:47] <Compnn> hows it look compared to lanczos ? :P
[19:48] <Compnn> on real video not ... pixel art
[19:48] <Compnn> i think hq2x is also used on video games , emulators etc ?
[19:48] <Compnn> maybe its useful to those x264 guys
[19:49] <ubitux> try it
[19:49] <ubitux> but it will probably be a very ugly scale
[19:50] <ubitux> Compnn: hqx should be compared to xbr or stuff like http://research.microsoft.com/en-us/um/people/kopf/pixelart/
[19:54] <j-b> how does it improve the old DivX blocks?
[19:56] <ubitux> lol
[19:57] <j-b> :D
[20:00] <wm4> j-b: use libpostproc and vf_spp
[20:01] <wm4> vf_fspp that is
[20:01] <wm4> because supposedly that's the reason why these can't be deleted
[20:01] <ubitux> 02:17:20 < michaelni> matrixbench_mpeg2.mpg -vf scale=320:240,format=monow,spp=6:63,hqx=4
[20:01] <ubitux> ;)
[20:01] <j-b> wm4: libpostproc is too old and really not good
[20:02] <wm4> ubitux: format=monow
[20:02] <wm4> wat
[20:02] <j-b> wm4: fspp?
[20:02] <wm4> j-b: some old libmpcodec crap filter
[20:02] <wm4> I think this one uses snow ;D
[20:02] <ubitux> no, uspp is snow
[20:03] <wm4> oh
[20:03] <ubitux> fspp is just a "fast" version of spp
[20:03] <ubitux> (fast but different)
[20:03] <wm4> lol
[20:03] <wm4> and vf_spp which was ported is not enough?
[20:03] <wm4> I don't get this
[20:03] <j-b> wm4: but seriously, a lot of the 200x DivX/Xvid around look very very bad on our HD screens
[20:03] <ubitux> well it's probably not fast enough ;)
[20:04] <wm4> j-b: no doubt
[20:04] <wm4> ubitux: having dozens of pp filters is not useful...
[20:04] <ubitux> wm4: let's add another one to rule them all
[20:04] <wm4> users will have no idea which one to use in the first place
[20:04] <wm4> so pick a good one and delete the rest as useless
[20:04] <ubitux> probably depends on source and settings
[20:05] <j-b> I agree with my enemy wm4 on this :)
[20:05] <wm4> enemy? :(
[20:05] <j-b> wm4: lol :)
[20:05] <j-b> One good pp filter would be nice
[20:05] <ubitux> spp isn't good?
[20:05] <wm4> ubitux: we don't know
[20:05] <iive> spp uses 2d (i)dct, fspp uses a series of 1D (i)dct's
[20:05] <wm4> I haven't really seen convincing results from them whenever I tried, either
[20:06] <j-b> me neither
[20:06] <j-b> gradfun and yadif are amazing piece of filter code, but for pp, it's disappointing for users
[20:06] <iive> spp is excellent, but it is kind of slow.
[20:06] <wm4> michaelni: you probably know most about these pp filters, your opinion?
[20:07] <iive> gradfun and yadif are not postprocessing filters.
[20:07] <wm4> j-b: I often see gradfun being ridiculed for being crap
[20:07] <j-b> wm4: I see that too. But I disagree. It works wonders for anime and ocean-movies.
[20:08] <j-b> it's crap if people apply it everywhere, like Complex Sharpen 2 :)
[20:09] <j-b> iive: how slow?
[20:09] <iive> j-b: it does about 64 2d fdct and idct for every block.
[20:09] <iive> at the maximum (6) level.
[20:11] <michaelni> spp with the right parameters can make old low bitrate files look much nicer than without
[20:12] <iive> it is also an excellent non-temporal denoiser 
[20:12] <iive> btw, I wonder if maybe h264 8x8 dct would give slightly better results when h264 video is handled. (and speed wise)
[20:15] <cone-471> ffmpeg.git 03Carl Eugen Hoyos 07master:29fc468d0a22: Do not show libzvbi in the configure output if it was not enabled.
[20:15] <cone-471> ffmpeg.git 03Carl Eugen Hoyos 07master:e3fd263f0b73: Show duration for large asf files as written in the file header.
[20:31] <ubitux> https://news.ycombinator.com/item?id=7925671 here you go
[20:38] <cone-471> ffmpeg.git 03Michael Niedermayer 07master:dc5972f88601: avformat/flvdec: give live_flvdec a separate name
[21:06] <Daemon404> [19:07] <@iive> gradfun and yadif are not postprocessing filters. <-- gradfun can be used just fine as a postprocessing filter
[21:06] <Daemon404> during playback
[21:06] <Daemon404> it used to be quite popular to do so
[21:07] <Daemon404> unless you're using a strict definition of 'post process' so as not to be 'process post encoding'
[21:07] <iive> yes i do
[21:07] <iive> i mean post processing that hides encoding artifacts. like blocking and ringing.
[21:08] <Daemon404> gradfun can hide banding
[21:08] <Daemon404> thats what people use it for during playback
[21:08] <iive> while banding could be considered encoding artifact...
[21:08] <Daemon404> it de-facto *is*
[21:08] <Daemon404> common artefact of dct
[21:08] <Daemon404> on flat areas
[21:09] <iive> still, gradfun is not a replacement for a postprocessing filter, it could be used in addition to one.
[21:10] <Daemon404> it IS a postprocessing filter, by definition
[21:10] <Daemon404> just because libpp is a deblocker, doesn't mean that's all pp is
[21:10] <Daemon404> >the english language
[21:12] <iive> every filter applied in video chain is post processing, because it is processing post decoding
[21:12] <Daemon404> yes
[21:12] <iive> however postprocessing has its own meaning, more specific than the english word.
[21:12] <Daemon404> more specifically, in this particular multimedia scene, it is applied to playback filters
[21:13] <Daemon404> [20:12] <@iive> however postprocessing has its own meaning, more specific than the english word <-- yes and the one you refer to you are taking from a set of filters that simply used the word since it was convenient
[21:13] <Daemon404> they were not named as such due to a definition of postprocess
[21:13] <Daemon404> you have cause and effect mixed up here
[21:13] <iive> no, you ignore it, just to have something to argue about.
[21:14] <Daemon404> what the fuck?
[21:14] <iive> have a nice day.
[21:14] <Daemon404> ...every time you speak, you somethign fuckign retarded
[21:14] <Daemon404> for real
[21:15] <Daemon404> youre the only person in ere i get the urge to stab in the face on a regular basis
[21:15] <ubitux> haha
[21:17] <iive> wasn't Daemon404 the one explaining what decimation means, and how we should be using it like avisynth and not use it for what decimation word really means?
[21:18] <ubitux> that's indeed another topic where you fight with him
[21:25] <J_Darnley> You mean reducing by one tenth?
[21:25] Action: J_Darnley runs
[21:25] <iive> yes.
[21:25] <ubitux> "I asked the author, he apparently made the entire unrolled table by hand."
[21:26] <ubitux> holy shit.
[21:26] <J_Darnley> :O
[21:37] <jamrial> [NULL @ 00000000004c4880] [IMGUTILS @ 000000000032b140] Picture size 27792x16128
[21:37] <jamrial> probably a bit too big source for hqx4 :p
[21:39] <jamrial> sorry, full error was "Picture size 27792x16128 is invalid"
[21:40] <ubitux> :)
[21:42] <jamrial> source was http://lego.wikia.com/wiki/File:MegaManMix.png that, if you're curious
[21:43] <jamrial> hqx2 and hqx3 work, but ffmpeg eats ~2gb of ram
[21:47] <ubitux> aw. 
[21:47] <ubitux> strange, i'm not doing much with memory
[21:49] <wm4> <jamrial> sorry, full error was "Picture size 27792x16128 is invalid"
[21:49] <wm4> fixing this is hard
[21:50] <wm4> it's a big mess :(
[21:52] <nevcairiel> this one you could probably fix by simply taking bitdepth into account
[21:52] <nevcairiel> and not assuming 8 bytes per pixel
[21:53] <nevcairiel> but i guess the function doesn't know the bitdepth, and adding it is an API break
[21:53] <anshul> michaelni, page_segment and end_segment can be called multiple times in a single call to dvbsub_decode. This will not cause save_subtitle_set() to be called multiple times, since that function is called optionally according to the compute_edt value
[21:53] <wm4> nevcairiel: yeah, that's a problem too
[21:54] <jamrial> i posted that as a curiosity. doubt anyone will actually have a legit use for hqx4 with 6948×4032 images
[21:57] <wm4> hm yes, but in general this is a problem
[21:58] <wm4> high res images can easily reach these resolutions
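The "Picture size ... is invalid" error discussed above comes from a dimension sanity check. The sketch below assumes the check has the form (w + 128) * (h + 128) < INT_MAX / 8, which is how the libavutil image size check is recalled here and which fits nevcairiel's remark about assuming 8 bytes per pixel; treat the exact constants as an assumption rather than a quote from the source. The function name image_size_ok is invented for this example.

```python
INT_MAX = 2**31 - 1

def image_size_ok(w: int, h: int) -> bool:
    """Assumed form of the check: padded area must fit in INT_MAX at 8 bytes/pixel."""
    return w > 0 and h > 0 and (w + 128) * (h + 128) < INT_MAX // 8

# jamrial's source image is 6948x4032; hqx scales it by 2, 3 or 4.
src_w, src_h = 6948, 4032
for factor in (2, 3, 4):
    w, h = src_w * factor, src_h * factor
    print(f"hqx={factor}: {w}x{h} ok={image_size_ok(w, h)}")
```

Under that assumption the 2x and 3x outputs pass and the 4x output (27792x16128) is rejected, which is consistent with what jamrial reported. It also shows why a limit based on a fixed worst-case pixel size turns away images that would fit comfortably at 4 bytes per pixel.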
[21:59] <BBB> J_Darnley: ty!
[22:00] <BBB> the main difference appears to be that it saves more stuff in intermediate registers than straight from-memory operations
[22:00] <BBB> J_Darnley: so if that's true, then on a 32bit build, we should already be faster than the code that gcc-4.8.3 generates, only on 64bit is it faster, can you test that?
[22:02] <michaelni> anshul, if there are 2 DVBSUB_PAGE_SEGMENT, dvbsub_parse_page_segment will be called twice
[22:03] <michaelni> and it will call save_subtitle_set() twice if compute_edt == 1
[22:03] <michaelni> and if there are 2 DVBSUB_DISPLAY_SEGMENT, dvbsub_display_end_segment will be called twice
[22:03] <michaelni> and it will call save_subtitle_set() twice if compute_edt == 0
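michaelni's description of when save_subtitle_set() runs can be restated as a small counting model. This is only a paraphrase of the four messages above in Python, not the dvbsub decoder itself; the segment labels and the function count_save_calls are invented for the illustration.

```python
PAGE_SEGMENT = "page"            # stands in for DVBSUB_PAGE_SEGMENT
DISPLAY_SEGMENT = "display_end"  # stands in for DVBSUB_DISPLAY_SEGMENT

def count_save_calls(segments, compute_edt):
    """Count how often save_subtitle_set() would run for one decode call."""
    calls = 0
    for seg in segments:
        if seg == PAGE_SEGMENT and compute_edt == 1:
            calls += 1   # page segment path saves when compute_edt == 1
        elif seg == DISPLAY_SEGMENT and compute_edt == 0:
            calls += 1   # display end segment path saves when compute_edt == 0
    return calls

print(count_save_calls([PAGE_SEGMENT, PAGE_SEGMENT], compute_edt=1))
print(count_save_calls([DISPLAY_SEGMENT, DISPLAY_SEGMENT], compute_edt=0))
print(count_save_calls([PAGE_SEGMENT, DISPLAY_SEGMENT], compute_edt=1))
```

The compute_edt flag selects which segment type triggers the save, but it does not cap the number of saves per packet, so two segments of the selected type still produce two calls.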
[22:09] <anshul> michaelni, in my last patch I moved ffswap in dvbsub_decode but for memleak I will have to add more code
[22:10] <anshul> I do need one sample having such segments, without them it's difficult for me.
[22:12] <J_Darnley> BBB: probably.  You want me to compare your patch vs. not on 32-bit cygwin, right?
[22:12] <michaelni> anshul, i understand but i don't have such sample
[22:12] <J_Darnley> that is basically 32 bit windows anyway
[22:15] <anshul> I am comparing data size and ret values because if I don't do it that way then parse_end_segment updates datasize to 0 while actually it should be 1
[22:18] <anshul> To set datasize only in save_subtitle_set I have to put it in ctx or increase one parameter of some functions, before I was trying to solve with minimum changes.
[22:25] <michaelni> there's also the "got_segment == 15 && sub" case that overrides data_size, i think this too can cause it to become wrong
[22:25] <michaelni> before the patch this couldn't execute when the other case did i think
[22:34] <cone-471> ffmpeg.git 03Clément Bœsch 07master:4d8fc0e08828: avfilter/hqx: unroll the pattern calculation
[22:34] <cone-471> ffmpeg.git 03Clément Bœsch 07master:79198cb65a0b: avfilter/hqx: add some self promotion in doxy.
[22:39] <BBB> J_Darnley: yes
[22:41] <J_Darnley> 32-bit, cygwin x86, gcc 4.8.3
[22:41] <J_Darnley> without patch: 13.74 runtime, 535 decicycles/sample in float_resample, 1048484 runs, 92 skips
[22:41] <J_Darnley> with patch: 14.07 runtime, 543 decicycles/sample in float_resample, 1048448 runs, 128 skips
[22:42] <BBB> :(
[22:42] <BBB> weird
[22:42] <BBB> ok
[22:42] <J_Darnley> I also pulled in the latest changes from github
[22:42] <BBB> thanks, I'll look what it does
[22:42] <BBB> it's clearly not just register pre-loading
[22:43] <J_Darnley> You want the objdump again?
[23:09] <cone-471> ffmpeg.git 03Carl Eugen Hoyos 07release/1.2:6011b806dd5d: Show duration for large asf files as written in the file header.
[23:09] <cone-471> ffmpeg.git 03Carl Eugen Hoyos 07release/2.1:f3802aa3250c: Show duration for large asf files as written in the file header.
[23:09] <cone-471> ffmpeg.git 03Carl Eugen Hoyos 07release/2.2:52572ca1b339: Show duration for large asf files as written in the file header.
[23:25] <cone-471> ffmpeg.git 03Michael Niedermayer 07master:04776cedec82: avfilter/avf_showspectrum: fix macro ()
[23:25] <cone-471> ffmpeg.git 03Michael Niedermayer 07master:84de3ed795d0: avfilter/deshake_opencl: fix macro ()
[23:28] <michaelni> ubitux, your fate valgrind client doesn't like your hqx filter http://fate.ffmpeg.org/report.cgi?time=20140621205440&slot=x86_64-archlinux-gcc-valgrindundef
[23:28] <ubitux> mmh
[23:29] <ubitux> that wouldn't be a problem in swscale right?
[23:30] <michaelni> could be, didn't really look at it
[23:31] <BBB> J_Darnley: hm... that might be useful yes
[23:31] <BBB> J_Darnley: there's definitely a few differences, I have to test how relevant each of them is
[23:32] <ubitux> michaelni: i wonder if it doesn't like the resolution change
[23:34] <ubitux> michaelni: that seems related to it indeed
[23:35] <ubitux> michaelni: and i can't reproduce with -cpuflags none
[23:35] <ubitux> "valgrind ./ffmpeg_g -i ~/fate-samples/filter/pixelart%d.png -vf hqx=3 -f null -" vs "valgrind ./ffmpeg_g -cpuflags none -i ~/fate-samples/filter/pixelart%d.png -vf hqx=3 -f null -"
[23:36] <ubitux> individual pixelart*png don't trigger it
[23:36] <ubitux> so it's related to the resolution change
[23:37] <michaelni> maybe it's rescaling
[23:37] <michaelni> from one res to another
[23:39] <michaelni> also it might be a false positive (not that i have evidence just saying as there sure is code that overreads)
[23:40] <michaelni> overread the odd size using the padding that is
[23:40] <ubitux> i have no idea how the reconfiguration of sws works and is actually triggered
[23:41] <wm4> why would there be any scaling, other than hqx3 itself
[23:42] <ubitux> rgba → bgra maybe
[23:42] <ubitux> (png outputs rgba, filter takes bgra)
[23:43] <ubitux> it's strange that the issue is not reproducible with hq2x and hq4x though
[23:55] <ubitux> it's fun that it's not reproducible with -cpuflags none, but with optimizations the issue happens in yuv2bgra32_full_X_c
[00:00] --- Sun Jun 22 2014