[Ffmpeg-devel-irc] ffmpeg-devel.log.20140125

Sun Jan 26 02:05:02 CET 2014

[00:52] <BBB> ubitux: fixed the valgrind stuff
[01:23] <BBB> ubitux: and that other patch fixes fuzzed2.ivf - I believe it fixes the others as well but please double check, I'm having issues reproducing reliably for some reason
[01:31] <BBB> michaelni: and for you, a mergeable branch: https://github.com/rbultje/ffmpeg/commits/vp9-simd
[01:32] <cone-484> ffmpeg.git Wim Vander Schelden master:af09be4f4b2f: Fixed a memory leak in dvbsubenc.c: sub->num_rects was reduced without freeing the associated rects.
[01:32] <cone-484> ffmpeg.git Michael Niedermayer master:cf812d812967: avcodec/dvbsubdec: Remove unused display_list_size
[01:39] <cone-484> ffmpeg.git Kostya Shishkov master:0e1ad2f591b8: dxtory: add more compressed and uncompressed modes
[01:39] <cone-484> ffmpeg.git Michael Niedermayer master:2d0d1f7eb3f7: Merge commit '0e1ad2f591b87e944550c15b54e54f8189743289'
[01:43] <cone-484> ffmpeg.git Kostya Shishkov master:28e1eed3c2e7: dxtory: compressed RGB555/RGB565 decoding support
[01:44] <cone-484> ffmpeg.git Michael Niedermayer master:4b84a69ebb35: Merge remote-tracking branch 'qatar/master'
[01:51] <cone-484> ffmpeg.git Ronald S. Bultje master:baf47020cd23: vp9/x86: 8x8 iadst SIMD (ssse3/avx) variants.
[01:51] <cone-484> ffmpeg.git Ronald S. Bultje master:d43efa68bd53: vp9/x86: 4x4 iadst SIMD (ssse3) variants.
[01:51] <cone-484> ffmpeg.git Ronald S. Bultje master:97474d527f9a: vp9/x86: iwht4x4 (lossless) mmx.
[01:51] <cone-484> ffmpeg.git Ronald S. Bultje master:c9e6325ed984: vp9/x86: use explicit register for relative stack references.
[01:51] <cone-484> ffmpeg.git Ronald S. Bultje master:4147b337c105: vp9: fix memory corruption if header decoding fails after size change.
[01:51] <cone-484> ffmpeg.git Michael Niedermayer master:5554c6dd455c: Merge remote-tracking branch 'rbultje/vp9-simd'
[02:50] <BBB> ubitux: https://github.com/rbultje/ffmpeg/commits/vp9-coef-opts for review
[02:50] <BBB> ubitux: basically some coef reading and zero writing optimizations
[02:57] <BBB> (renamed to vp9-context-opts)
[05:00] <cone-484> ffmpeg.git Ramiro Polla master:222fb8276dc4: lavfi/drawtext: get bitmap from glyph in a separate step
[05:00] <cone-484> ffmpeg.git Ramiro Polla master:78a9f185eb17: lavfi/drawtext: add option for drawing border around text
[11:58] <ubitux> BBB: https://github.com/rbultje/ffmpeg/compare/vp9-context-opts ?
[11:58] <ubitux> ok
[11:58] <ubitux> didn't see the rename
[11:58] <ubitux> will test
[11:58] <ubitux> (the valgrind stuff)
[12:15] <ubitux> BBB: all fuzzed files working here :)
[12:15] <ubitux> thanks
[12:20] <ubitux> BBB: want another one? :)
[12:37] <BBB> hum, that bad?
[12:37] <BBB> sure
[13:29] <saste> michaelni, any insight about the codec time_base thing?
[14:30] <ubitux> michaelni: https://github.com/ubitux/FFmpeg/compare/master...lossless opinion?
[15:02] <michaelni> saste, i suspect some code doesnt expect that the timebase has been overridden
[15:02] <michaelni> also ticks_per_frame probably would need to be set
[15:22] <michaelni> ubitux, nice
[15:35] <ubitux> heh, just won about 10 cycles in lpf 16x16 vert
[15:37] <ubitux> 3931 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 4193221 runs, 1083 skips
[15:37] <ubitux> → 3827 decicycles in ff_vp9_loop_filter_v_16_16_ssse3, 4193594 runs, 710 skips
[15:37] <ubitux> \o/
[15:41] <wm4> what's a decicycle anyway?
[15:41] <ubitux> a tenth of a cycle?
[15:42] <wm4> how can there be 0.1 cycles?
[15:45] <kierank> wm4: it's an average
[15:45] <kierank> http://git.videolan.org/?p=ffmpeg.git;a=blob;f=doc/examples/filtering_video.c;h=67c8a8bc104d61c2a13bd424a2796fef18b0a7c0;hb=HEAD#l136
[15:45] <kierank> why is the input a sink?
[15:49] <wm4> hm
[15:49] <BBB> wm4: average of many runs
[15:49] <BBB> ubitux: cool
[15:50] <wm4> BBB: I always thought the minimum of all runs is the most useful value
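A note on the figures above: a decicycle is a tenth of a CPU cycle, so the reported number is an average over many runs, printed as an integer count of tenths. The Python sketch below only illustrates that arithmetic. It is not FFmpeg's timer code, the helper name `decicycles` is hypothetical, and the per-run samples are made up. It also computes the minimum that wm4 mentions, for comparison.

```python
def decicycles(samples):
    """Mean cycle count in tenths of a cycle, using integer arithmetic only."""
    return sum(samples) * 10 // len(samples)


# Hypothetical per-run cycle counts for one function (not measured data).
samples = [393, 394, 392, 393, 394]

avg_deci = decicycles(samples)  # average, expressed in tenths of a cycle
best = min(samples)             # the minimum over all runs, in whole cycles

# Gain between the two loop-filter figures quoted in the log, in cycles.
saved_cycles = (3931 - 3827) / 10

print(avg_deci, best, saved_cycles)
```

The last value matches ubitux's "about 10 cycles": the two quoted figures differ by 104 decicycles.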
[15:50] <ubitux> BBB: and i found a way to make the asm simpler :p
[15:50] <BBB> cool
[15:50] <wm4> kierank: maybe it's misnamed? the buffer src is the input, buffer sink the output...
[15:52] <wm4> oh I see
[15:52] <wm4> the output is basically a filter pad which outputs something, and which is thus linked to an input pad of the newly created filters
[15:53] <wm4> so the buffer src is an output, because it provides an output
[15:58] <ubitux> BBB: https://github.com/ubitux/FFmpeg/compare/vp9-simd
[15:59] <ubitux> and btw, this leads to a < 620 lines file
[16:00] <kierank> wm4: any idea what filter_descr is?
[16:01] <kierank> and how it is different to the parameters you pass to the filters directly
[16:01] <wm4> looks like it is the graph
[16:01] <wm4> it's the string passed to the -vf option on the command line
[16:01] <wm4> oh wait this is the example
[16:01] <kierank> yeah
[16:01] <wm4> but the same code is in ffplay
[16:01] <wm4> copy pasted, in fact
[16:02] <ubitux> kierank: saste once told me why it was that way
[16:02] <ubitux> but i forgot since it's completely unintuitive
[16:02] <wm4> so the buffer src, sink are for putting in/receiving AVFrames
[16:02] <ubitux> :(
[16:02] <wm4> and the format parameter is to force output configuration (i.e. what buffer sink will receive)
[16:02] <wm4> oh
[16:02] <wm4> the other way around
[16:03] <wm4> it's for forcing input
[16:03] <wm4> whatever
[16:15] <ubitux> michaelni: fate is covering that lossless code?
[16:15] <ubitux> (the asm)
[16:18] <michaelni> ubitux, it should
[16:18] <ubitux> ok, so i can push? :P
[16:18] <michaelni> yes
[16:27] <cone-652> ffmpeg.git Clément Bœsch master:cddbfd2a9554: x86/lossless_videodsp: simplify and explicit aligned/unaligned flags
[16:27] <cone-652> ffmpeg.git Clément Bœsch master:5267e850563d: x86/lossless_videodsp: use common macro for add and diff int16 loop.
[16:27] <cone-652> ffmpeg.git Clément Bœsch master:5f4d04d08470: x86/lossless_videodsp: silly one-line cosmetic.
[16:33] <cone-652> ffmpeg.git Michael Niedermayer master:018e2b57ca83: avcodec/libx264: also consider ticks per frame for fps/timebase setup
[17:00] <jnvsor> I have a fate test failing that's giving an MD5 diff but has the same CRC - I changed timecode.c to use fmod instead of % to support fractional FPS
[17:00] <kierank> ???
[17:00] <kierank> fractional fps
[17:01] <jnvsor> As in 29.97
[17:01] <kierank> that's a terrible idea
[17:01] <kierank> use drop frame for that
[17:01] <jnvsor> timecode.c uses int instead of AVRational, and it's hardcoded to drop frames so if you actually do want 30fps you'll get desync
[17:02] <jnvsor> I would have thought AVRational would be the "right" way to do it
[17:02] <kierank> it only uses drop frame if you ask it to
[17:03] <jnvsor> So, timecode is supposed to be limited to specific framerates and I should look for this bug in the filter instead?
[17:04] <kierank> yes
[17:05] <jnvsor> Right, thanks for the info
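For reference, the drop-frame approach kierank recommends keeps the nominal rate an integer (30) and skips frame numbers 0 and 1 at the start of every minute except each tenth minute, so no fractional modulo is needed. The Python sketch below is the usual drop-frame calculation written under that assumption. The function name is hypothetical and this is not a copy of timecode.c.

```python
def drop_frame_timecode(frame_number):
    """Format a 29.97 fps frame count as 30 fps drop-frame timecode."""
    fps = 30                                   # nominal integer rate
    drop = 2                                   # frame numbers skipped per dropped minute
    frames_per_min = fps * 60 - drop           # 1798 frames in a minute with a drop
    frames_per_10min = fps * 600 - drop * 9    # 17982 frames per ten-minute block

    tens, rest = divmod(frame_number, frames_per_10min)
    # Nine of every ten minutes skip two frame numbers each.
    adjusted = frame_number + 9 * drop * tens
    if rest >= drop:
        adjusted += drop * ((rest - drop) // frames_per_min)

    ff = adjusted % fps
    ss = adjusted // fps % 60
    mm = adjusted // (fps * 60) % 60
    hh = adjusted // (fps * 3600)
    # The ';' before the frame field conventionally marks drop-frame timecode.
    return "%02d:%02d:%02d;%02d" % (hh, mm, ss, ff)


print(drop_frame_timecode(1799))  # last frame of the first minute
print(drop_frame_timecode(1800))  # first frame of the second minute
```

All arithmetic stays in integers, which is why a plain `%` is enough once the frame number has been adjusted.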
[17:05] <cone-652> ffmpeg.git Lars Kiesow master:7fc4c1846300: Factors for scale filter
[17:06] <cone-652> ffmpeg.git Lars Kiesow master:e395f8de5ac6: Fixed factor for scale filter
[17:06] <cone-652> ffmpeg.git Lars Kiesow master:69b1d1d99bc5: Documentation for scale filter factor
[17:06] <cone-652> ffmpeg.git Lars Kiesow master:c49b0360966d: Documentation for scale filter factor
[17:06] <cone-652> ffmpeg.git Michael Niedermayer master:682ddb89cf15: Merge branch 'scale-filter-factor' of https://github.com/lkiesow/FFmpeg
[17:06] <cone-652> ffmpeg.git Michael Niedermayer master:1e48c39ece3e: avfilter/vf_scale: do aspect ratio and scale factor compensation together
[17:06] <cone-652> ffmpeg.git Michael Niedermayer master:214a3b8bf939: avfilter/vf_scale: simplify alignment code
[17:09] <ubitux> oh that's pretty cool
[17:11] <ubitux> > It is a DOS program from 1998
[17:11] <ubitux> we definitely need to support that app :D
[17:11] <Daemon404> dos in 1998?
[17:11] <Daemon404> half a decade late
[17:27] <ubitux> BBB: i have both 88_16 working, will improve and share soon
[17:47] <ubitux> 9680 decicycles in loop_filter_v_88_16_c, 4193765 runs, 539 skips
[17:47] <ubitux> 9233 decicycles in loop_filter_h_88_16_c, 4193751 runs, 553 skips
[17:47] <ubitux> to:
[17:47] <ubitux> 1932 decicycles in ff_vp9_loop_filter_v_88_16_ssse3, 4194162 runs, 142 skips
[17:47] <ubitux> 3058 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193877 runs, 427 skips
[17:47] <nevcairiel> its no longer over 9000 :(
[17:48] <ubitux> :(
[17:49] <ubitux> BBB: https://github.com/ubitux/FFmpeg/compare/vp9-simd
[17:53] <ubitux> 5.994 → 5.478 overall decode time on ped1080p.webm
[17:53] <ubitux> (-threads 1)
[17:54] <JEEB> nice
[18:01] <kierank> av_frame_free and avcodec_free_frame are not confusing at all
[18:01] <wm4> with git master, you can just always use av_frame_free AFAIK
[18:02] <wm4> with earlier releases, it depends on the situation, and doing the wrong thing may lead to memory leaks or corruption
[18:02] <ubitux> kierank: see http://git.videolan.org/?p=ffmpeg.git;a=blob;f=doc/examples/demuxing_decoding.c;hb=HEAD
[18:02] <ubitux> somehow, the 3 methods are depicted here
[18:02] <wm4> ubitux: that's even more confusing
[18:02] <ubitux> none should leak/corrupt anything currently, but maybe something broke again
[18:02] <wm4> it's ridiculous and everyone will hate you for it
[18:03] <ubitux> wm4: wtf?
[18:03] <ubitux> it shows how the old code was supposed to look like
[18:03] <wm4> yes that will be the reaction of anyone reading this example
[18:03] <ubitux> and how it's supposed to be done currently
[18:03] <nevcairiel> the new and old api are synonyms now, so you could even mix-match them if you want
[18:03] <ubitux> no
[18:03] <wm4> ffmpeg is insanely hard to use right, and these verbose, long-winded examples demonstrating deprecated features don't help
[18:04] <ubitux> seriously
[18:04] <ubitux> well whatever
[18:04] <wm4> that's just how it is
[18:04] <ubitux> i remember people complaining about how that was supposed to be done because anton didn't document the migration from one method to another
[18:04] <nevcairiel> ubitux: the old api functions just call the new ones now, they have no single line of specific code anymore
[18:04] <ubitux> this example shows how that's done, and allows testing it..
[18:05] <ubitux> nevcairiel: that's what you meant, ok
[18:06] <wm4> sounds like anton is to blame, but personally I'm just glad the difference between these was removed
[18:06] <wm4> as soon as libav 10 is released I can dump all my old code and stop thinking about it
[18:06] <wm4> (until the next api transition)
[18:06] <nevcairiel> released? dont you mean adopted by distros? :d
[18:07] <ubitux> http://pastie.org/pastes/8666840/text seems lpf is no longer that performance-critical
[18:10] <jnvsor> How do I get the output of a specific fate test if there are no errors so it doesn't generate a .err file?
[18:11] <ubitux> make fate-thetest V=1
[18:11] <ubitux> re-run the command manually
[18:11] <jnvsor> That works I guess
[18:22] <ubitux> decode_coeffs_b is certainly slow..
[19:05] <BBB> ubitux: oh whoa good work
[19:28] <BBB> ubitux: so for the 88_h, you don't need a 16x16 transpose right? you should just need a 16x8
[19:29] <ubitux> ah?
[19:30] <BBB> ubitux: and filter4 in line 488, that's filter14 right?
[19:30] <BBB> oh no it's filter4 nm
[19:31] <BBB> yeah so my biggest comment would be that I think it should be possible to get away with a 16x8 transpose instead of a 16x16 for the 88
[19:33] <BBB> what is mask_mix?
[19:34] <ubitux> BBB: for splatting the 2 values of E, I and H
[19:34] <BBB> oh right I remember now
[19:35] <BBB> I don't think they have to be dq right? it's really just times 8 db 0, times 8 db 1
[19:35] <BBB> (not that it matters)
[19:35] <ubitux> yeah sure
[19:35] <BBB> I'd add a one-line comment with what you just said
[19:35] <BBB> I had to scratch my head there several times because I'm forgetful
[19:35] <ubitux> i wasn't sure if i could mix multiple times like that when doing the first time
[19:37] <BBB> my context-opts gains a few %, like 2-3% or so
[19:37] <BBB> decoding time is now halfway between 6.0sec and 6.1sec
[19:38] <BBB> (so with your changes that'd be like 5.low)
[19:38] <BBB> I'll add a few more changes that I had planned at some point soon
[19:56] <ubitux> BBB: any idea how we could make decode_coeffs_b() faster?
[21:35] <BBB> ubitux: you can't, it just takes long
[21:35] <ubitux> sadness.
[21:35] <BBB> ubitux: you can make it 10-20% faster by writing it in assembly, but I assure you it's no fun
[21:35] <ubitux> :D
[21:35] <BBB> mru did that for arm, where every cycle counts
[21:35] <BBB> I never bothered to write a x86 version
[21:36] <ubitux> for what codec?
[21:36] <BBB> vp8
[21:36] <nevcairiel> disassemble gccs result and start optimizing there? :d
[21:36] <ubitux> ok :)
[21:36] <BBB> I was planning to move the tx32x32 division out of the loop
[21:36] <BBB> that would make a small difference
[21:36] <BBB> but other than that I have no ideas on how to optimize it by a big amount, just small 1-2 cycles differences
[21:36] <BBB> I can give you my list if you want
[21:37] <ubitux> BBB: btw, about the 16x8, i'll need to read 16 "lines" (actually 16 half-lines), i'm not sure i can simplify the transpose itself much
[21:37] <BBB> - coef parsing opts:
[21:37] <BBB>   o make cache i instead of rc ordered (means having to change nb to be i indexed)
[21:37] <BBB>   o move division to decode_coeffs outside txblk loop
[21:37] <BBB>   o merge eob and cnt
[21:37] <BBB>   o remove tx argument
[21:37] <BBB>   o do DC outside loop and remove indexing of qmul (?)
[21:37] <BBB>   o move c on stack
[21:37] <BBB> 16 half lines = 16 movhs right?
[21:37] <BBB> then the first stage of the transpose is much simpler
[21:38] <BBB> it's just punpcklbw
[21:38] <BBB> instead of SBUTTERFLY bw
[21:38] <BBB> (which is punpckl+hbw
[21:38] <BBB> )
[21:38] <ubitux> ah, i see
[21:38] <BBB> then after that it's a 8-register transpose
[21:38] <ubitux> mmh
[21:38] <BBB> same for the reverse
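BBB's point about the first transpose stage can be illustrated with a small Python model of the byte-interleave instructions. This is a sketch under assumptions: registers are modelled as 16-element lists, the pixel values and the pairing of adjacent rows are made up, and `movh`, `punpcklbw` and `punpckhbw` here are plain Python stand-ins, not the actual assembly or macros.

```python
def movh(half_line):
    """Load 8 bytes into the low half of a 16-byte register; high half is zero."""
    data = list(half_line)
    assert len(data) == 8
    return data + [0] * 8


def punpcklbw(a, b):
    """Interleave the low 8 bytes of a and b: a0 b0 a1 b1 ... a7 b7."""
    out = []
    for x, y in zip(a[:8], b[:8]):
        out += [x, y]
    return out


def punpckhbw(a, b):
    """Interleave the high 8 bytes of a and b: a8 b8 a9 b9 ... a15 b15."""
    out = []
    for x, y in zip(a[8:], b[8:]):
        out += [x, y]
    return out


# 16 hypothetical half-lines of 8 pixels each: row r holds r*8 .. r*8+7.
rows = [movh(range(r * 8, r * 8 + 8)) for r in range(16)]

# First stage: pair up rows. Because only the low halves carry data,
# punpcklbw alone keeps every pixel of both rows in one full register.
stage1 = [punpcklbw(rows[i], rows[i + 1]) for i in range(0, 16, 2)]

# The high interleave would contain nothing but zeros, so it can be skipped.
unused = punpckhbw(rows[0], rows[1])

print(len(stage1), stage1[0], unused)
```

After this stage the 16 half-lines occupy 8 full registers, which is consistent with the "8-register transpose" BBB describes for the remaining steps.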
[23:12] <ubitux> beastd: so, when are we putting back this awesome logo? http://web.archive.org/web/20050227054954im_/http://ffmpeg.sourceforge.net/ffmpeg-logo-p1.jpg
[23:14] <beastd> ubitux: yay \o/  that one is classic :D
[23:15] <ubitux> :)
[23:15] <beastd> Maybe we should put it up at the next ffmpeg anniversary
[23:29] <cbsrobot> ubitux: I thought I'd send the true peaks before march 2014
[23:53] <ubitux> cbsrobot: oh :)
[23:54] <cbsrobot> ubitux: I remember you liked the soundtrack I sent you for testing
[23:54] <ubitux> yes
[23:54] <cbsrobot> here's the movie that goes with it: https://vimeo.com/43986997
[00:00] --- Sun Jan 26 2014