[Ffmpeg-devel-irc] ffmpeg-devel.log.20131227

Sat Dec 28 02:05:02 CET 2013

[00:06] <Daemon404> nevcairiel: seems i forgot...
[00:06] <Daemon404> icl ships with gdb now
[00:06] <Daemon404> lol.
[01:28] <cone-783> ffmpeg.git Michael Niedermayer master:0875a9e4fc4c: avformat/oggparseogm: check input size before reading t
[01:28] <cone-783> ffmpeg.git Michael Niedermayer master:42b6805cc198: avcodec/huffyuvdec: clear remainder of the array on end of input in decode_422_bitstream()
[02:02] <BBB> ubitux: why signed? the most common method is +128 (or pxor 0x80; i.e. make unsigned), then pand (0xf8 or 0xf0) and then psrlq/d/w 3 or 4, and then in the end -128 (or pxor 0x80)
[02:03] <BBB> oh michael already said that
[02:03] <BBB> ok nevermind me
[02:55] <iive> BBB: why is the pand? if the operation is byte wise it shouldn't be needed.
[02:55] <iive> and i see you do give example with shift that is quad/double/word ...
[02:59] <iive> n8
[03:06] <BBB> because there is no bytewise shift
[03:06] <BBB> :)
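A minimal C sketch of the trick BBB describes, emulating 64-bit MMX-style registers so it can be checked on any machine. The helpers pxor(), pand(), psrlw(), psubb() and sra_bytes() are hypothetical stand-ins for the instructions of the same names, not FFmpeg code. It shows why the pand is needed: the only available shift works on 16-bit (or wider) lanes, so without the mask the low n bits of each high byte would leak into the top of the byte below it. In this sketch the bias removed at the end is 128 >> n (16 for n = 3, 8 for n = 4), because the shift divides the +128 offset as well.

```c
#include <stdint.h>

/* Hypothetical emulation of MMX ops on a 64-bit register holding 8 bytes. */
static uint64_t pxor(uint64_t a, uint64_t b) { return a ^ b; }
static uint64_t pand(uint64_t a, uint64_t b) { return a & b; }

/* psrlw: logical right shift of each 16-bit word (there is no byte shift). */
static uint64_t psrlw(uint64_t a, int n)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t w = (a >> (16 * i)) & 0xFFFF;
        r |= (w >> n) << (16 * i);
    }
    return r;
}

/* psubb: per-byte wrapping subtraction, no borrow between bytes. */
static uint64_t psubb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x - y) << (8 * i);
    }
    return r;
}

/* Arithmetic right shift by n (3 or 4) of 8 packed signed bytes. */
static uint64_t sra_bytes(uint64_t v, int n)
{
    const uint64_t ones = 0x0101010101010101ULL;
    uint64_t mask = ((0xFFu << n) & 0xFFu) * ones; /* 0xF8.. or 0xF0.. */
    uint64_t bias = (0x80u >> n) * ones;           /* 0x10.. or 0x08.. */
    uint64_t u = pxor(v, 0x80 * ones); /* signed -> unsigned: x + 128 */
    u = pand(u, mask);                 /* clear bits that would leak down */
    u = psrlw(u, n);                   /* each byte: (x >> n) + (128 >> n) */
    return psubb(u, bias);             /* remove the shifted bias */
}
```

For example, sra_bytes(0xF0, 3) returns 0xFE: lane 0 holds 0xF0 (-16) and becomes -2, and the zero lanes stay zero.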
[03:11] <cone-783> ffmpeg.git Michael Niedermayer master:f55bc96a5449: avcodec/pcm-dvd: reset last header on errors
[07:28] <ubitux> BBB: why signed; because filter2()/filter4() are signed op
[07:29] <ubitux> michaelni, BBB, ok thx will try that
[07:30] <ubitux> BBB: btw, about the MC func, any reason to prefer 8 reads vs 2 reads (-3 and +4) and palignr/pshufb?
[07:31] <ubitux> (at the beginning of the .loop)
[07:32] <ubitux> all the data should overlap, unless i'm missing sth?
[07:33] <ubitux> (for the horizontal one)
[09:47] <j-b> 'morning
[11:51] <richardus> there's no documentation for the -g flag on the ffmpeg-all page, should i file a big
[11:51] <richardus> bug
[11:51] <richardus> i'm having trouble searching trac for '-g' :p
[11:54] <nevcairiel> g is keyframe interval
[11:55] <cone-205> ffmpeg.git Michael Niedermayer master:e630ca511107: avformat/mpegts: check sl.timestamp_len
[12:18] <cone-205> ffmpeg.git Diego Biurrun master:b83d1ee3b41c: avutil: Move library version related macros to version.h
[12:18] <cone-205> ffmpeg.git Michael Niedermayer master:25b243759cb6: Merge commit 'b83d1ee3b41cfe8357836e2582104db2f3364cb0'
[12:42] <saste> richardus, g in the codec options chapter
[12:44] <BBB> ubitux: I tried palignr/pshufb, it was slower (see also vp8, it does the same thing, or 8px one, it does the same thing)
[12:44] <BBB> ubitux: don't know why tbh
[12:45] <ubitux> ah, okay...
[12:45] <BBB> I'd expect it to be faster b/c non-i/o but what do I know
[12:45] <BBB> maybe once it's cached it doesn't matter
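To illustrate the overlap ubitux mentions, here is a C sketch that builds the eight shifted source windows of a horizontal 8-tap filter (taps at -3..+4, 16 output pixels assumed) in two ways: eight unaligned loads, or two 16-byte loads plus a palignr-style byte shift. palignr16() and the tap_windows_*() functions are hypothetical helpers, and placing the second load at +13 is an assumption of this sketch, chosen so the two loads are contiguous, which a palignr concatenation needs. It says nothing about speed; BBB measured the palignr/pshufb version as slower.

```c
#include <stdint.h>
#include <string.h>

/* Emulates SSSE3 palignr on 16-byte registers: view hi:lo as 32 bytes
 * (lo first), shift right by n bytes (0 <= n <= 16), keep the low 16. */
static void palignr16(uint8_t dst[16], const uint8_t hi[16],
                      const uint8_t lo[16], int n)
{
    uint8_t cat[32];
    memcpy(cat, lo, 16);
    memcpy(cat + 16, hi, 16);
    memcpy(dst, cat + n, 16);
}

/* Variant 1: eight unaligned loads, window k = src[k-3 .. k+12]. */
static void tap_windows_8_loads(const uint8_t *src, uint8_t win[8][16])
{
    for (int k = 0; k < 8; k++)
        memcpy(win[k], src - 3 + k, 16);
}

/* Variant 2: two loads covering src[-3 .. 28], then byte-align each window. */
static void tap_windows_2_loads(const uint8_t *src, uint8_t win[8][16])
{
    uint8_t a[16], b[16];
    memcpy(a, src - 3, 16);  /* src[-3 .. 12] */
    memcpy(b, src + 13, 16); /* src[13 .. 28] */
    for (int k = 0; k < 8; k++)
        palignr16(win[k], b, a, k); /* window k = src[k-3 .. k+12] */
}
```

Only src[-3 .. 19] is actually used by the windows; the rest of the second load is slack.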
[12:46] <BBB> also I'm working on 32x32 again now
[12:46] <BBB> (the sub4x4/sub2x2 made no speed diff so I left them aside for now)
[12:47] <ubitux> btw, srcq and src4q aren't aligned all the time?
[12:48] <cone-205> ffmpeg.git Stefano Sabatini master:8ea150187856: doc/protocols: fix level of udp examples subsection
[12:49] <BBB> no, because a ref pixel in mc can be at any position
[12:49] <BBB> (as opposed to a dst pixel, which is always blocksize-aligned)
[12:50] <ubitux> ok
[12:50] <ubitux> then i guess it LGTM, but i don't know MC enough :p
[12:51] <ubitux> the sub func in 16x16 LGTM too btw
[12:52] <ubitux> i can ofc reply on the ml but since you're here...
[13:10] <BBB> tnx
[13:10] <BBB> ok off to 32x32 then
[13:10] <cone-205> ffmpeg.git Luca Barbato master:9ace13db77a2: doxy: Fix link in badge color
[13:10] <cone-205> ffmpeg.git Michael Niedermayer master:8102cdfc04a5: Merge commit '9ace13db77a22fd59c217175596a95775c5d25aa'
[13:10] <BBB> ubitux: and the other one (sub8x8 I think) also ok?
[13:12] <ubitux> the other one?
[13:12] <ubitux> i'm talking about sub8x8 in 16x16
[13:25] <cone-205> ffmpeg.git Luca Barbato master:1ab91c7d4ac6: doxy: Update the css to have a flat style
[13:25] <cone-205> ffmpeg.git Michael Niedermayer master:7ad6515fd4b4: Merge remote-tracking branch 'qatar/master'
[13:32] <BBB> I'm sleepy, just ignore me, it's easier
[13:32] <ubitux> :)
[13:32] <BBB> I missed that whole sentence
[13:32] <BBB> michaelni: merge of https://github.com/rbultje/ffmpeg/commits/vp9-simd please?
[13:32] <BBB> I'll work on a 32x32 skeleton
[13:33] <BBB> maybe in ~2 weeks or so that's finished
[13:33] <BBB> how's lf?
[13:33] <ubitux> in progress
[13:33] <BBB> it's 27.4% of ffmpeg runtime here
[13:33] <ubitux> https://github.com/ubitux/FFmpeg/compare/vp9-lpf
[13:33] <BBB> where total video is like 70-80% or so
[13:33] <BBB> so that's like 1/3rd
[13:34] <ubitux> i'm doing filter2()/filter4()
[13:34] <ubitux> filter14() is started
[13:34] <ubitux> filter6() not started
[13:34] <ubitux> rest of the function should be "done"
[13:35] <ubitux> the flow can be done in different orders, so it's a bit chaotic right now
[13:35] <ubitux> i'm trying to organize it so i can get enough reg when i need it
[13:36] <ubitux> right now, i'm going for 1) calc flat8out/flat8in/hev 2) filter2()/filter4() 3) filter6() 4) filter14()
[13:36] <BBB> yeah that looks good
[13:37] <BBB> I mean we'd have to see the final assembly for the more specific comments, but the organization looks logical
[13:38] <ubitux> i really enjoy writing that code btw
[13:38] <ubitux> more than idct, because it looks like there are a lot of solutions
[13:38] <ubitux> it's somehow way more flexible
[13:39] <BBB> that's the good and the bad part, right?
[13:39] <BBB> it's more compartmentalized
[13:39] <ubitux> i think it's a good thing :)
[13:39] <BBB> but that also means you sometimes run the risk of doing the same calculation multiple times
[13:39] <ubitux> i like that :p
[13:40] <BBB> and for the perfect simd, you want to prevent that, but then you sometimes lose the compartmentalization again
[13:40] <BBB> if you separate it too much, it becomes hard to do just that ;)
[13:40] <ubitux> probably
[13:40] <ubitux> but as you told me, it's indeed not really important
[13:40] <ubitux> because on a whole 16x16
[13:40] <BBB> so the final code tends to be somewhat "messy" because it started pretty (look at vp8) and then I tried to do the final few instruction losses and then it got a little ugly at times
[13:41] <ubitux> you basically can trigger every path
[13:41] <ubitux> (even with branching i mean)
[13:41] <ubitux> so i don't think it really matters
[13:41] <BBB> yes
[13:42] <ubitux> i'm looking forward to the steps where it will work so i can start making it a spaghetti soup
[13:42] <ubitux> :)
[13:42] <BBB> e.g. limit() and fhev() both use abs(p1-p0) and abs(q1-q0)
[13:42] <BBB> so you don't want to do that twice
[13:42] <BBB> in vp8
[13:42] <BBB> vp9 might have something similar, I don't remember
[13:43] <ubitux> i reused a lot such thing already
[13:43] <BBB> that's the sort of stuff where you can lose a few instructions but then it becomes a little bit intermingled
[13:43] <BBB> ok cool
[13:43] <BBB> then it'll be great
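A scalar C sketch of the reuse BBB describes: |p1 - p0| and |q1 - q0| are computed once and feed both the high-edge-variance test and the interior-limit part of the filter mask. The names edge_masks(), EdgeMasks, interior_limit and hev_thresh are hypothetical; the real VP8 mask also has an edge-limit term on p0/q0 and p1/q1 that is left out here. In SIMD form each difference would live in one register (e.g. two psubusb and a por) consumed by both comparisons.

```c
#include <stdint.h>
#include <stdlib.h>

/* Results for one pixel column across an edge. */
typedef struct {
    int hev;      /* high edge variance */
    int inner_ok; /* interior differences within the limit */
} EdgeMasks;

/* p[0..3] = p0..p3, q[0..3] = q0..q3 (hypothetical layout). */
static EdgeMasks edge_masks(const uint8_t p[4], const uint8_t q[4],
                            int interior_limit, int hev_thresh)
{
    /* Computed once, used by both checks below. */
    int dp10 = abs(p[1] - p[0]);
    int dq10 = abs(q[1] - q[0]);
    EdgeMasks m;

    m.hev = dp10 > hev_thresh || dq10 > hev_thresh;
    m.inner_ok = abs(p[3] - p[2]) <= interior_limit &&
                 abs(p[2] - p[1]) <= interior_limit &&
                 dp10 <= interior_limit &&
                 dq10 <= interior_limit &&
                 abs(q[2] - q[1]) <= interior_limit &&
                 abs(q[3] - q[2]) <= interior_limit;
    return m;
}
```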
[13:43] <ubitux> https://github.com/ubitux/FFmpeg/compare/vp9-lpf#diff-651121e4f3585d617f2de38cdfbcc3adR155
[13:43] <ubitux> i'm computing flat8in and hev at the same time here for instance
[13:43] <BBB> ah yes
[13:43] <BBB> nice
[13:43] <ubitux> and i'm also reusing a lot the previous calculation
[13:43] <BBB> well I'm glad you like it, I remember going quite crazy with lf at the end
[13:44] <ubitux> :D
[13:44] <ubitux> what i'm afraid of is not having enough reg for the filter14()
[13:44] <ubitux> but i've a B plan @_@
[13:45] <ubitux> anyway, it will take me a little more time
[13:45] <ubitux> i'll keep you up to date :p
[13:45] <nevcairiel> not even enough regs on 64-bit? :p
[13:45] <ubitux> not really
[13:45] <ubitux> :(
[13:46] <ubitux> as the name suggests, filter_14() needs 14 lines
[13:46] <ubitux> of course it's possible to re-read N times
[13:47] <ubitux> but i'm trying to reuse a maximum of values
[13:47] <BBB> I think it's possible, because you need some pixels only for 2 things
[13:47] <BBB> for the flat
[13:47] <BBB> and then for the actual filter
[13:47] <BBB> so you can prioritize these calculations before anything else
[13:47] <BBB> and then drop that register
[13:47] <BBB> also for intermediates, feel free to use stack, vp8 does that too on x86-32
[13:48] <BBB> using a bit of stack is fine for such things
[13:48] <ubitux> no i don't need the stack
[13:48] <BBB> ok that's even better
[13:48] <ubitux> the plan was to compute the other small filters so i can free the flat8in & hev and just keep the final mask
[13:49] <ubitux> but the filters requires some cache
[13:49] <ubitux> typically
[13:49] <ubitux> dst[stride * -7] = (p7 + p7 + p7 + p7 + p7 + p7 + p7 + p6 * 2 + p5 + p4 + p3 + p2 + p1 + p0 + q0 + 8) >> 4;
[13:49] <ubitux> then
[13:49] <ubitux> dst[stride * -6] = (p7 + p7 + p7 + p7 + p7 + p7 + p6 + p5 * 2 + p4 + p3 + p2 + p1 + p0 + q0 + q1 + 8) >> 4;
[13:50] <ubitux> here i'm reusing the previous calculation without the shift
[13:50] <ubitux> and i'm subtracting p7 and p6, and adding p5
[13:50] <ubitux> it's nice to have those cached
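The two dst[] lines above are consecutive outputs of the same 16-weight sum, so each output can be derived from the previous one. A C sketch under an assumption: extrapolating the pattern of the two quoted lines, output k (for px[k], with px[0..15] = p7 .. p0, q0 .. q7) is taken as the 15 pixels centered on k, clamped to p7/q7 at the ends, plus px[k] once more, plus 8, shifted right by 4. Only out[1] and out[2] come straight from the log. Going from the first quoted line to the second subtracts p7 and p6 and adds p5 and q1. wide_filter_running(), wide_tap_direct() and px_at() are hypothetical names.

```c
#include <stdint.h>

/* px[0..15] = p7, p6, ..., p0, q0, ..., q7 (one pixel column across the edge).
 * Indices past either end replicate p7 / q7. */
static uint8_t px_at(const uint8_t px[16], int i)
{
    if (i < 0) i = 0;
    if (i > 15) i = 15;
    return px[i];
}

/* Direct form: (sum of clamped px[k-7 .. k+7] + px[k] + 8) >> 4. */
static uint8_t wide_tap_direct(const uint8_t px[16], int k)
{
    int sum = px[k] + 8;
    for (int j = k - 7; j <= k + 7; j++)
        sum += px_at(px, j);
    return (uint8_t)(sum >> 4);
}

/* Running-sum form: build the unshifted sum for k = 1 (p6) once, then slide. */
static void wide_filter_running(const uint8_t px[16], uint8_t out[16])
{
    int sum = px[1] + 8;
    for (int j = 1 - 7; j <= 1 + 7; j++)
        sum += px_at(px, j);

    out[0] = px[0];   /* p7 and q7 are not filtered */
    out[15] = px[15];
    for (int k = 1; k <= 14; k++) {
        out[k] = (uint8_t)(sum >> 4);
        if (k < 14) {
            /* Drop px[k-7] and the extra copy of px[k]; add px[k+8] and the
             * extra copy of px[k+1]. For k = 1: -p7 - p6 + p5 + q1. */
            sum += -px_at(px, k - 7) - px[k] + px[k + 1] + px_at(px, k + 8);
        }
    }
}
```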
[13:51] <ubitux> also unpack/repack is taking twice the amount of reg
[13:51] <ubitux> so it's a bit short sometimes
[13:51] <ubitux> but it should be possible :)
[13:52] <ubitux> the filter2()/filter4() is not fun though :(
[14:05] <cone-205> ffmpeg.git Ronald S. Bultje master:0d9375fc908c: vp9/x86: 16x16 sub-IDCT for top-left 8x8 subblock (eob <= 38).
[14:05] <cone-205> ffmpeg.git Ronald S. Bultje master:18175baa54ea: vp9/x86: 16px MC functions (64bit only).
[14:05] <cone-205> ffmpeg.git Michael Niedermayer master:c09bb235bf25: Merge remote-tracking branch 'rbultje/vp9-simd'
[14:11] <ubitux> BBB: you still haven't uploaded ped1080p.webm btw?
[14:13] <ubitux> i realized that the etv* ones are actually pretty bad quality wise
[14:13] <ubitux> even the recent etv5k have quite some blocks
[14:15] <ubitux> http://i.imgur.com/ogvjRtV.png  like such area
[14:15] <ubitux> (it's even worse in etv.webm and etv5000.webm)
[14:32] <BBB> ubitux: not yet, will do this weekend
[14:32] <ubitux> BBB: did you see my other regression crash btw? :p
[14:32] <BBB> yes
[14:32] <BBB> will look
[14:32] <BBB> filter2/filter4 is similar to vp8, so feel free to look at them
[14:33] <BBB> but yes they suck a bit
[16:50] <funman> How can I debug this further? http://pastie.org/8580267
[16:50] <funman> ffmpeg version 2.1.1
[16:53] <ubitux> check with the cpuflags
[16:53] <ubitux> like, is it reproducible with -cpuflags -mmx?
[16:53] <funman> hm well I'm using vlc
[16:54] <ubitux> you can't reproduce with ffmpeg/ffplay?
[16:54] <funman> didn't try yet
[16:54] <funman> I'll remove SWS_CPU_CAPS_MMX for now
[16:58] <iive> funman: would you disassemble a little around the point at fault?
[16:58] <iive> actually, there is .c line number.
[16:59] <ubitux> which is an inline asm macro
[16:59] <funman> http://pastie.org/8580299
[17:00] <funman> http://pastie.org/8580308
[17:01] <ubitux> pointer looks valid
[17:02] <ubitux> maybe some overread (odd or weird width maybe?), or a negative linesize somewhere (use of filters like vflip?)
[17:06] <funman> http://pastie.org/8580333
[17:06] <funman> that's the context
[17:09] <funman> hmm that's a C# app and mono doesn't like valgrind
[17:11] <ubitux> VLC is a C# & mono app now?
[17:12] <funman> you mean swscale is a C# & mono app
[17:13] <funman> vlc has (several) C# bindings and mono is required to run C# afaik
[17:14] <funman> ok I found the culprit
[17:14] <funman> it's kodab
[17:14] <funman> using width instead of coded_width gives me no crash anymore
[17:15] <funman> although I wonder why I can't disable mmx code
[17:16] <funman> cpu flags aren't given through getContext() anymore?
[17:18] <ubitux> av_*_cpu_flags()?
[17:18] <funman> how can I change that from vlc?
[17:19] <funman> I see there's an av_parse_cpu_flags
[17:19] <ubitux> av_force_cpu_flags()?
[17:19] <funman> but sws_init_context uses av_get_cpu_flags()
[17:19] <funman> ah right
[17:19] <ubitux> see how opt_cpuflags() work in cmdutils.c
[17:21] <funman> alright now it segfaults in C code
[17:21] <ubitux> should be easier to debug :p
[17:21] <ubitux> and you know it's a problem in the caller then
[17:21] <funman> yeah
[17:22] <funman> although it's still weird
[17:28] <ubitux> funman: it's a regression?
[17:28] <funman> in VLC yes
[17:29] <funman> introduced by http://git.videolan.org/?p=vlc.git;a=commitdiff;h=b71c85b3d88b8d0ad2d4a63bf58ebcd2ad771cbf
[17:32] <funman> and that particular one is fixed by http://git.videolan.org/?p=vlc.git;a=commitdiff;h=983198b0fc68d2e2bcfe98b6fe331964b1bc9f9e
[17:32] <funman> i_width = codec_width, i_visible_width = width
[17:33] <iive> there is something I don't seem to get
[17:33] <iive> the fault is in the last instructions of RGB_PACK16().
[17:34] <iive> they move %mm0 into (%1). but isn't %1 src[] ?
[17:34] <iive> shouldn't it be writing packed rgb into the dst?
[17:35] <ubitux> one width is larger than the other
[17:35] <ubitux> afaiu
[17:35] <ubitux> overread, boom
[17:37] <iive> look at the fault instruction, is is write.
[17:38] <iive> it is write.
[17:39] <ubitux> dstw = srcw and src2w > srcw maybe?
[17:40] <ubitux> overreading in src might be fine since it's actually src2w large
[17:40] <ubitux> but dstw might have a width of the smallest announced size
[17:45] <ubitux> iive: ah sorry, misread what you said; i guess it's an inplace thing :p
[17:46] <iive> but... Y8->rgb16?
[17:46] <iive> or rather yuv420/yv12->rgb16
[23:23] <PwrSurge> hi, anyone here working on MIPS optimization features?
[23:23] <ubitux> hi,
[23:23] <ubitux> better contact Nedeljko Babic first
[23:24] <ubitux> he's the MIPS maintainer, and regularly sends a large bunch of optimizations with some other mips folks
[23:26] <PwrSurge> i'm doing embedded development on a mipsel board
[23:27] <PwrSurge> was hitting my head against my desk as I could simply not understand why I kept getting "Illegal instruction" on my code after compiling for a new kernel version
[23:27] <PwrSurge> lol
[23:27] <PwrSurge> for a while, thought it was a bug in my new code
[23:29] <PwrSurge> debugged with GDB and found out it was crashing when trying to call libavcodec.so
[23:30] <PwrSurge> Program terminated with signal 4, Illegal instruction.
[23:30] <PwrSurge> #0  0x76ea9f34 in ?? () from /data/buildsystem-cs/root/lib/libavcodec.so.55
[23:44] <cone-820> ffmpeg.git Michael Niedermayer master:4156df59f596: avformat/mov: check avio_read() return in mov_read_dref()
[00:00] --- Sat Dec 28 2013