[Ffmpeg-devel-irc] ffmpeg-devel.log.20140420

Mon Apr 21 02:05:02 CEST 2014

[00:24] <ubitux> BBB: still the loop filter
[00:24] <ubitux> well, i'm trying
[00:24] <ubitux> and probably doing it wrong :)
[00:55] <ubitux> ok so, now how am i supposed to repack that.
[00:56] <ubitux> packuswb(x,x) isn't doing what i want
[01:18] <cone-325> ffmpeg.git Peter Ross master:f57ac37228b2: avformat/iff: extend IFF demuxer to decode DSDIFF 64-bit chunks
[02:18] <michaelni> Daemon404, do you have a testcase for "mjpeg: Do not fail jpeg decoding on bad EXIF data." ?
[02:19] <michaelni> i am curious about the bad exif data ...
[02:22] <Daemon404> hmm
[02:23] <Daemon404> pm'd you a link
[02:37] <BBB> ubitux: well the inverse of punpcklbw is packuswb
[02:37] <BBB> ubitux: so if you want to use packuswb, use punpcklbw
[02:37] <BBB> ubitux: I'm not saying the intermediate packing order makes sense, but it works as intended
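A minimal scalar sketch of the point above, assuming 256-bit (ymm) registers and a zero second operand for the unpack. The helper names `unpack_bw_zero` and `pack_wb` are hypothetical and only model the lane-wise behaviour of punpcklbw/punpckhbw and packuswb; this is not the code from vp9lpf. The intermediate word order looks odd (the low unpack holds bytes 0..7 and 16..23), but packing the two halves back restores the original byte order.

```c
#include <stdint.h>

/* packuswb saturates each signed 16-bit word to an unsigned byte. */
static uint8_t sat_u8(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Model of 256-bit punpcklbw (high == 0) or punpckhbw (high != 0) with a
 * zero second operand: each 128-bit lane zero-extends its own low or high
 * 8 bytes to 8 words. */
static void unpack_bw_zero(int16_t dst[16], const uint8_t src[32], int high)
{
    for (int lane = 0; lane < 2; lane++)
        for (int i = 0; i < 8; i++)
            dst[lane * 8 + i] = src[lane * 16 + (high ? 8 : 0) + i];
}

/* Model of 256-bit packuswb(a, b): each 128-bit lane of the result holds
 * 8 packed words from the same lane of a, then 8 from the same lane of b. */
static void pack_wb(uint8_t dst[32], const int16_t a[16], const int16_t b[16])
{
    for (int lane = 0; lane < 2; lane++)
        for (int i = 0; i < 8; i++) {
            dst[lane * 16 + i]     = sat_u8(a[lane * 8 + i]);
            dst[lane * 16 + 8 + i] = sat_u8(b[lane * 8 + i]);
        }
}
```

Calling `unpack_bw_zero` twice (low and high) and feeding both results to `pack_wb` gives back the 32 input bytes in their original order, as long as no filtering step in between moves values across lanes.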
[04:23] <michaelni> Daemon404, posted a patch that avoids the exif failure
[06:09] <cone-986> ffmpeg.git Lukasz Marek master:5053897b6ab2: lavd/xv: keep aspect ratio
[06:09] <cone-986> ffmpeg.git Lukasz Marek master:9fcdfac894b2: lavd/xv: add window id param
[06:09] <cone-986> ffmpeg.git Lukasz Marek master:de705e52d47f: lavd/xv: implement repaint message
[06:09] <cone-986> ffmpeg.git Michael Niedermayer master:cd4faed89378: Merge remote-tracking branch 'lukaszmluki/master'
[11:47] <ubitux> BBB: i'm trying to do packuswb(x, vextracti128(x, 1)) in one step
[11:47] <ubitux> i was assuming packuswb(x, x) would do, but it seems not
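A scalar sketch of why packuswb(x, x) on a ymm register differs from packuswb(x, vextracti128(x, 1)), assuming the usual AVX2 behaviour where the 256-bit pack works on each 128-bit lane independently. All function names here are hypothetical models, not FFmpeg code. The last helper shows one possible fix-up, picking qwords 0 and 2 of the lane-wise result, which is the kind of selection a vpermq could do; whether that is cheaper than the extract is not claimed here.

```c
#include <stdint.h>
#include <string.h>

/* packuswb saturates each signed 16-bit word to an unsigned byte. */
static uint8_t sat_u8(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Model of 128-bit packuswb(a, b): 8 packed words of a, then 8 of b. */
static void pack128(uint8_t dst[16], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++) {
        dst[i]     = sat_u8(a[i]);
        dst[i + 8] = sat_u8(b[i]);
    }
}

/* Model of 256-bit packuswb(a, b): the two 128-bit lanes never mix. */
static void pack256(uint8_t dst[32], const int16_t a[16], const int16_t b[16])
{
    pack128(dst,      a,     b);
    pack128(dst + 16, a + 8, b + 8);
}

/* Hypothetical fix-up: pack256(x, x) yields the qwords
 * [pack(lo), pack(lo), pack(hi), pack(hi)]; keeping qwords 0 and 2
 * gives the same 16 bytes as pack128(lo, hi). */
static void pack_words_to_bytes(uint8_t dst[16], const int16_t x[16])
{
    uint8_t tmp[32];
    pack256(tmp, x, x);
    memcpy(dst,     tmp,      8); /* qword 0: packed low lane  */
    memcpy(dst + 8, tmp + 16, 8); /* qword 2: packed high lane */
}
```

In this model the low 16 bytes of `pack256(x, x)` are the packed low lane repeated twice, so the high eight words of `x` never reach the low half of the result without a cross-lane step.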
[12:28] <ubitux> ok so it went freaking ugly but i got the lpf working with avx2
[12:40] <ubitux> http://pastie.org/pastes/9095042/text so it's a bit better
[12:41] <ubitux> i can probably do much better in various places
[13:05] <BBB> lol
[13:05] <BBB> ubitux: how does it do speed-wise?
[13:05] <BBB> oh those are metrics
[13:05] <BBB> n/m
[13:05] <BBB> I thought they were a patch
[13:06] <BBB> hm 8 cycles faster is a little ... little?
[13:06] <BBB> I mean avx2 should have double throughput so you should be able to get double the speed right?
[13:06] <ubitux> ignore the first one
[13:07] <BBB> you did implement a mix4 function right?
[13:07] <BBB> that is a little tricky :)
[13:07] <ubitux> no i didn't really
[13:07] <ubitux> actually i should remove most of the avx2 func you see there
[13:07] <ubitux> it's only relevant with a few cases
[13:08] <ubitux> aka where filter6() and filter14() are called
[13:08] <ubitux> but i'm still wondering about all the threshold comparisons
[13:08] <ubitux> i can probably reduce the number of compare calls, but with the extract overhead
[13:10] <ubitux> BBB: i basically just made filter6() and filter14() exploit ymm regs
[13:23] <BBB> oh I see
[13:23] <BBB> ok, so here's the plan I had:
[13:23] <BBB> forget the dsp functions for a second
[13:23] <BBB> go to vp9.c
[13:24] <BBB> you know the basic loop filter algo in vp9 (libvpx) is 8-pixel based right?
[13:25] <BBB> line 3223
[13:26] <BBB> we're doing edges between blocks horizontally adjacent to each other
[13:27] <BBB> e.g. a | b
[13:27] <BBB> where | is the edge
[13:27] <BBB> if you look carefully, you'll see we do lflvl->mask[][][y] as well as [y+1], i.e. we do 2 lines per iteration
[13:28] <BBB> reason we do that is to do 16 pixels per iteration, instead of the basic 8 that vp9 gives us
[13:28] <BBB> we can extend this further, obviously, we can do 32 - you need to do some more of the special handling that you see below that to take care of the cases where y and y+1 don't use the same loop filter (e.g. mixing wd=4 and wd=8, or being on an edge)
[13:28] <BBB> for 32, that's somewhat harder, because you can mix 16 and 4 and 8, or 16 and empty and 8
[13:29] <BBB> so mix4 is a little bit more complex than mix2, but that gives you 32pixel basic loop filter entry points, that you can then exploit with avx2
[13:31] <BBB> so the cases you want to handle are probably 16|16, 16|n, n|16 and n|n where n is any of 8|8, 8|4, 4|8, 4|4, 4|none, none|4, 8|none, none|8
[13:32] <BBB> (as opposed to mix2, which is just 8|4, 4|8, 4|4 and 8|8)
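A rough sketch of the case split described above, written as a classifier over four 8-pixel segments of a 32-pixel edge. Everything here is hypothetical: the enum, the function names, and the encoding of filter widths as 0 (none), 4, 8 or 16 are illustration only, and it assumes a "16" half means both of its 8-pixel segments use the 16-wide filter. It is not how vp9.c represents the masks.

```c
/* The four entry-point classes named in the chat, plus a fallback for
 * combinations outside that list (e.g. a none|none half). */
enum mix4_class {
    MIX4_16_16,
    MIX4_16_N,
    MIX4_N_16,
    MIX4_N_N,
    MIX4_UNHANDLED
};

/* A half is "16" when both of its 8-pixel segments use the 16-wide filter
 * (assumption made for this sketch). */
static int half_is_16(const int wd[2])
{
    return wd[0] == 16 && wd[1] == 16;
}

/* A half is "n" when it is one of 8|8, 8|4, 4|8, 4|4, 4|none, none|4,
 * 8|none, none|8; none|none is not in that list. */
static int half_is_n(const int wd[2])
{
    int ok0 = wd[0] == 0 || wd[0] == 4 || wd[0] == 8;
    int ok1 = wd[1] == 0 || wd[1] == 4 || wd[1] == 8;
    return ok0 && ok1 && (wd[0] != 0 || wd[1] != 0);
}

/* wd[0..1] is the first 16-pixel half, wd[2..3] the second. */
static enum mix4_class classify_mix4(const int wd[4])
{
    if (half_is_16(wd) && half_is_16(wd + 2)) return MIX4_16_16;
    if (half_is_16(wd) && half_is_n(wd + 2))  return MIX4_16_N;
    if (half_is_n(wd)  && half_is_16(wd + 2)) return MIX4_N_16;
    if (half_is_n(wd)  && half_is_n(wd + 2))  return MIX4_N_N;
    return MIX4_UNHANDLED;
}
```

With eight possible values for n, this split covers 1 + 8 + 8 + 64 = 81 width combinations, against the four that mix2 handles.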
[13:34] <BBB> there's also avx2 mc functions to be created btw, if you're in for more fun
[13:35] <BBB> I guess we can start with the filter4/6, it does actually help a fair bit, so that's certainly nice
[13:35] <ubitux> yeah i see what you want to do
[13:36] <ubitux> not sure i'll do much work on avx2, i really want to continue the arm thing actually :D
[13:36] <ubitux> i just wanted to have a little overview of what we can do with avx2
[13:36] <ubitux> and how that works
[13:37] <BBB> oh right arm
[13:37] <BBB> how far is that?
[13:37] <ubitux> not much from previously
[13:37] <ubitux> got busy with random things lately
[13:37] <ubitux> not much more* from previously
[17:21] <cone-68> ffmpeg.git Michael Niedermayer master:a94de50ba02f: avcodec/exif/exif_add_metadata: add support for SSHORT & SBYTE
[17:21] <cone-68> ffmpeg.git Michael Niedermayer master:e70b9b32d5ba: avcodec/exif: do not follow 0 offsets
[18:25] <cone-68> ffmpeg.git Michael Niedermayer master:69bbe27b45aa: avcodec/huffman: use av_malloc_array()
[18:25] <cone-68> ffmpeg.git Michael Niedermayer master:1fd5c7f1ee2f: avcodec/ratecontrol: use av_malloc_array()
[18:25] <cone-68> ffmpeg.git Michael Niedermayer master:8c88ea76df9b: avcodec/tiff: use av_malloc(z)_array()
[18:25] <cone-68> ffmpeg.git Michael Niedermayer master:de9cd5884822: avcdoec/webp: use av_malloc_array()
[20:35] <cone-68> ffmpeg.git Christophe Gisquet master:319235c67c59: vc1dsp: introduce cases for 8x8 and 16x16
[20:35] <cone-68> ffmpeg.git Michael Niedermayer master:af89a685c46b: avcodec/arm/vc1dsp_init_neon: fix code so it compiles and passes fate-vc1
[21:07] <cone-68> ffmpeg.git Lukasz Marek master:4d09bc98974d: lavf/pcm: remove redundant check
[21:29] <cone-68> ffmpeg.git Clément Bœsch master:f0d368d75819: avcodec/x86/vp9lpf: merge a few movs with other instructions.
[21:34] <cone-68> ffmpeg.git Clément Bœsch master:62d31307c1c5: avcodec/x86/vp9lpf: add a comment above a bunch of SWAP.
[00:00] --- Mon Apr 21 2014