[FFmpeg-devel-irc] IRC log for 2010-07-15

Fri Jul 16 02:00:01 CEST 2010

[00:01:00] <Dark_Shikari> oh cool, if I "unrolled" the branches like crazy, I get a pretty good speed boost.
[01:03:35] <kierank> did something change in the fft recently?
[01:04:29] <drv> avfft.c has a few recent changes, don't see anything else at a glance
[01:17:07] <kierank> hmmm i could swear my output matched octave before i rebased
[01:29:16] <peloverde> nothing except trivial API fixes has changed since march
[01:33:15] <kierank> octave's ifft of the identity vector outputs an array with each element equal to: 1/size  but mine for some reason just outputs 1 for each value
[01:33:57] <peloverde> IIRC that is the way our ifft has always behaved
[01:34:16] <kierank> i see
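The scaling difference kierank describes can be reproduced with a tiny inverse DFT. This is a minimal sketch, assuming a fixed size of 4 so the twiddle factors are exact powers of i; `cplx`, `rot90` and `idft4` are hypothetical names, not FFmpeg's API. With `scale = 1` every output of a unit impulse is 1 (the unnormalized behaviour discussed above); with `scale = 1/4` every output is 1/size, matching what Octave prints.

```c
#include <stddef.h>

/* Hypothetical complex type for this sketch, not FFmpeg's FFTComplex. */
typedef struct { float re, im; } cplx;

/* Multiply z by i^p (p taken mod 4). Exact, so no libm is needed. */
static cplx rot90(cplx z, unsigned p)
{
    cplx r;
    switch (p & 3) {
    case 0:  r = z;                          break;
    case 1:  r.re = -z.im; r.im =  z.re;     break;
    case 2:  r.re = -z.re; r.im = -z.im;     break;
    default: r.re =  z.im; r.im = -z.re;     break;
    }
    return r;
}

/* 4-point inverse DFT: out[n] = scale * sum_k in[k] * i^(n*k).
 * scale = 1.0f leaves the result unnormalized,
 * scale = 0.25f applies the 1/size factor. */
static void idft4(const cplx in[4], cplx out[4], float scale)
{
    for (unsigned n = 0; n < 4; n++) {
        cplx acc = { 0.0f, 0.0f };
        for (unsigned k = 0; k < 4; k++) {
            cplx t = rot90(in[k], n * k);
            acc.re += t.re;
            acc.im += t.im;
        }
        out[n].re = acc.re * scale;
        out[n].im = acc.im * scale;
    }
}
```

A caller that wants Octave-compatible output from an unnormalized inverse transform only has to multiply each output by 1/size afterwards.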
[02:17:08] <Dark_Shikari> x264 now supports NV12 input.  Do we care enough to add it to ffmpeg?
[02:17:14] <Dark_Shikari> Is there any decoder that outputs NV12?
[02:18:10] <drv> i think there's at least nv12 support in swscale
[02:19:44] <Dark_Shikari> technically there is, but it takes the slow path
[02:19:52] <Dark_Shikari> e.g. the only converter does a slow C YV12 -> NV12
[02:19:57] <Dark_Shikari> so you have to do two steps, and it's not optimized
[02:20:04] <Dark_Shikari> so the only worthwhile case would be if the video was _already_ nv12
[02:20:07] <Dark_Shikari> and we wanted to avoid a conversion.
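For context on why the conversion costs anything: planar 4:2:0 formats such as YV12 keep U and V in separate planes, while NV12 stores one plane of interleaved U,V byte pairs after the luma plane. A plain C conversion of the chroma, roughly the kind of slow path mentioned above, might look like the sketch below. `interleave_chroma` is a hypothetical helper written for illustration, not the swscale routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Interleave separate U and V planes into one NV12-style UV plane.
 * chroma_w and chroma_h are the chroma plane dimensions (half the luma
 * dimensions for 4:2:0). Each stride is in bytes. */
static void interleave_chroma(const uint8_t *u, ptrdiff_t u_stride,
                              const uint8_t *v, ptrdiff_t v_stride,
                              uint8_t *uv, ptrdiff_t uv_stride,
                              int chroma_w, int chroma_h)
{
    for (int y = 0; y < chroma_h; y++) {
        for (int x = 0; x < chroma_w; x++) {
            uv[2 * x]     = u[x];   /* NV12 order: U first */
            uv[2 * x + 1] = v[x];   /* then V */
        }
        u  += u_stride;
        v  += v_stride;
        uv += uv_stride;
    }
}
```

The luma plane is identical in both layouts, so only the chroma has to be touched.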
[02:26:10] <pengvado> nothing in libavcodec touches nv12
[02:27:10] <Dark_Shikari> not even the gpu decode stuff?
[02:29:33] <drv> there's an entry for it in raw.c, but that's it
[02:30:48] <Dark_Shikari> heh.
[02:50:17] <siretart>    
[02:50:32] <Dark_Shikari>   
[06:23:42] <av500> peloverde: what do I need to do to be webm supporter?
[06:24:52] <peloverde> I don't know, two people have asked on webm-discuss
[06:25:21] <peloverde> In one case jk forwarded it to the webmaster, in the other there was no response
[06:26:10] <saintdev> lol @ MN
[06:39:32] <kshishkov> av500: losing your mind should be the first step
[06:40:45] <av500> I'm still undecided whether being listed on that page is free advertisement or invitation to be sued...
[06:43:34] <av500> (not speaking for ffmpeg, ffmpeg should be added there imho)
[06:45:40] <mru> I doubt we'd become any bigger a target by being on that list
[06:46:01] <CIA-99> ffmpeg: mru * r24243 /trunk/libavcodec/arm/h264dsp_neon.S: ARM: remove two insns from NEON chroma loop filter
[06:46:11] <av500> mru: not speaking about ffmpeg
[06:46:25] <mru> yes, you said that
[06:46:25] <av500> I meant for a commercial company
[06:46:39] <av500> I have no fear at all for ffmpeg...
[06:46:45] <av500> you cannot sue SW
[06:46:49] <mru> quite
[06:47:10] <mru> and as far as we know, nobody has ever been sued for using ffmpeg
[06:47:33] <av500> you dont sue ppl for using ffmpeg, you sue them for not paying patent royalties
[06:48:57] <mru> some people are afraid of the RE'd stuff
[06:49:41] <av500> yeah, being sued by a 10y old defunct game publisher... :)
[06:49:52] <av500> or by Lego... but that is almost cool
[06:50:10] <av500> and, you can always limit the range of codecs and containers
[06:50:18] <mru> for using pcm, hmm
[06:50:22] <av500> we don't ship a full build of course
[06:50:35] <mru> you have no reason
[06:59:17] <astrange> not until someone patents av_malloc
[07:07:28] <mru> or wrappers in general
[07:25:46] <jai> is the dv maintainer around nowadays?
[07:25:58] <kshishkov> generally not
[07:26:05] <kshishkov> and hardly on IRC
[07:26:18] <kshishkov> I'm not sure if I've ever seen him here
[07:26:34] <jai> ah
[07:26:59] <mru> he was at linuxtag 2006
[07:27:32] <av500> vortex manipulator anybody?
[07:27:44] <jai> ffdv doesnt respond very well to fuzzed file
[07:27:48] <jai> *files
[07:27:54] <av500> ffuzzed?
[07:28:04] <jai> zzufd :)
[07:28:06] <mru> the dv code is very fragile
[07:28:07] <kshishkov> fix it
[07:28:16] <mru> it's always the first to fail on obscure systems
[07:28:22] <mru> along with dnxhd
[07:28:54] <kshishkov> rv40?
[07:29:13] <thresh> kshishkov: want to rofl? http://www.securitylab.ru/news/395807.php
[07:29:28] <jai> yeah, once i get some time hopefully, i'll try and fix some
[07:29:38] <jai> found some memleaks as well
[07:29:45] <thresh> ROC is so funny.
[07:29:55] <av500> thresh: conservative politicians do the same here all the time....
[07:30:11] <kshishkov> thresh: it's a sin to laugh at the wretched
[07:30:32] <av500> they dont sport that mighty beards though...
[07:32:29] <thresh> loving that: Say, for example, some comet unexpectedly flies in and, with its mousy cosmic tail, knocks the golden egg of the Internet off the table of civilization.
[07:32:57] <av500> comets?
[07:33:21] <av500> civilization
[07:33:27] <kshishkov> thresh: by the way, I went to the same school as Leonid Chernovetskyi, so you can hardly surprise me with that
[07:34:03] <thresh> av500: yes
[07:34:47] <av500> gahm russian is too unserbian for me in the end...
[07:34:50] <kshishkov> av500: and allusion to Russian folk tale about golden egg with GUI security
[07:34:53] <av500> m->,
[07:35:41] <av500> Leonid who?
[07:36:24] <thresh> Kiev mayor, insane guy
[07:36:45] <kshishkov> http://en.wikipedia.org/wiki/Leonid_Chernovetskyi
[08:21:51] <CIA-99> ffmpeg: mstorsjo * r24244 /trunk/MAINTAINERS: Update maintainers list according to renames made in rev 21284 and 22109
[08:57:09] <CIA-99> ffmpeg: vitor * r24245 /trunk/configure:
[08:57:10] <CIA-99> ffmpeg: Fix obviously missing dependency of float DCT.
[08:57:10] <CIA-99> ffmpeg: Fixes issue 2095.
[11:43:58] <CIA-99> ffmpeg: pross * r24246 /trunk/libavformat/iff.c: remove redundant text and whitespaces from iff demuxer av_log() statements
[11:54:27] <CIA-99> ffmpeg: jai_menon * r24247 /trunk/ffmpeg.c:
[11:54:27] <CIA-99> ffmpeg: FFmpeg : Close input file and free any related memory if
[11:54:27] <CIA-99> ffmpeg: av_find_stream_info fails.
[12:06:44] <CIA-99> ffmpeg: cehoyos * r24248 /trunk/tools/patcheck:
[12:06:44] <CIA-99> ffmpeg: grep Changelog entry from unified diffs
[12:06:44] <CIA-99> ffmpeg: Patch by Rafaël Carré, rafael d carre a gmail
[12:13:34] <pross-au> mru/anyone: does av_realloc() ensure 16-byte alignment?
[12:13:39] <mru> yes
[12:13:43] <mru> no
[12:13:45] <mru> av_malloc does
[12:13:47] <mru> not realloc
[12:14:02] <mru> there is easy way to do it with realloc
[12:14:04] <mru> *is no
[12:14:09] <pross-au> Right
[12:14:27] <pross-au> Then our #ifdef MEMALIGN hack for av_malloc is kinda pointless?!
[12:14:48] <mru> no
[12:15:06] <av500> what start alinged stays aligned, no?
[12:15:09] <av500> starts
[12:15:12] <mru> av_malloc guarantees alignment
[12:15:47] <pross-au> okay
[12:16:08] <mru> memalign isn't particularly standard though
[12:16:11] <mru> so we have to check for it
[12:16:33] <pross-au> for the AVFrame get_audio_buffer() stuff i really need av_realloc
[12:16:41] <pross-au> Or some additional creativity...
[12:16:53] <mru> why do you think you need realloc?
[12:18:09] <pross-au> allocing and freeing memory for each frame seems pointless
[12:18:37] <pross-au> that said, we don't need to retain the data, just 'grow' or 'shrink' the buffer
[12:18:47] <pross-au> contents can be trashed
[12:19:19] <av500> is there a point to shrink it?
[12:19:35] <mru> probably not
[12:19:35] <pross-au> av500: flac?
[12:19:47] <pross-au> "please allocate me a 5MiB frame"
[12:19:55] <mru> yes, so?
[12:20:07] <mru> either you have it or you don't
[12:20:14] <pross-au> if all the remaining frames are less than that, it's wasted
[12:20:18] <av500> and if contents are to be trashed: if size > alloc_size ) alloc_size *= 2; free(); malloc( alloc_size ) ....
[12:20:34] <mru> grep for fast_realloc
[12:20:41] <pross-au> ok
[12:21:35] <pross-au> guessing that fast_realloc isn't always 16-byte aligned either
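av500's free-then-malloc suggestion can be combined with the alignment requirement when the old contents may be thrown away. The following is a minimal sketch, assuming C11 `aligned_alloc` is available; `ScratchBuf` and `scratch_reserve` are hypothetical names and this is not how av_malloc or the fast_realloc helpers are implemented. The buffer only grows, by doubling, and is never shrunk.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scratch buffer; not an FFmpeg type. */
typedef struct {
    void  *data;
    size_t cap;   /* bytes currently allocated */
} ScratchBuf;

/* Make sure at least `size` bytes of 16-byte aligned storage exist.
 * Old contents are discarded when the buffer grows.
 * Returns 0 on success, -1 on failure (the old buffer is kept). */
static int scratch_reserve(ScratchBuf *b, size_t size)
{
    if (size <= b->cap)
        return 0;                       /* big enough already, never shrink */

    size_t want = b->cap ? b->cap : 64;
    while (want < size) {
        if (want > SIZE_MAX / 4)
            return -1;                  /* refuse sizes that could overflow */
        want *= 2;
    }
    want = (want + 15) & ~(size_t)15;   /* keep the size a multiple of 16 */

    void *p = aligned_alloc(16, want);
    if (!p)
        return -1;
    free(b->data);                      /* contents may be trashed, so no copy */
    b->data = p;
    b->cap  = want;
    return 0;
}
```

Because nothing is copied, growing costs one allocation and one free. The trade-off pross-au raises remains: one very large frame keeps the buffer large for the rest of the stream.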
[12:50:31] <DonDiego> calling out to all you gitards:
[12:50:51] <DonDiego> why can't the be all and end all of vcs systems handle empty directories?
[12:51:03] <Dark_Shikari> it can't?
[12:51:16] * DonDiego waits for the zealots to explain i do not really need that feature..
[12:51:25] <DonDiego> apparently not..
[12:51:40] <DonDiego> neither can mercurial IIRC
[12:51:47] <Dark_Shikari> http://stackoverflow.com/questions/115983/how-do-i-add-an-empty-directory-to-a-git-repository
[12:51:59] <Dark_Shikari> it's a design flaw that could be remedied not-too-difficultly
[12:52:01] <DonDiego> but subversion and bazaar have no trouble
[12:52:03] <Dark_Shikari> But nobody actually cares enough to do it
[12:52:17] <Dark_Shikari> meaning that probably it's not used enough for it to be a huge issue.
[12:52:21] <mru> I find that far less annoying than that svn can't replace a directory with a file in one commit
[12:52:52] <mru> I don't think I've ever deliberately created an empty directory in any vcs
[12:53:08] <DonDiego> see? see? i just *knew* the trolls would come out of the woodworks in no time..
[12:53:10] <av500> Dark_Shikari: why not add .gitignore
[12:53:19] <av500> oops, DonDiego ^^^^^^
[12:53:26] <lu_zero> iirc tar still have the same issue
[12:53:28] <mru> in fact, I don't think I've ever created a directory without immediately putting something in it
[12:53:56] <Dark_Shikari> DonDiego: it's not really trolling.  it's more of a matter of "this is a very very minimally useful feature, and while it'd be nice to have, it's not really something huge to bitch about"
[12:54:12] <lu_zero> Dark_Shikari: I'm somehow afraid it isn't considered a flaw
[12:54:21] <av500> mru: I created "~/my_swiss_bank_accounts/" very early, but never anything to put in there..
[12:54:23] <mru> DonDiego: seriously though, why do you need empty dirs?
[12:54:35] <DonDiego> lu_zero: tar handles empty dirs flawlessly of course
[12:54:41] <Dark_Shikari> lu_zero: I suspect a patch would be welcome, it's just nobody cares to do it.
[12:54:58] <kshishkov> why not rewrite git in C++ while you're on it
[12:55:19] <DonDiego> mru: the build system of a lib here at work puts binaries in a subdir and it's easier to add it to svn than to  create it in the makefile
[12:55:19] <av500> DonDiego: wrt trolls, is that the reason not to switch to git?
[12:55:40] <mru> DonDiego: just create it in the makefile
[12:55:40] <av500> ?
[12:55:43] <av500> +1
[12:56:01] <av500> assuming you have the source code of the makefile....
[12:56:09] <DonDiego> or i can do it in svn instead of working in git-svn..
[12:56:13] <mru> av500: makefile.am?
[12:56:28] <av500> mru: dunno, there are weird commercial things... :)
[12:56:53] <kshishkov> av500: and git can't synchronize with kitchen sink and coffee machine. Let's not switch to it.
[12:57:02] <av500> kshishkov: stop trolling
[12:57:06] <av500> :)
[12:57:12] <mru> putting an empty dir in vcs for that reason is just as stupid as checking in compiled object files
[12:57:37] * av500 checks in compiled object files from 3rd party :)
[12:58:10] <av500> the (3rd) party is always right!
[12:58:46] <kshishkov> av500: there's only one party and it's always the first!
[12:59:02] <mru> birthday party?
[12:59:20] <pJok> will there be cake?
[12:59:29] <av500> pJok: check in the cake
[12:59:36] <mru> the cake is a lie
[12:59:45] <mru> just like the empty dir
[13:00:14] <av500> trope!
[13:00:18] <pJok> mru, i see that you got into the party submission position ;)
[13:02:36] <av500> wrt git, should be easy to assign a special SHA e.g. 0000... to empty file and add that to every tree...
[13:07:55] <jai> ^ defeats the purpose of the cloak doesnt it?
[13:08:07] <elenril> kshishkov: your kitchen sink doesn't run ssh?
[13:09:40] <av500> lol: http://www.seamonkey-project.org/releases/seamonkey2.0.5/changes#spc_os  AIX and OS/2 are not retarded, they are "special"
[13:09:59] <elenril> lol
[13:13:19] <thresh> new zealand wins
[13:13:56] <BBB> peloverde: are you working on the webm support thing?
[13:14:54] <lu_zero> BBB: ?
[13:15:02] <funman> hello
[13:15:47] <kshishkov> hi
[13:16:37] <Dark_Shikari> BBB: asm done yet? =p
[13:17:05] <BBB> fell asleep last night ;) so no
[13:17:09] <BBB> but it'll be done today
[13:17:16] <mru> code while you sleep!
[13:17:33] <kshishkov> get Windows ME in result!
[13:26:31] <kshishkov> warning! may cause brain freeze - http://pastie.org/1045593
[13:27:22] <av500> I reported abuse!
[13:28:08] * kshishkov has more stuff in store to abuse av500
[13:28:27] <av500> lets fight to death using 3rd party uglycode :)
[13:28:34] <funman> kshishkov: that'd be useful to spot radiation-induced memory errors
[13:28:37] <av500> or is that in-house?
[13:28:47] <mru> that's dtshd ref code
[13:28:57] <av500> mru: rapidshare?
[13:29:17] <mru> don't know where kshishkov got it
[13:29:20] <av500> mru: and I was spot on with "3rd party uglycode"
[13:30:27] * av500 throws some "i4_idx" loop variables at kshishkov
[13:30:42] <Dark_Shikari> i4_idx isn't too bad, that makes sense.
[13:30:44] <av500> and of course its WORD32 i4_idx;
[13:30:45] <Dark_Shikari> assuming h264
[13:31:07] <av500> for(i4_idx = 0; i4_idx < n; i4_idx++)
[13:31:22] <Dark_Shikari> where n is 16 obviously =p
[13:31:22] <av500> the int i is all decorated, the "n" is just n ... :)
[13:31:57] <av500> ah, that also exists: WORD32 i4_i;
[13:32:01] <av500> i4i
[13:32:12] <Dark_Shikari> BBB: quick optimization for you to try in vp8
[13:32:17] <av500> and only 4 after launching both sidewinders...
[13:32:18] <kshishkov> av500: in documentation variable names are like nNumSamplSub-subFr = 8;
[13:32:23] <Dark_Shikari> calculate filter_level immediately after decoding each mb
[13:32:24] <Dark_Shikari> store it
[13:32:30] <Dark_Shikari> and reload it for the per-row deblocking
[13:32:36] <Dark_Shikari> reason: mb structs are still in cache.
[13:32:41] <Dark_Shikari> x264 does this, it helps.
[13:32:55] <Dark_Shikari> Also, this will make it easier to convert the MB structures into a ring buffer.
[13:33:00] <Dark_Shikari> which reduces cache misses.
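The idea above, computing the filter level right after each macroblock is decoded and keeping only that byte for the later per-row deblocking pass, can be sketched as follows. Everything here is hypothetical: `MBInfo`, `lf_delta` and the level formula are stand-ins chosen for illustration, not the actual vp8 decoder structures. The only detail taken as given is that the level is clamped to the 0..63 range.

```c
#include <stdint.h>

/* Hypothetical per-macroblock state; a real decoder keeps far more here. */
typedef struct {
    int8_t lf_delta;   /* made-up loop-filter adjustment for this MB */
} MBInfo;

/* Stand-in level computation: base level plus delta, clamped to 0..63. */
static uint8_t filter_level_for(const MBInfo *mb, int base_level)
{
    int level = base_level + mb->lf_delta;
    if (level < 0)  level = 0;
    if (level > 63) level = 63;
    return (uint8_t)level;
}

/* Decode pass: compute the level while the MB struct is still hot in
 * cache and store one byte per macroblock. */
static void decode_row(const MBInfo *row, uint8_t *levels,
                       int mb_width, int base_level)
{
    for (int x = 0; x < mb_width; x++) {
        /* ... a real decoder would decode macroblock x here ... */
        levels[x] = filter_level_for(&row[x], base_level);
    }
}

/* Deblock pass: reads only the stored levels, never the MB structs.
 * Returns how many macroblocks would be filtered (level != 0). */
static int deblock_row(const uint8_t *levels, int mb_width)
{
    int filtered = 0;
    for (int x = 0; x < mb_width; x++)
        if (levels[x])
            filtered++;   /* stand-in for the actual edge filtering */
    return filtered;
}
```

Since the deblocking pass no longer touches the macroblock structs, those structs only need to live as long as the decode pass does, which is what would make a ring buffer possible.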
[13:35:20] <BBB> ok I will try that
[13:35:30] <BBB> (have a lab meet now, so can't work on it right away)
[13:35:37] <Dark_Shikari> k
[13:39:39] <DonDiego> how does vp8 speed compare to libvpx nowadays?
[13:40:11] <Dark_Shikari> on profile==1, we're tied
[13:40:14] <Dark_Shikari> in the last test I did
[13:40:23] <Dark_Shikari> on the regular profile we're still way slower as BBB isn't done with the loopfilter asm yet
[13:42:09] <Honoome> I assume this is x86?
[13:42:39] <Dark_Shikari> yes
[15:42:05] <kierank> roozhou: we found out what that problem with scenarist was if you didn't already see on doom9
[15:42:09] <kierank> it was the users fault
[15:54:46] <peloverde> BBB: not directly
[16:13:02] <roozhou> kierank: that's ok
[17:02:52] <funman> mru: do you often disassemble arm code?
[17:09:33] <mru> funman: define often
[17:09:48] <mru> I don't do it every day
[17:10:13] <funman> once in a while
[17:10:24] <funman> do you use anything beyond IDA and objdump?
[17:10:50] <mru> sometimes gdb
[17:10:59] <mru> and oprofile can do asm listings
[17:11:06] <mru> depends on what I'm doing
[17:11:18] <funman> ok
[17:11:38] <funman> you know EDA ?
[17:12:04] <mru> no
[17:12:55] <funman> an emulator / disassembler which renders graphs in javascript, started by geohot but never finished
[17:20:59] <peloverde> When people say EDA I usually think of Verilog
[17:21:49] <mru> it was rather obvious from the context he referred to something else
[17:21:56] <mru> and I don't know that EDA either
[17:45:42] <kshishkov> peloverde: and in Russian "eda" means "food". Guess what I think of when I hear that word?
[17:45:59] <peloverde> vhdl?
[17:46:06] <_av500_> food?
[17:46:16] <kshishkov> points go to _av500_
[17:46:18] <_av500_> black forest bacon!
[17:46:19] * elenril wants some chocolate
[17:46:25] <kshishkov> food still tries to evade him
[17:46:46] <_av500_> elenril: a chocotrope?
[17:47:00] <kshishkov> _av500_: how it's called in German?
[17:47:18] <_av500_> Schwarzwälder Schinken
[17:47:51] <kshishkov> _av500_: does not look like bacon to me. I can look into my fridge again but it's not bacon to me
[17:48:27] <_av500_> ham then
[17:48:45] <mru> ham is too broad a term
[17:48:58] <elenril> _av500_: http://tvtropes.org/pmwiki/pmwiki.php/Main/EverythingsBetterWithChocolate
[17:49:07] <_av500_> u discuss it, i need to go wreck my carport...
[17:49:22] <kshishkov> mru: more meat than fat
[17:49:37] <mru> _av500_: try driving the car at high speed into the supports
[17:49:40] <mru> that should do it
[17:49:52] <mru> or better, use somebody else's car
[17:50:05] <kshishkov> and somebody else to drive too
[17:54:26] <BBB> kshishkov: http://www.listware.net/201007/mplayer-libav-user/51495-libav-user-rtsp-error.html
[17:54:29] <BBB> kshishkov: can you answer that?
[17:56:37] <kshishkov> no, it just some error
[17:56:52] <kshishkov> probably stream is damaged
[17:57:19] <kshishkov> and "MB skip/VOP DQuant" are just old debug messages, ignore them
[17:57:47] <BBB> the bottom part is damage, the rest is debug?
[17:58:05] <kshishkov> exactly
[18:14:21] <jai> lol @ michael
[18:19:33] <Dark_Shikari> lol michael
[18:20:33] <kshishkov> add Peter's name to MARRIAGES file then
[19:13:26] <_av500_> done, very limited partial wrecking
[19:14:42] <mru> find any trolls underneath?
[19:16:00] <_av500_> no, only regenwurms
[19:16:24] <mru> earthworms
[19:21:57] <mru> peloverde: ping
[19:27:02] <lu_zero> yawn
[19:28:07] * lu_zero is sleepy
[19:42:17] <peloverde> pong
[19:43:40] <mru> peloverde: I'd appreciate if you didn't give baptiste ammunition against me
[19:43:43] <mru> he's bad enough as is
[19:44:18] <mru> the comma-separated lists are an ugly hack
[19:44:26] <peloverde> While he might be wrong about the ideal API he seems correct that this change does not break current ABI/API
[19:44:40] <mru> fine, I'll concede that
[19:44:47] <mru> since we already have one such list
[19:44:53] <mru> but that's one too many imo
[19:45:09] <mru> we could add an alias list right now without breaking anything
[19:46:23] <peloverde> Apps using the alias list would require a higher minimum version? Not ideal but I suppose it is acceptable
[19:48:09] <mru> apps wouldn't be required to use it
[19:48:16] <mru> now they are required to split the name
[19:48:23] <mru> or it won't work with av_find_input_format()
[19:48:49] <peloverde> apparently the whole list with all the commas works in av_find_input_format()
[19:49:37] <mru> hmm, apparently
[19:49:47] <mru> but it's still fucking ugly
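For readers unfamiliar with the comma-separated names being argued about: some demuxers register a name such as "mov,mp4,m4a,3gp,3g2,mj2", so a lookup by short name has to test each entry of the list. A minimal sketch of that kind of matching is below. `name_in_list` is a hypothetical helper, not the function libavformat uses, and it matches single entries only.

```c
#include <string.h>

/* Return 1 if `name` equals one entry of the comma-separated `list`,
 * 0 otherwise. Entries are compared whole, so "mp" does not match "mp4". */
static int name_in_list(const char *name, const char *list)
{
    size_t len = strlen(name);
    if (!len)
        return 0;

    while (*list) {
        const char *end = strchr(list, ',');
        size_t entry_len = end ? (size_t)(end - list) : strlen(list);

        if (entry_len == len && !memcmp(list, name, len))
            return 1;
        if (!end)
            break;              /* that was the last entry */
        list = end + 1;
    }
    return 0;
}
```

An explicit alias array, as mru proposes, would replace the string scanning with a loop over separate strings and spare applications from splitting the name themselves.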
[19:59:48] <elenril> w00t, new fflames
[20:00:05] <lu_zero> elenril: ames?
[20:00:11] <lu_zero> lames?
[20:00:14] <lu_zero> llamas!
[20:00:17] <Dark_Shikari> mru: you'll enjoy this one
[20:00:18] <Dark_Shikari> 03:54 < derf> Okay, gcc, I don't understand
[20:00:18] <Dark_Shikari> 03:54 < derf>         str     r0, [sp, #4]
[20:00:19] <Dark_Shikari> 03:54 < derf>         ldr     r4, [sp, #4]
[20:00:24] <Dark_Shikari> a register to register move, by way of the stack.
[20:00:29] <Dark_Shikari> On a 3-operand architecture, obviously.
[20:00:38] <lu_zero> why?
[20:00:41] <Dark_Shikari> Because it's gcc.
[20:00:45] <lu_zero> no
[20:00:52] <lu_zero> how's the code around?
[20:01:01] <Dark_Shikari> dunno, but nothing can possibly justify that.
[20:01:25] <Dark_Shikari> unless it's volatile or something obviously.
[20:01:25] <lu_zero> if you tell gcc that all the regs are busy maybe
[20:01:34] <mru> it's possible it needed to store the value to stack for some other reason
[20:01:47] <mru> still shouldn't have done the load
[20:02:08] <_av500_> maybe the part on the stack is declared volatile?
[20:02:26] <mru> it's a problem with how gcc works internally
[20:02:43] <mru> it decided that it had to go onto the stack, perhaps even validly
[20:02:49] <mru> now the value lives on the stack
[20:03:24] <mru> the next rtl expression needs the value, and because it lives on the stack, it goes and loads it
[20:03:47] <mru> when it was stored to stack, the old register got marked unused
[20:04:08] <mru> for the load, the register allocator chose another register (randomly)
[20:04:25] <mru> had it chosen the same one, some peephole optimisation might have eliminated the load
[20:04:37] <mru> this is very annoying in gcc
[20:05:30] <lu_zero> they are working on the third iteration of that part of code in few years...
[20:05:31] <neerfri> Hi All, sorry to interrupt, small question, in ffmpeg.c, AVInputStream->pts is saved as time in AV_TIME_BASE right ?
[20:05:31] <mru> it can be interesting to dump the rtl between each optimisation pass and watch it morph
[20:05:42] <Dark_Shikari> mru: wow, this is even better
[20:05:43] <Dark_Shikari> 04:05 < derf> What's fun is it then proceeds to use the copy in _both_ r0 and r4.
[20:05:45] <lu_zero> neerfri: should be documented that way
[20:05:59] <mru> hmm, so the register wasn't dead after all
[20:06:08] <elenril> peloverde: does aacdec do downmixing?
[20:06:22] <_av500_> no
[20:06:25] <peloverde> elenril: no that would be retarded
[20:06:33] <neerfri> lu_zero: yup, right now it only says: /* current pts */
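On neerfri's question: AV_TIME_BASE is 1000000, so a value stored in AV_TIME_BASE units is a count of microseconds. Converting a timestamp from a stream time base of num/den seconds per tick is a single rescale, sketched below. `to_time_base_us` and `TIME_BASE_US` are hypothetical names for this example. The sketch truncates and ignores overflow for very large timestamps, which FFmpeg's own rescaling helpers handle more carefully.

```c
#include <stdint.h>

#define TIME_BASE_US 1000000   /* same value as FFmpeg's AV_TIME_BASE */

/* Convert `ts` ticks of a (tb_num / tb_den)-second time base into
 * microseconds. Truncating division; no overflow protection. */
static int64_t to_time_base_us(int64_t ts, int tb_num, int tb_den)
{
    return ts * tb_num * TIME_BASE_US / tb_den;
}
```

For example, 90000 ticks of a 1/90000 time base come out as 1000000, which is one second.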
[20:06:34] <_av500_> special!
[20:06:42] <peloverde> There is noting format specific about downmixing in an aac encoder
[20:06:46] <mru> Dark_Shikari: the rtl for that could be interesting
[20:06:47] <peloverde> *nothing
[20:06:56] <elenril> i thought it was supposed to be done by decoders
[20:06:58] <elenril> nvm then
[20:07:01] <mru> peloverde: he said dec
[20:07:08] <peloverde> oh sorry
[20:07:11] <peloverde> I'm dumb
[20:07:12] <mru> ac3 and dts can downmix before transform
[20:07:21] * _av500_ loves that
[20:07:35] <peloverde> In the general case AAC can't
[20:07:39] <mru> if aac doesn't allow that, there's no point in doing any mixing in decoder
[20:07:49] <mru> _av500_: would you like to do it faster?
[20:08:07] <_av500_> which one?
[20:08:21] <mru> let's say dts
[20:08:23] <_av500_> i thought i had all your DTS love?
[20:08:39] <mru> the downmixing can be made faster
[20:08:42] <peloverde> HE-AAC can downmix during SBR synthesis in theory
[20:09:08] <mru> _av500_: you have the bulk of what's possible
[20:09:18] <_av500_> thx :)
[20:09:18] <mru> the qmf filter was by far the most time consuming
[20:09:51] <peloverde> Would people be interested in SBR domain downmixing?
[20:10:17] <Dark_Shikari> mru: do modern ARM chips have store/load forwarding?
[20:10:31] <mru> yes
[20:10:32] <Dark_Shikari> how does it compare to the absurd level that x86 has?
[20:12:53] <peloverde> Have people heard of tooboos? They seem a little shady
[20:12:59] <Dark_Shikari> tooboos?
[20:13:02] <mru> peloverde: he came to you now?
[20:13:27] <peloverde> so I should avoid him?
[20:13:43] <mru> I don't know anything about them
[20:14:17] <mru> I don't have time anyway
[20:14:31] <lu_zero> peloverde: they contacted me as well
[20:14:47] <peloverde> I'm always a little skeptical about Russian and Chinese companies
[20:16:04] <lu_zero> peloverde: they asked you about some quick reformat from spark?
[20:16:13] <peloverde> yeah
[20:16:23] <lu_zero> (that isn't _that_ quick from what I read)
[20:17:05] <_av500_> mru: they are looking for a flash developer :)
[20:17:34] <_av500_> and a java dev
[20:25:44] <Compn> so is there some recipe for a bikeshed or do the blueprints for such threads get made on the fly ?
[20:25:58] <Compn> short demuxer names been around for a while, good time to shed it up!
[20:26:25] <mru> peloverde: go ahead with the patch if you think it's good
[20:27:49] <BBB> Dark_Shikari: can you re-say all problems with my patch?
[20:27:53] <BBB> Dark_Shikari: my chatlog disappeared
[20:28:33] <BBB> or maybe it didn't
[20:28:35] <BBB> hold on
[20:28:54] <Dark_Shikari> go read the official log then
[20:28:55] <Dark_Shikari> I don't keep logs
[20:30:12] <Dark_Shikari> mru: more fun
[20:30:12] <Dark_Shikari> 04:29 < derf>         mov     r1, r7, asl #1
[20:30:13] <Dark_Shikari> 04:29 < derf>         ldrsh   r5, [fp, r1]
[20:30:13] <Dark_Shikari> 04:30 < derf> (r1 is then subsequently clobbered before being used again)
[20:30:37] <mru> shift could be done in the ldr
[20:30:43] <mru> gcc doesn't seem to like doing that
[20:30:49] <mru> not the first time I've seen this
[20:30:49] <Dark_Shikari> yeah.  it really doesn't.
[20:31:07] <Dark_Shikari> GCC doesn't seem to ever use the free shift unless the shift ends up next to the op in the RTL
[20:31:17] <Dark_Shikari> e.g. it'll optimize x >> 2; x += 5;
[20:31:27] <Dark_Shikari> but it won't optimize x>>=2; y>>=2; x+=5; y+=5;
[20:31:43] <mru> that really sucks
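The "free shift" refers to ARM data-processing instructions being able to shift their second operand as part of the same instruction. In C terms the two patterns Dark_Shikari contrasts look roughly like the sketch below. The function names are made up for illustration, and whether a given gcc version folds the shifts is exactly what is being debated, so no particular output is claimed here.

```c
#include <stdint.h>

/* Shift directly feeding an add: the easy case, where the shift sits
 * next to the operation that consumes it. */
static int32_t add_shifted(int32_t a, int32_t b)
{
    return a + (b >> 2);
}

/* Two shifts followed by two adds: each shift is separated from its
 * add, which is the pattern reported above as not being folded. */
static void add_shifted_pair(int32_t *x, int32_t *y)
{
    int32_t xs = *x >> 2;
    int32_t ys = *y >> 2;
    *x = xs + 5;
    *y = ys + 5;
}
```

Compiling both for ARM with `-O2 -S` and comparing the instruction counts is a quick way to see how a particular compiler version handles each form.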
[20:31:54] <Yuvi> 4.4 should be a lot better with using free shifts
[20:32:04] <Dark_Shikari> 04:31 < derf> gcc (Gentoo 4.5.0 p1.1) 4.5.0
[20:32:04] <mru> 4.4 generates 25% slower code
[20:32:11] <mru> 4.5 is back at 4.3 speed
[20:32:19] <Dark_Shikari> mru: odd/even rule
[20:32:26] <Dark_Shikari> the opposite of Star Trek
[20:32:46] <mru> 3.4 was alright
[20:32:53] <Dark_Shikari> well yeah
[20:32:55] <Dark_Shikari> it only applies to 4.
[20:32:56] <mru> so it's odd parity
[20:33:07] <Dark_Shikari> Ah, true
[20:33:10] <Dark_Shikari> so it's the parity of the whole expression.
[20:33:19] <Dark_Shikari> that's why 2.9 was good
[20:33:23] <_av500_> 2.9.5 is even
[20:33:30] <mru> it's 2.95
[20:33:31] <Dark_Shikari> we ignore micro numbers for this purpose.
[20:33:33] <mru> so still odd
[20:33:41] <mru> the micro numbers generally improve things
[20:33:44] <Dark_Shikari> Yeah
[20:33:47] <Dark_Shikari> They're bugfixes mostly
[20:33:50] <mru> .0 is full of bugs
[20:33:56] <mru> .1 compiles most things
[20:34:00] <mru> .2 is almost usable
[20:34:10] <_av500_> lol: #define VIDDEC_ZERO                         0
[20:34:16] <mru> .3 is stable enough to use
[20:34:19] <_av500_> #define VIDDEC_ONE                          1
[20:34:25] <_av500_> #define VIDDEC_MINUS                        -1
[20:34:26] <mru> lol
[20:34:27] <Dark_Shikari> _av500_: corporate policies of no numeric constants
[20:34:39] <mru> Dark_Shikari: love those policies
[20:34:42] <_av500_> bangalore style
[20:34:52] <mru> but they forgot to write (0)
[20:35:08] <mru> because someone told them to always put parens around macros
[20:35:43] <_av500_> better put ((0)), you never know what the pesky preproc will do
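The rule being mocked does have a real purpose, just not for a lone literal such as 0 or 1. A macro that expands to an expression is pasted in textually, so the surrounding operators can regroup it. A minimal sketch, with hypothetical macro names:

```c
/* Expands textually, so precedence at the use site leaks in. */
#define PAIR_BAD 1 + 1
/* Parenthesized, so it always behaves as a single value. */
#define PAIR_OK  (1 + 1)

/* Becomes 3 * 1 + 1, which is 4. */
static int triple_bad(void)
{
    return 3 * PAIR_BAD;
}

/* Becomes 3 * (1 + 1), which is 6. */
static int triple_ok(void)
{
    return 3 * PAIR_OK;
}
```

For a single literal like 0 the parentheses change nothing, which is why `(0)` and `((0))` are the joke.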
[20:36:51] <_av500_> ha:     OMX_U16 arr[100];
[20:36:59] <_av500_> pirate day?
[20:38:08] <Dark_Shikari> wait
[20:38:10] <Dark_Shikari> from the same code?
[20:38:14] <_av500_> yes
[20:38:16] <Dark_Shikari> the same code where they used defines to avoid constants?
[20:38:22] <_av500_> yes
[20:39:21] <_av500_> it gets better, they have #define FOO_BAR_MALLOC(..) macros which assume you have an EXIT: label in your code....
[20:39:22] <BBB> if I use m8 on x86-64, should I use cglobal name, args, regs, 9 or cglobal name, args, regs, 1?
[20:39:42] <Dark_Shikari> 9
[20:39:52] <Dark_Shikari> however, it should only use 9 if you're on x86_64.
[20:40:02] <Dark_Shikari> We should really modify x86inc.asm to clip the value to 8 on x86_32.
[20:40:07] <Dark_Shikari> I wonder if it already does.
[20:40:12] <Yuvi> does it do anything with it on 32-bit?
[20:40:17] <Yuvi> I thought it only mattered for win64
[20:40:59] <Dark_Shikari> Oh
[20:41:00] <Dark_Shikari> You're right
[20:41:02] <Dark_Shikari> nevermind, so yeah
[20:45:08] <BBB> so I can just always set it to 9?
[20:45:10] <BBB> that's great :)
[20:48:21] <BBB> Dark_Shikari: the pshufb ssse3 version, can I do that in a separate commit?
[20:48:26] <Dark_Shikari> Sure.
[20:48:28] <BBB> it'll need a ssse3 version of the simple loopfilter also
[20:48:32] <BBB> and that is all unrelated
[20:48:36] <BBB> hence separate is better maybe
[20:48:39] <Dark_Shikari> That's fine.
[20:48:46] <BBB> ok, so pb_80 is in m8 now on x86-64
[20:48:48] <BBB> (untested)
[20:48:58] <Dark_Shikari> btw, eventually we should do some basic atom optimizations, i.e. not doing pshufb on atom
[20:49:08] <Dark_Shikari> pshufb takes 6 cycles, unpipelined, on atom.
[20:50:17] <BBB> I wonder what the advantage is of using m9-15 instead of stack if I only use it once, since (at least on win64), I need to push to the stack anyway (+pop) to use m9-15
[20:50:44] <BBB> I guess "it's only better on linux/mac"?
[20:51:18] <Yuvi> don't you use it at least 6 times?
[20:51:33] <BBB> pb_80, yes
[20:51:35] <BBB> the others, no
[20:51:41] <BBB> I already changed pb_80
[20:51:48] <BBB> I wonder if I should change the others
[20:51:56] <Dark_Shikari> the others, you shouldn't do it
[20:52:00] <Dark_Shikari> only for things you use more than once
[20:52:14] <BBB> I mean "stuff I save to the stack", btw
[20:52:27] <Dark_Shikari> Yes, you should.
[20:52:48] <BBB> ok, let me rewrite some of that stuff then
[20:53:12] <BBB> %ifdef m8 is fine to abuse as a "if sse2 && x86-64"?
[20:54:31] <Dark_Shikari> Yes
[20:54:52] <Dark_Shikari> It's not really abuse, it's more correct
[20:54:56] <Dark_Shikari> because you want to use m8 "if it's available"
[20:55:01] <Dark_Shikari> not "if it's x86_64 and sse2"
[20:55:07] <Dark_Shikari> it just happens that the latter are cases where "it is available"
[20:55:25] <BBB> probably, yeah
[20:56:10] <Yuvi> h264 does edge extension from fpel positions, right?
[20:56:19] <Dark_Shikari> huh?
[20:56:39] <Dark_Shikari> it says "any pixel beyond the edge is defined as equal to the closest valid pixel"
[21:08:57] <Yuvi> but only for actual pixels, and then all subpel is done from that?
[21:12:24] <Dark_Shikari> yes
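The rule quoted above, that any pixel beyond the edge equals the closest valid pixel, amounts to clamping the coordinates before the fetch. A minimal sketch with hypothetical helper names follows. Real decoders usually pad the reference frame once instead of clamping on every access.

```c
#include <stddef.h>
#include <stdint.h>

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Fetch the full-pel sample at (x, y), treating everything outside the
 * picture as a copy of the nearest edge pixel. */
static uint8_t ref_pixel(const uint8_t *plane, ptrdiff_t stride,
                         int width, int height, int x, int y)
{
    x = clamp_int(x, 0, width  - 1);
    y = clamp_int(y, 0, height - 1);
    return plane[y * stride + x];
}
```

Sub-pel interpolation then reads its full-pel inputs through the same rule, consistent with the answer given to Yuvi.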
[21:17:52] <pengvado> peloverde: why can't aac downmix?
[21:18:26] <_av500_> no patch sent?
[21:26:50] <BBB> Dark_Shikari: can you test this for me?
[21:26:53] <BBB> I have no idea if it works
[21:27:08] <BBB> I can put the patch on pastebin or so if you want to test or review or neither
[21:27:40] <BBB> or I can commit and check fate and cry foul if it breaks
[21:27:43] <BBB> and say it's your fault
[21:27:44] <BBB> or so
[21:34:44] <Dark_Shikari> test what
[21:36:09] <BBB> my x86-64 part of the inner loopfilter
[21:36:18] <BBB> it's a little different so it probably needs testing
[21:36:38] * BBB wonders if a coreduo is 64bit
[21:36:41] <BBB> probably not right?
[21:36:59] <Dark_Shikari> no
[21:37:03] <Dark_Shikari> and again
[21:37:04] <Dark_Shikari> test what
[21:37:06] <Dark_Shikari> I don't see a link
[21:37:54] <BBB> http://ffmpeg.pastebin.com/KEn3hn7K
[21:38:27] <pJok> BBB, coreduo lacks 64bit
[21:38:58] <Dark_Shikari> oh, I can't test it.  I don't have enough ram to open my ubuntu vm.
[21:39:06] <Dark_Shikari> someone else here will have to.
[21:39:08] <Dark_Shikari> Or just commit.
[21:39:21] * BBB looks for random x86-64 victim
[21:39:24] <BBB> mru!~
[21:39:40] <lu_zero> BBB: what do you need?
[21:39:55] * lu_zero is still fetching the test traces from cairo...
[21:40:07] <BBB> lu_zero: test the above patch (versus current SVN) to make sure output against the vp8 test suite is identical
[21:40:24] <lu_zero> tell me how step by step
[21:40:30] <lu_zero> I'm sleeping more or less
[21:40:46] <BBB> Run this:
[21:40:46] <BBB> http://ffmpeg.pastebin.com/ZsYJH4xq
[21:41:06] <BBB> samples are vp8-test-vectors-r1 google first result
[21:41:27] <BBB> so download samples somewhere, tell script where the samples are, run ./test.sh
[21:41:32] <BBB> copy test.md5s to ref.md5s
[21:41:34] <BBB> apply my patch
[21:41:37] <BBB> and re-run the script
[21:41:49] <BBB> test.md5s and ref.md5s should be identical on x86-64
[21:43:26] <lu_zero> test.sh ?
[21:44:09] <lu_zero> or run_test.sh?
[21:46:52] <BBB> anything you like
[21:46:58] <BBB> you can call it lu_zero.sh
[21:47:30] <lu_zero> BBB: which script...
[21:47:38] <BBB> http://ffmpeg.pastebin.com/ZsYJH4xq
[21:48:07] <lu_zero> ahhh
[21:48:08] <lu_zero> =_=
[21:48:35] <Dark_Shikari> BBB: suggestion re lines 439-453
[21:48:47] <Dark_Shikari> is there any way to delay the calculation of m1 more
[21:48:51] <Dark_Shikari> to get m7 finished earlier?
[21:48:54] <Dark_Shikari> to limit the dependency chain?
[21:49:14] <Dark_Shikari> Or, equally, to make m7 calculation start earlier
[21:49:51] <BBB> I'm very short of registers there
[21:50:06] <Dark_Shikari> k
[21:50:39] <Dark_Shikari> 385-389 ... might this not be simpler if we just didn't have an mmx version on x86_64?
[21:50:45] <Dark_Shikari> there's no x86_64 cpu that could possibly need it
[21:50:48] <Dark_Shikari> we could dump mmxext there too
[21:51:01] <lu_zero> Results identical
[21:51:06] <lu_zero> apparently
[21:51:11] <Dark_Shikari> \o/
[21:51:12] <BBB> lu_zero: x86-64, 64-bit binary?
[21:51:13] <BBB> thanks :)
[21:51:17] <lu_zero> yup
[21:51:39] <BBB> Dark_Shikari: I'll have to write a few ifdefs in various places to dump the mmx/mmxext versions on x86-64, also in vp8dsp-init.c
[21:52:01] <Dark_Shikari> I guess.  ok
[21:52:02] <BBB> I don't think 385-389 will get easier
[21:52:09] <BBB> because it needs that mova on mmx/x86-32, still
[21:52:22] <BBB> did I mention SWAP is amazing?
[21:52:37] <BBB> whoever invented SWAP should get a nobel prize
[21:52:40] <Dark_Shikari> lol
[21:52:43] <Dark_Shikari> that would be pengvado :)
[21:53:18] <Dark_Shikari> It's even more helpful in DCTs.
[21:53:45] <Dark_Shikari> lines 237-240 ... I'm confused here.  don't we want 3 pixels on each side, not 2?
[21:53:55] <Dark_Shikari> "-2"
[21:54:13] <BBB> 4 left, 3 right
[21:54:17] <BBB> so it's actually -4
[21:54:25] <Dark_Shikari> why is it -2 then
[21:54:34] <BBB> check line 165
[21:54:41] <Dark_Shikari> Why not lea -4 ?
[21:54:45] <Dark_Shikari> instead of -2
[21:54:55] <BBB> that way the movd in the end needs no const offset
[21:55:00] <BBB> to write four pixels per row
[21:55:02] <Dark_Shikari> Just add an extra add
[21:55:13] <Dark_Shikari> every single constant offset wastes one byte of code size
[21:55:37] <Dark_Shikari> i.e. do -4 at the start in the lea
[21:55:41] <Dark_Shikari> and do an "add blah, 2" later
[21:55:48] <Dark_Shikari> Unless that would involve multiple adds due to the multiple pointers
[21:56:05] <BBB> for mmx, I do two loops, 8 rows per loop
[21:56:15] <BBB> so I'd need an add 2, and then a sub 2 later
[21:56:15] <Dark_Shikari> and?
[21:56:18] <BBB> is that ok?
[21:56:36] <BBB> actually maybe I don't... let me check, I can probably fix that
[21:56:42] <BBB> you're probably right
[21:58:17] <Dark_Shikari> 444-445: maybe swaps?
[21:58:21] <Dark_Shikari> I mean after.
[21:59:57] <Dark_Shikari> 509-512: interleave
[22:00:42] <CIA-99> ffmpeg: mru * r24249 /trunk/libavcodec/arm/h264dsp_neon.S: ARM: NEON H264 chroma loop filter 3 cycles faster
[22:04:09] <Dark_Shikari> BBB: 182-188 should be movu
[22:04:18] <Dark_Shikari> Doesn't change the output, it's just more correct.
[22:06:07] <Dark_Shikari> 281-303: can any of this be more interleaved?
[22:08:47] <BBB> ok, I fixed the -2 thing
[22:08:52] <BBB> it's all without const now
[22:09:00] <BBB> I had to change one add to a lea and add two adds
[22:09:02] <BBB> is that ok?
[22:09:48] <BBB> 444-445 need to stay that way, I need to keep a backup of these, I use them far below
[22:09:52] <BBB> they are q1/p1
[22:10:02] <BBB> which I backup so I can add the filter result to it in the end
[22:12:05] <BBB> movu, you mean movq?
[22:12:13] <BBB> (509-512 interleave done)
[22:13:47] <BBB> 281-303 is again register-sparse, if you have ideas on how to interleave I'd love to hear it... maybe my approach is crazy :)
[22:15:59] <pengvado> I don't think "sparse" is the word you're looking for
[22:17:17] <BBB> I have no registers left
[22:20:24] <BBB> what is movu?
[22:20:30] <Dark_Shikari> move unaligned
[22:20:50] <Dark_Shikari> 444-445: I mean using a swap so that you don't use the output of the mova immediately after
[22:22:21] <Dark_Shikari> like everywhere else.
[22:22:24] <Dark_Shikari> if only for consistency.
[22:22:56] <Dark_Shikari> re aligning the stack -- that doesn't need to be done on 64-bit
[22:23:47] <Dark_Shikari> pengvado: does WIN64 have the red zone?
[22:24:04] <Dark_Shikari> BBB: the "red zone" is 128 bytes below rsp which are valid and can be used.
[22:24:10] <Dark_Shikari> thus you don't have to sub rsp if you're using fewer than 128 bytes on x86_64.
[22:25:50] <Dark_Shikari> ok, win64 doesn't have the red zone, *nix64 does
[22:25:57] <Dark_Shikari> so you can avoid the sub on *nix64.
[22:29:07] <Dark_Shikari> BBB: flim_E and flim_I are only used once.  you don't need to put them on the stack; just calculate them on the fly when they're needed.
[22:30:10] <Dark_Shikari> oh, you can't do that because you overwrite the gprs
[22:37:58] <BBB> Dark_Shikari: 444-445 aaa I see, is that necessary even though there's another mova in between?
[22:38:05] <BBB> I thought that would negate any potential effect
[22:38:55] <Dark_Shikari> I guess it doesn't matter.
[22:38:58] <BBB> Dark_Shikari: and E/I aren't used once in the mmx/mmxext case
[22:39:00] <BBB> they're used twice
[22:39:14] <Dark_Shikari> and it doesn't matter per what I said
[22:39:15] <Dark_Shikari> re registers
[22:39:22] <Dark_Shikari> now what you should fix is the redzone
[22:39:43] <BBB> first movu
[22:40:05] <BBB> so I should use movu if I read unaligned from memory?
[22:40:26] <BBB> is movu 8-byte, a-size or something else?
[22:40:27] <Dark_Shikari> yes....
[22:40:50] <Dark_Shikari> it's a-size
[22:41:05] <BBB> ok, so these reads are 8-bytes in the mmsize=16 case
[22:41:08] <BBB> what do I use then?
[22:41:35] <Dark_Shikari> movh
[22:42:06] <BBB> ok, got it
[22:42:54] <BBB> I can assume the reads in 174-179 are aligned, right?
[22:42:59] <BBB> (I think so at least)
[22:43:14] <Dark_Shikari> Yes.
[22:43:19] <Dark_Shikari> Otherwise it would crash.
[22:43:23] <BBB> ok, fixed that
[22:43:46] <BBB> now redzone
[22:44:02] <BBB> the stack isn't aligned on 64bit btw
[22:44:06] <BBB> I don't use the stack on 64bit
[22:44:15] <Dark_Shikari> ah ok
[22:44:25] <BBB> that whole block is %ifndef m8
[22:44:30] <Dark_Shikari> also, whenever I think of the red zone, I can't avoid hearing http://www.youtube.com/watch?v=zliubqahaZM
[22:44:42] <Dark_Shikari> BBB: ok, great.
[22:44:48] <Dark_Shikari> I think that covers everything.
[22:44:52] <Dark_Shikari> er....
[22:45:07] <Dark_Shikari> oh nevermind
[22:45:10] <Dark_Shikari> thought I saw something else
[22:45:39] <BBB> time to commit then?
[22:45:45] <BBB> then I can move on to finishing mbedge
[22:45:54] <Dark_Shikari> Looks good.  This is the most important asm function in vp8 imo
[22:46:01] <BBB> this, and mbedge
[22:46:14] <BBB> together they are 60% of my callgraph
[22:46:20] <Dark_Shikari> oh, one thing
[22:46:25] <Dark_Shikari> 412-413
[22:46:30] <Dark_Shikari> shouldn't you only have to make one copy there, not two?
[22:46:37] <Dark_Shikari> e.g.
[22:46:39] <Dark_Shikari> tmp = q0
[22:46:44] <Dark_Shikari> q0 -= p0
[22:46:47] <Dark_Shikari> p0 -= tmp
[22:47:48] <BBB> then I lose p0
[22:47:55] <Dark_Shikari> Unless you need p0/q0 around for later.
[22:47:56] <BBB> I need p0 to add the filter result to later on
[22:47:56] <Dark_Shikari> Oh
[22:47:57] <Dark_Shikari> Ah
[22:47:58] <Dark_Shikari> ok
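The exchange above about one copy versus two can be modelled on plain lists. A sketch only: the patch itself is not shown in this log, so the register names in the dict, the helper names, and the assumption that the subtractions are unsigned saturating byte subtracts (psubusb-style) are all illustrative.

```python
def psubusb(a, b):
    """Lane-wise unsigned saturating subtract on bytes: max(a - b, 0)."""
    return [max(x - y, 0) for x, y in zip(a, b)]


def one_copy(regs):
    """Dark_Shikari's suggestion: tmp = q0; q0 -= p0; p0 -= tmp."""
    r = dict(regs)
    r["tmp"] = r["q0"]
    r["q0"] = psubusb(r["q0"], r["p0"])
    r["p0"] = psubusb(r["p0"], r["tmp"])  # original p0 is overwritten here
    return r


def two_copies(regs):
    """BBB's version: p0 and q0 stay intact, differences go to scratch."""
    r = dict(regs)
    r["t0"] = psubusb(r["q0"], r["p0"])
    r["t1"] = psubusb(r["p0"], r["q0"])
    return r
```

The one-copy form saves a move but leaves no register holding the original p0, which is why it does not fit when p0 is needed again to add the filter result.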
[22:48:07] <BBB> m2-m5 are the holy grail in the function
[22:48:11] <BBB> "they shall not be touched"
[22:48:17] <BBB> or so
[22:48:29] <Dark_Shikari> you shall not touch
[22:48:32] <Dark_Shikari> </gandalf>
[22:48:40] <BBB> that was pass, not touch :)
[22:49:03] <Dark_Shikari> it could probably use a comment about that
[22:49:05] <Dark_Shikari> I know
[22:49:18] <BBB> I'll add a comment
[22:49:55] <BBB> done
[22:50:09] <BBB> anything else?
[22:50:41] <Dark_Shikari> Looks good to me.
[22:50:42] <BBB> <itchy fingers>
[22:50:47] <BBB> let's commit \o/
[22:51:09] <Dark_Shikari> do a new profile after :)
[22:51:15] <Dark_Shikari> Oh yeah, we need to do chroma
[22:51:17] <Dark_Shikari> mbedge first, or chroma first?
[22:51:22] <Dark_Shikari> I think chroma first, it's less work than a whole new filter
[22:53:15] <neerfri> Dark_Shikari: Hi! posted a question in #ffmpeg, your answer on this can probably save me a few days of work, if you can please have a look
[22:56:16] <BBB> Dark_Shikari: I wanted to do mbedge first because it's more complex ;)
[22:56:23] <bcoudurier> humm
[22:56:30] <Dark_Shikari> lol
[22:56:33] <Dark_Shikari> do the easier ones first
[22:56:37] <bcoudurier> does anybody follow the thesis vs wordpress issue ?
[22:57:26] <Dark_Shikari> BBB: the priority of any asm function is (speed benefit) / (time to write it)
[22:57:38] <BBB> uh... ok
[22:57:48] <BBB> chroma then
[22:58:03] <Dark_Shikari> It makes sense, right?
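Dark_Shikari's rule of thumb, priority = (speed benefit) / (time to write it), can be turned into a small ranking helper. A sketch with made-up numbers: the function name and the figures for mbedge and chroma are hypothetical, chosen only to show how a smaller gain can still rank first when it is cheap to write.

```python
def rank_by_priority(candidates):
    """Sort (name, speed_benefit, time_to_write) tuples by benefit / time, best first."""
    return sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)


# Hypothetical figures, in arbitrary units; not measurements from this log.
candidates = [
    ("mbedge", 30.0, 10.0),  # bigger gain, but a whole new filter to write
    ("chroma", 15.0, 2.0),   # smaller gain, mostly reuses the existing filter
]
ranking = [name for name, _, _ in rank_by_priority(candidates)]
```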
[22:58:04] <BBB> you wanted me to make it call U/V in a single go with two pointer arguments right?
[22:58:07] <Dark_Shikari> Yes.
[22:58:12] <Dark_Shikari> So all that changes is the transpose.
[22:58:19] <Dark_Shikari> well, basically the Stuff On The Top
[22:58:21] <Dark_Shikari> and Stuff On The Bottom
[22:58:24] <BBB> right
[22:58:27] <Dark_Shikari> the mmx one just loops over the two.
[22:58:31] <Dark_Shikari> the sse one merges them, then unmerges.
[22:58:42] <BBB> got it
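The two chroma strategies just described can be sketched on lists, where each list stands for the lanes of one register. This is an illustration only: filter_lanes is a placeholder rather than the VP8 loop filter, and every name here is hypothetical. It assumes the filter treats each lane independently, which is what lets U and V share one wide register.

```python
def filter_lanes(lanes):
    """Placeholder per-lane operation standing in for the real loop filter."""
    return [min(p + 1, 255) for p in lanes]


def chroma_mmx_style(u_lanes, v_lanes):
    """8-byte registers: run the same filter once per plane."""
    return filter_lanes(u_lanes), filter_lanes(v_lanes)


def chroma_sse_style(u_lanes, v_lanes):
    """16-byte registers: merge U and V, filter once, then unmerge."""
    merged = u_lanes + v_lanes
    out = filter_lanes(merged)
    return out[:len(u_lanes)], out[len(u_lanes):]
```

Both functions return the same result; the merged variant simply does the work in one pass over twice as many lanes.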
[23:03:27] <CIA-99> ffmpeg: rbultje * r24250 /trunk/libavcodec/x86/ (vp8dsp.asm dsputil_mmx.c vp8dsp-init.c dsputil_mmx.h): VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.
[23:03:33] <Dark_Shikari> \o/ \o/ \o/
[23:04:16] <BBB> \o/ indeed
[23:04:18] <BBB> on to chroma
[23:04:23] <BBB> or, actually, on to home
[23:04:26] <BBB> wife is waiting
[23:16:00] <peloverde> janneg: LATM ping?
[23:17:52] <iive> so vp8 now has a complete set of asm routines?
[23:17:56] <Dark_Shikari> No.
[23:18:01] <Dark_Shikari> mbedge and chroma and chroma mbedge still need to be done.
[23:18:03] <Dark_Shikari> 3 more.
[23:18:07] <Dark_Shikari> well, 6 counting both directions.
[23:18:30] <iive> oh...
[23:19:32] * iive puts back the champagne.
[23:19:58] <iive> have fun
[23:50:51] <Dark_Shikari> Yuvi: Wait, why can't deblocking be done until after two rows are done?
[23:51:00] <Dark_Shikari> that can't be right.
[23:51:35] <Yuvi> Dark_Shikari: intra pred, I started with the border saving but didn't get it bitexact
[23:51:45] <Dark_Shikari> Oh.  Intra pred.
[23:52:17] <Yuvi> http://pastie.org/1046472 <- current patch
[23:52:28] <Dark_Shikari> I was doing the filter strength hack.
[23:54:05] <Dark_Shikari> shouldn't you kill lines 141-143?
[23:55:28] <Yuvi> yeah
[23:55:48] <Dark_Shikari> what's with the if(tr)?  I thought tr was a separate argument?
[23:56:59] <Yuvi> whether the mb is on the right edge -> should exchange tr too
[23:57:04] <Dark_Shikari> no, I mean in intra pred
[23:57:10] <Dark_Shikari> topright is a separate argument to the intra pred functions
[23:57:26] <Yuvi> yeah
[23:58:06] <Dark_Shikari> also, are you handling the right-side case correctly?
[23:58:12] <Dark_Shikari> you still have to have something there for that case
[23:58:38] <Yuvi> it's always the same for all 4 blocks on the right, edge extended from the row to the top
[23:58:40] <Yuvi> maybe not