[FFmpeg-devel] [PATCH] h264_i386: Optimize decode_significance_8x8_x86 for 64 bit.
Reimar Döffinger
Reimar.Doeffinger at gmx.de
Wed Dec 3 22:39:00 CET 2014
On Wed, Dec 03, 2014 at 01:19:48PM +0100, Michael Niedermayer wrote:
> On Wed, Dec 03, 2014 at 09:00:39AM +0100, Reimar Döffinger wrote:
> > On 03.12.2014, at 01:40, Michael Niedermayer <michaelni at gmx.at> wrote:
> > > On Sat, Nov 22, 2014 at 02:09:01PM +0100, Reimar Döffinger wrote:
> > >> On Mon, Nov 17, 2014 at 01:41:13PM +0100, Michael Niedermayer wrote:
> > >>> On Mon, Nov 17, 2014 at 08:19:32AM +0100, Reimar Döffinger wrote:
> > >>>> On 17.11.2014, at 02:37, Michael Niedermayer <michaelni at gmx.at> wrote:
> > >>>>> On Sat, Nov 15, 2014 at 06:16:03PM +0100, Reimar Döffinger wrote:
> > >>>>>> 11674 -> 10877 decicycles on my Phenom II.
> > >>>>>> Overall speedup was unfortunately within measurement error.
> > >>>>>
> > >>>>> here it's 10153 -> 10135
> > >>>>
> > >>>> I suspect it also depends a bit on the compiler and how it changes the surrounding code.
> > >>>> Note that I also tested with PIC actually.
> > >>>>
> > >>>>> but I have a slightly odd feeling about the changes to the asm code,
> > >>>>> I am not sure if all assemblers will be happy about the changed
> > >>>>> code
> > >>>>
> > >>>> Do you mean particularly the movzbl change?
> > >>>
> > >>> yes and the k stuff
> > >>>
> > >>>
> > >>>> I am also unsure about that, I think there was a reason for that %k6 mess...
> > >>>> But this as well as movzx seemed to work for me...
> > >>>
> > >>> it works here too, I just have the feeling it might fail on some odd
> > >>> assembler or platform. That's not meant to keep you from pushing this,
> > >>> just that it might need to be reverted or fixed if such
> > >>> problems actually occur
> > >>
> > >> I pushed it.
> > >> If anyone sees issues please tell me and I'll look into it!
> > >
> > > I think these fate failures are caused by it, but that's based just
> > > on other commits in the range looking unlikely:
> > >
> > > http://fate.ffmpeg.org/report.cgi?time=20141122231657&slot=x86_64-darwin-clang-3.5-O3
> > > http://fate.ffmpeg.org/report.cgi?time=20141122223720&slot=x86_64-darwin-clang-3.5
> >
> > That's annoying, I only expected compile errors, this looks more like a compiler bug.
> > Can someone run tests?
> > Does just using the "m" instead of "r" constraint like on 32 bit fix it?
>
> still aborts with:
Oh dear.
On re-reading the code it seems I got a bit confused on what %0 actually
points to (I somehow thought it actually pointed to the on-stack x86_reg).
I can't test and benchmark today, but I think this one might fix it:
--- a/libavcodec/x86/h264_i386.h
+++ b/libavcodec/x86/h264_i386.h
@@ -178,7 +178,7 @@ static int decode_significance_8x8_x86(CABACContext *c,
"mov %2, %0 \n\t"
"mov %1, %6 \n\t"
- "mov %6, (%0) \n\t"
+ "mov %k6, (%0) \n\t"
"test $1, %4 \n\t"
" jnz 5f \n\t"
@@ -191,7 +191,7 @@ static int decode_significance_8x8_x86(CABACContext *c,
"cmp $63, %6 \n\t"
" jb 3b \n\t"
"mov %2, %0 \n\t"
- "mov %6, (%0) \n\t"
+ "mov %k6, (%0) \n\t"
"5: \n\t"
"addl %8, %k0 \n\t"
"shr $2, %k0 \n\t"
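
For illustration only, here is a minimal standalone sketch of what the %k operand modifier changes. It assumes the destination is a 4-byte slot and the source operand lives in a 64-bit register; store_low32 and its operand numbering are hypothetical and not taken from h264_i386.h. With %k1 the assembler sees the 32-bit register name and emits a 4-byte store, so the bytes after the slot are left alone, whereas a plain %1 would name the full 64-bit register and store 8 bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of FFmpeg: write only the low 32 bits of
 * a 64-bit value to dst, leaving the bytes after dst[3] untouched. */
static void store_low32(uint8_t *dst, uint64_t value)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile(
        /* %k1 prints the 32-bit name of operand 1 (e.g. %eax instead of
         * %rax), which makes this a 4-byte store. */
        "movl %k1, (%0) \n\t"
        :
        : "r"(dst), "r"(value)
        : "memory");
#else
    /* Portable fallback with the same effect on other targets. */
    uint32_t low = (uint32_t)value;
    memcpy(dst, &low, sizeof(low));
#endif
}
```

Calling store_low32 on an 8-byte buffer should change only the first four bytes, which is the behaviour the patch above is trying to restore for the stores through %0.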