[MPlayer-dev-eng] [PATCH] x86_64 mmx/sse/3dnow optimisation support
Aurelien Jacobs
aurel at gnuage.org
Tue Oct 12 20:23:38 CEST 2004
On Tue, 12 Oct 2004 12:18:42 +0200
Reimar Döffinger <Reimar.Doeffinger at stud.uni-karlsruhe.de> wrote:
> Hi,
>
> > -#ifdef ARCH_X86
> > +#if defined(ARCH_X86) || defined(ARCH_X86_64)
>
> There are lots of these. From what I know, I find it incorrect to
> define X86_64 as a separate architecture, those are "only" 64bit
> extensions after all...
> Think you would need to change less if you put the x86_64 case to the
> x86 case, and just adding something like:
>
> if ("$host_arch" == x86_64); then
> _def_arch += "#define ARCH_X86_64 1\n$_def_arch"
> fi
>
> so that both ARCH_X86 and ARCH_X86_64 are defined then.
Sure I could do this. But I thought It would be clearer to have
distinct ARCH.
I personnaly prefer considering x86_64 as a really different arch
(having 64 bits addressing instead of 32 bits, is a big enough
difference IMHO).
Any other one with an opinion about diffining or not ARCH_X86 for
x86_64 traget ?
> > - unsigned char *u, unsigned char *v, int w, int us, int vs)
> > + unsigned char *u, unsigned char *v, int w, long us, long vs)
> > {
> > asm volatile (""
> > - "pushl %%ebp \n\t"
> > - "movl 4(%%edx), %%ebp \n\t"
> > - "movl (%%edx), %%edx \n\t"
> > + "push %%"REG_BP" \n\t"
> > +#ifdef ARCH_X86_64
> > + "mov %6, %%"REG_BP" \n\t"
> > +#else
> > + "movl 4(%%"REG_d"), %%"REG_BP" \n\t"
> > + "movl (%%"REG_d"), %%"REG_d" \n\t"
> > +#endif
> [...]
> > - : "S" (y), "D" (dst), "a" (u), "b" (v), "d" (&us),
> > "c" (w/16)
> > + : "S" (y), "D" (dst), "a" (u), "b" (v),
> > "c" (w/16),
> > +#ifdef ARCH_X86_64
> > + "d" (us), "r" (vs)
> > +#else
> > + "d" (&us)
> > +#endif
>
> Weird. Why was this done this way? Because your X86_64 version works
> on my AMD Athlon, too... I think this difference shouldn't be there...
I thought it was done that way because all registers were allready used.
In fact esp and ebp was not used but I thought they couldn't be used.
I used anoter way on x86_64 because of the bigger number of GPR.
I will do some more tests on x86 and hopefully remove the old way to
do this.
> > @@ -48,18 +48,18 @@
> > #undef HAVE_MMX
> > #undef HAVE_MMX2
> > #undef HAVE_3DNOW
> > -#undef ARCH_X86
> > +
> > +#ifndef CAN_COMPILE_X86_ASM
> >
> > #ifdef COMPILE_C
> > #undef HAVE_MMX
> > #undef HAVE_MMX2
> > #undef HAVE_3DNOW
> > -#undef ARCH_X86
> > #define RENAME(a) a ## _C
> > #include "osd_template.c"
> > #endif
> >
> > -#ifdef CAN_COMPILE_X86_ASM
> > +#else
> >
> > //X86 noMMX versions
> > #ifdef COMPILE_C
> > @@ -67,7 +67,6 @@
> > #undef HAVE_MMX
> > #undef HAVE_MMX2
> > #undef HAVE_3DNOW
> > -#define ARCH_X86
> > #define RENAME(a) a ## _X86
> > #include "osd_template.c"
> > #endif
> > @@ -78,7 +77,6 @@
> > #define HAVE_MMX
> > #undef HAVE_MMX2
> > #undef HAVE_3DNOW
> > -#define ARCH_X86
> > #define RENAME(a) a ## _MMX
> > #include "osd_template.c"
> > #endif
>
> I don't think this is correct. Without #undef ARCH_X86 you won't get
> the pure C versions of some functions AFAIK... You should add #undef
> ARCH_X86_64 instead.
No. The pure C version will only be built when CAN_COMPILE_X86_ASM is
not defined, which mean that neither ARCH_X86 nor ARCH_X86_64 is
defined at that point.
> Also, have you checked that
> +#ifndef CAN_COMPILE_X86_ASM doesn't break runtime cpudetection?
I tested runtime cpu detection without any problem.
> What are the changes in libvo/osd_template.c for?
In 64 bits mode, x86_64 can't access to upper halves (ah, ch...)
So I modified slightly the algorithm, but as it add 2 shift
instruction, I kept the original algorithm for x86.
> > @@ -2582,7 +2581,8 @@
> > int srcStride1, int srcStride2,
> > int srcStride3, int dstStride)
> > {
> > - unsigned y,x,w,h;
> > + unsigned y,w,h;
> > + unsigned long x;
>
> Huh? I'd say either change them all or modify only
Only x is used for 64 bits addressing in asm code. The other
can still be 32 bits without problem.
Anyway, it's probably cleaner to change them all, so I modified
this.
> > : "+r" (x)
>
> > +#undef REAL_MOVNTQ
> > #undef MOVNTQ
> > #undef PAVGB
> > #undef PREFETCH
> > @@ -54,29 +55,30 @@
> > #endif
> >
> > #ifdef HAVE_MMX2
> > -#define MOVNTQ(a,b) "movntq " #a ", " #b " \n\t"
> > +#define REAL_MOVNTQ(a,b) "movntq " #a ", " #b " \n\t"
> > #else
> > -#define MOVNTQ(a,b) "movq " #a ", " #b " \n\t"
> > +#define REAL_MOVNTQ(a,b) "movq " #a ", " #b " \n\t"
> > #endif
> > +#define MOVNTQ(a,b) REAL_MOVNTQ(a,b)
>
> You added a lot of such REAL_* defines to postproc/swscale_template.c,
> what are they there for? They don't seem to be used anywhere...
This is because I then use something like MOVNTQ(%%REGa,%%REGb).
REGa is also a macro, and it you try to use REAL_MOVNTQ(%%REGa,%%REGb),
the REGa macro won't be substituted. That's why one more indirection
level is required.
I used this because this is the simplest way I found, and because
something similar was already used in MPlayer.
> Reducing the number of changed code line as far as possible and maybe
I know this, and I already tried to reduce it as much as possible.
> also adding comments on the non-obvious part/changes will greatly
> increase the probability of getting this applied.
Unfortunatly, after writting this code, everything seem obvious ;-)
Sorry, but I don't really see what I could comment.
> Btw: I'd expect there was some reason why "addl"/"popl" etc. were used
> in the original code and not "add"/"pop" etc. Can somebody explain?
They were there to explicitly show that this was a 32 bits operation,
But 'as' is smart enough to deduce the operation size from the type
of operand (eax => 32 bits, rax => 64 bits).
Moreover this modification (addl => add) was suggested by Michael on
ffmpeg-devel, and he accepted my patch for ffmpeg, so I can't see how
it could be wrong ;-)
Here is an updated version of my patch. It also fix a small bug
there was in my first version (an int to long cast was missing in
swscale causing a segfault).
Aurel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MPlayer-x86_64.diff.bz2
Type: application/octet-stream
Size: 18304 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/mplayer-dev-eng/attachments/20041012/563db092/attachment.obj>
More information about the MPlayer-dev-eng
mailing list