[Ffmpeg-devel] Broken trunk on AMD64 with PIC enabled

Wed Apr 4 11:48:09 CEST 2007

On Wed, 4 Apr 2007, Michael Niedermayer wrote:
> On Tue, Apr 03, 2007 at 02:58:33PM -0700, Trent Piepho wrote:
> > There is
> > nothing that can be used for "ff_h264_lps_range(%eax, %esi, 2)" that is PIC
> > for ia32 or x86-64.
>
> it can be done with relocations, that is in theory, i dont know if the
> compiler/linker/loader can do it and i dont have the time to play with this
> ATM, working and clean patches welcome, complaints not

Relocations can make it work as a shared object on ia32, but it's not PIC.
The only way I can think of to make it PIC (without using different asm
code that clobbers another register) would be to somehow stick
ff_h264_lps_range in the text section, and use a CS segment override.
That's probably a bad idea since segment overrides are expensive outside of
real mode, and I'm not sure if it even works on x86-64.

Back in the 16-bit days, you sometimes would do that so you could avoid
changing segment registers inside a loop and reference three segments
instead of two.  Stick your constant lookup table in CS, use DS for the
input address, and ES for the output address.

> > I've got to wonder, what the point of MANGLE is?
>
> its point is to workaround some gcc bugs with too many operands

I'm not so sure about that, but I do see a point to MANGLE.  There doesn't
seem to be any way to get gcc in pic mode to use TEXTRELs in inline asm
except to reference the symbol directly from the asm.  The "e" or "i"
constraints must be static link-time constants, not dynamic link-time
constants (aka text relocations).

> > asm("movzbl %c2(%1, %0, 2), %0"
> >     : "+r"(range)
> >     : "r"(ret), "e"(ff_h264_lps_range));
> >
> > The 'e' constraint selects a signed 32-bit constant or symbolic reference,
> > which is exactly what is allowed (for both ia32 and x86-64) for the
> > displacement field in the address.
>
> this will likely break with old gcc (hint MANGLE was often added because gcc
> failed with any other way to access the variables), also it doesnt fix the
> problem ...

For older gcc "i" should be used instead of "e".  It only makes a
difference on x86-64, for ia-32 "i" and "e" are the same.  I don't have
access to gcc 2.95 or whatever the minimum supported version is right now,
but I'm pretty sure this works.  Of course this doesn't solve the problem
of generating a DSO on x86-64.

> > gcc will correctly tell you this asm can't be compiled in PIC mode.  If
> > you want PIC, ff_h264_lps_range is not a constant, so you must put it
> > in a register or not use any addressing modes.
>
> you could use self modifying code or let the loader fix the address up if
> it supports that which as i said i dont have time ATM to find out

Won't work since the displacement field is 32 bits but the address is
64 bits.
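
To put numbers on that: an absolute address only fits in the displacement
if it survives sign extension from 32 bits.  A small sketch of the check
(fits_disp32 and disp32_examples are hypothetical names, and the 0x7f...
address is only an assumed, typical-looking load address for a shared
object on Linux x86-64):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr can be encoded as the sign-extended 32-bit displacement
 * of an x86-64 memory operand, i.e. it lies in the lowest or the
 * highest 2 GB of the 64-bit address space. */
static bool fits_disp32(uint64_t addr)
{
    return addr <= UINT64_C(0x7fffffff) ||
           addr >= UINT64_C(0xffffffff80000000);
}

/* A few illustrative cases; the addresses are made up. */
static void disp32_examples(void)
{
    assert(fits_disp32(UINT64_C(0x0000000000401000)));   /* low 2 GB */
    assert(!fits_disp32(UINT64_C(0x0000000080000000)));  /* just past it */
    assert(!fits_disp32(UINT64_C(0x00007f1234567000)));  /* assumed DSO address */
}
```

So even if the loader were willing to patch the instruction, there is no
32-bit value it could write for a table mapped at an address like the
last one.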

> > asm("movzbl ff_h264_lps_range(%1, %0, 2), %0"
> >     : "+r"(range)
> >     : "r"(ret));

There is simply no way to make this instruction work in a DSO on x86-64.
You must generate some kind of alternate code.

Such as:
A.1) Have gcc stick ff_h264_lps_range in a register, which is clobbered:

    long clobber;
    asm("lea (%[lps],%q[range],2), %[lps]	\n\t"
        "movzbl (%[lps],%[ret],1), %[range]	"
	: [range]"+r"(range), "=&r"(clobber)
	: [ret]"r"(ret), [lps]"1"(&ff_h264_lps_range));

    gcc will do what is necessary to load &ff_h264_lps_range into a
    register.  For example, non-PIC:
    mov $ff_h264_lps_range, %rax	# rax = &ff_h264_lps_range
    x86-64 PIC:
    lea ff_h264_lps_range(%rip), %rax	# rax = &ff_h264_lps_range
    ia-32 PIC:
    mov ff_h264_lps_range@GOT(%ebx), %eax # eax = &ff_h264_lps_range

A.2)  Same thing, but use a scratch register to not clobber %[lps],
      if you want to use ff_h264_lps_range multiple times.  Of course this
      needs two extra registers.

    long scratch;
    asm("lea (%[lps],%q[range],2), %[scratch]	\n\t"
        "movzbl (%[scratch],%[ret],1), %[range]	"
	: [range]"+r"(range), [scratch]"=&r"(scratch)
	: [ret]"r"(ret), [lps]"r"(&ff_h264_lps_range));

A.3)  Pretty much the same, but put the address in a register yourself.

    long scratch;
    asm("lea %a[lps], %[scratch]                    \n\t"
        "add %q[ret], %[scratch]                    \n\t"
        "movzbl (%[scratch],%[range],2), %[range]   "
        : [range]"+r"(range), [scratch]"=&r"(scratch)
        : [ret]"Zmr"(ret), [lps]"p"(&ff_h264_lps_range));

    Note that I made this code a little different: since ret is now a
    source for an add instruction, it can be an "m" or "Z" (Z = 32-bit
    constant) and might not need to be in a register.

B) Have gcc calculate the address for you (if it doesn't change)

   asm("movzbl %[lps], %[range]"
       : [range]"=r"(range)
       : [lps]"rm"(*((char*)ff_h264_lps_range + range*2 + ret)));

   The destination of movzbl has to be a register, so the output is "=r"
   only; the source can be a register or memory.

   Note that in non-PIC mode, gcc will generate the optimal code:
   movzbl ff_h264_lps_range(%edx,%eax,2), %eax

   And in PIC mode, you might get something like:
   movzbl (%rdx,%rax), %eax
   or even:
   movzbl %al, %eax

   Where gcc has done something like "lea ff_h264_lps_range(%rip), %rdx" to
   create a valid address for the source of the movzbl.

None of the A options generate optimal code for the non-PIC case.  They all
need an extra register too.
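
For anyone who wants to try the A.2 pattern outside of FFmpeg, here is a
rough self-contained sketch.  It is not FFmpeg code: a2_table is a made-up
stand-in for ff_h264_lps_range, and a2_table_init, lookup_a2 and
a2_self_check are hypothetical names.  The asm path is only built for
64-bit x86 with a GNU-style compiler; the plain C branch is there so the
file still builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up stand-in for ff_h264_lps_range. */
static uint8_t a2_table[64];

static void a2_table_init(void)
{
    for (size_t i = 0; i < sizeof(a2_table); i++)
        a2_table[i] = (uint8_t)(3 * i + 1);
}

/* Returns a2_table[range * 2 + ret]. */
static unsigned lookup_a2(unsigned range, unsigned ret)
{
#if defined(__GNUC__) && defined(__x86_64__) && defined(__LP64__)
    uintptr_t r = range;   /* widened so they can be used in an address */
    uintptr_t t = ret;
    uintptr_t scratch;

    /* The compiler loads the table address into a register by whatever
     * means the code model needs; the asm never names the symbol. */
    __asm__("lea (%[tab],%[range],2), %[scratch]     \n\t"
            "movzbl (%[scratch],%[ret],1), %k[range]"
            : [range]"+r"(r), [scratch]"=&r"(scratch)
            : [ret]"r"(t), [tab]"r"(a2_table)
            : "memory");   /* the asm reads the table behind gcc's back */
    return (unsigned)r;
#else
    return a2_table[range * 2 + ret];
#endif
}

/* Compare the asm path against plain C indexing. */
static void a2_self_check(void)
{
    for (unsigned range = 0; range < 30; range++)
        for (unsigned ret = 0; ret < 4; ret++)
            assert(lookup_a2(range, ret) == a2_table[range * 2 + ret]);
}
```

Since the table address arrives through an "r" operand, the same source
should build with or without -fPIC; the cost is the two extra registers
already mentioned.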

Option B should generate the "optimal" code for both PIC and non-PIC, but
you can't change range or ret inside the asm block.
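
A comparable sketch for option B, again with made-up names (b_table,
b_table_init, lookup_b, b_self_check) rather than the real FFmpeg symbols.
To keep it simple it uses a plain "m" input, so the compiler has to produce
a memory operand for the byte, and it falls back to plain C on non-x86
targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up stand-in for ff_h264_lps_range. */
static uint8_t b_table[64];

static void b_table_init(void)
{
    for (size_t i = 0; i < sizeof(b_table); i++)
        b_table[i] = (uint8_t)(5 * i + 2);
}

/* Returns b_table[range * 2 + ret]; the compiler picks the addressing. */
static unsigned lookup_b(unsigned range, unsigned ret)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned out;

    /* The "m" operand tells the compiler exactly which byte is read, so
     * it can form the address itself (absolute, %rip-relative, via the
     * GOT, ...) and no "memory" clobber is needed. */
    __asm__("movzbl %[src], %[out]"
            : [out]"=r"(out)
            : [src]"m"(b_table[range * 2 + ret]));
    return out;
#else
    return b_table[range * 2 + ret];
#endif
}

/* Compare the asm path against plain C indexing. */
static void b_self_check(void)
{
    for (unsigned range = 0; range < 30; range++)
        for (unsigned ret = 0; ret < 4; ret++)
            assert(lookup_b(range, ret) == b_table[range * 2 + ret]);
}
```

The address is computed before the asm starts, which is exactly the
limitation above: range and ret are fixed by the time the instruction runs.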