[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Fri Sep 24 21:30:13 CEST 2010

On Sep 24, 2010, at 3:20 PM, Ronald S. Bultje wrote:

> Hi,
> 
> On Fri, Sep 24, 2010 at 12:26 PM, Daniel Verkamp <daniel at drv.nu> wrote:
>> On Fri, Sep 24, 2010 at 9:04 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> >>> So, removing pand (it doesn't do anything in the one case and can
> >>> be replaced by a pxor in the other): with the attached patch #2, I get
> >>> this:
> >>> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//cc8uAjPS.s:315:bad register name `%%mm0'
> >>> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//cc8uAjPS.s:520:bad register name `%%mm0'
>>> 
>>> What does that mean?
>> 
>> If you omit all of the optional colon-separated arguments to asm, the
>> % symbols before register names in the asm no longer need to be
>> escaped with a second % (I suppose since there can be no substitution
>> when there are no operand constraints).  You can add an empty : or
>> just drop the doubled % to avoid this.
> 
> OK, that fixes it. Oddly, it's the same speed, even though the
> instruction count is lower. OK, on to the next one, then. The attached
> patch is meant to be part of a patch that decreases the insane number
> of registers used for temporaries that could be loaded directly (i.e.
> instead of doing (%0) where %0="m"(var[idx1]), use (%0,%1) with
> %0="r"(var) and %1="r"(idx1)). This works and is not slower (eventually
> it will be faster, once it saves a few registers; this is work in progress).
> 
> The second patch ("test") tries to use d_idx as a global (which it is,
> in effect). Why doesn't this work?
> 
> -                "por  (%0,%1), %%mm1 \n" // nnz[b] || nnz[bn]
> +                "por  %1(%0), %%mm1 \n" // nnz[b] || nnz[bn]
>                 ::"r"(nnz+b_idx),
> -                  "r"(d_idx)
> +                  "g"(d_idx)
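For reference, here is a minimal sketch of the form that does work in the quoted discussion, assuming GCC-style extended asm on x86 with MMX available. Because the statement has operand lists, register names take a doubled `%%`, and the base pointer and the byte offset are both passed as "r" operands so the template can address `(%1,%2)`. The function name `or8_base_index` and the 8-byte OR are hypothetical stand-ins for the nnz[b] || nnz[bn] test, not the patch's actual code. A plain-C fallback covers other targets.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: OR the 8 bytes at p[0..7] with the 8 bytes at
 * p[off..off+7], the same kind of access as "por (%0,%1), %%mm1". */
static uint64_t or8_base_index(const uint8_t *p, long off)
{
    uint64_t out;
#if defined(__GNUC__) && defined(__MMX__) && (defined(__x86_64__) || defined(__i386__))
    /* Extended asm (operand lists present), so register names need "%%". */
    __asm__ volatile (
        "movq   (%1),    %%mm1 \n\t"  /* mm1  = p[0..7]                */
        "por    (%1,%2), %%mm1 \n\t"  /* mm1 |= p[off..off+7]          */
        "movq   %%mm1,   %0    \n\t"  /* store result via "=m" operand */
        "emms                  \n\t"  /* leave MMX state for x87 code  */
        : "=m"(out)
        : "r"(p), "r"(off)
        : "memory", "mm1");
#else
    /* Portable fallback computing the same value. */
    uint64_t a, b;
    memcpy(&a, p, sizeof a);
    memcpy(&b, p + off, sizeof b);
    out = a | b;
#endif
    return out;
}
```

With `off = 8` and a 16-byte buffer holding eight 0x01 bytes followed by eight 0x02 bytes, both paths return 0x0303030303030303.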

"g" permits registers, so it could generate something like this:
   por %rax(%rcx), %mm1

You have to use "m" or MANGLE() in this situation.
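With an "m" constraint the compiler builds the whole addressing mode itself, so the template uses the operand on its own (`por %2, %%mm1`) instead of gluing `%1(%0)` together by hand; no register can end up in the displacement slot. Below is a minimal sketch under the same assumptions (GCC-style asm, x86 with MMX). `or8_mem_operand` is a hypothetical name, and the array-typed cast is one way to describe an 8-byte memory read to the compiler.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: same OR as a base+index version, but both loads use
 * "m" operands, so the compiler chooses each addressing mode.
 * The broken variant, kept only as a comment:
 *     "por %1(%0), %%mm1" : : "r"(base), "g"(idx)
 * "g" may pick a register, giving "por %rax(%rcx), %mm1". */
static uint64_t or8_mem_operand(const uint8_t *p, long off)
{
    uint64_t out;
#if defined(__GNUC__) && defined(__MMX__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ (
        "movq   %1,    %%mm1 \n\t"  /* mm1  = p[0..7]       */
        "por    %2,    %%mm1 \n\t"  /* mm1 |= p[off..off+7] */
        "movq   %%mm1, %0    \n\t"
        "emms                \n\t"
        : "=m"(out)
        : "m"(*(const uint8_t (*)[8])p),
          "m"(*(const uint8_t (*)[8])(p + off))
        : "mm1");
#else
    uint64_t a, b;
    memcpy(&a, p, sizeof a);
    memcpy(&b, p + off, sizeof b);
    out = a | b;
#endif
    return out;
}
```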
Also, generating _d_idx(%rax) won't work with BROKEN_RELOCATIONS (x86-64 Darwin), because there all global references must be RIP-relative, i.e. _d_idx(%rip).
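One way to avoid the relocation issue is to pass the global itself as an "m" operand. The compiler then emits whatever form the target needs (typically `g_d_idx(%rip)` on x86-64), so no symbol name or MANGLE() appears in the template. This is a minimal sketch, assuming GCC-style asm on x86 with MMX. `g_d_idx` and `or8_global_index` are hypothetical, and loading the value into a scratch register costs one register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical global playing the role of d_idx in the thread. */
static long g_d_idx = 8;

/* Hypothetical helper: OR p[0..7] with p[g_d_idx..g_d_idx+7]. The global
 * is read through an "m" operand, so the compiler picks the reference form
 * (RIP-relative on x86-64) instead of a hand-written symbol(%reg). */
static uint64_t or8_global_index(const uint8_t *p)
{
    uint64_t out;
#if defined(__GNUC__) && defined(__MMX__) && (defined(__x86_64__) || defined(__i386__))
    long idx;  /* scratch register; early-clobber since it is written first */
    __asm__ volatile (
        "mov    %3,      %1    \n\t"  /* idx  = g_d_idx        */
        "movq   (%2),    %%mm1 \n\t"  /* mm1  = p[0..7]        */
        "por    (%2,%1), %%mm1 \n\t"  /* mm1 |= p[idx..idx+7]  */
        "movq   %%mm1,   %0    \n\t"
        "emms                  \n\t"
        : "=m"(out), "=&r"(idx)
        : "r"(p), "m"(g_d_idx)
        : "memory", "mm1");
#else
    uint64_t a, b;
    memcpy(&a, p, sizeof a);
    memcpy(&b, p + g_d_idx, sizeof b);
    out = a | b;
#endif
    return out;
}
```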

If you have a recent clang, try compiling with that. It has a built-in assembler which may give clearer errors... and which doesn't build ffmpeg, because it doesn't know about 3dnow.