[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm
Alexander Strange
astrange
Fri Sep 24 21:30:13 CEST 2010
On Sep 24, 2010, at 3:20 PM, Ronald S. Bultje wrote:
> Hi,
>
> On Fri, Sep 24, 2010 at 12:26 PM, Daniel Verkamp <daniel at drv.nu> wrote:
>> On Fri, Sep 24, 2010 at 9:04 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>> So removing pand (which doesn't do anything in the one case, and can
>>> be replaced by a pxor in the other). With the attached patch #2, I get
>>> this:
>>> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//cc8uAjPS.s:315:bad
>>> register name `%%mm0'
>>> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//cc8uAjPS.s:520:bad
>>> register name `%%mm0'
>>>
>>> What does that mean?
>>
>> If you omit all of the optional colon-separated arguments to asm, the
>> % symbols before register names in the asm no longer need to be
>> escaped with a second % (I suppose since there can be no substitution
>> when there are no operand constraints). You can add an empty : or
>> just drop the doubled % to avoid this.
>
> OK, that fixes it. Oddly, it's the same speed, even though the
> instruction count is lower. OK, so next then. Attached patch is supposed to
> be part of a patch that decreases the insane number of registers used
> for temporary stuff that could be loaded directly (so instead of doing
> (%0) where %0="m"(var[idx1]), use (%0,%1) with %0="r"(var) and
> %1="r"(idx1)). This works and is not slower (eventually it will be
> faster when it saves a few registers; this is work-in-progress).
>
> The second patch ("test") tries to use d_idx as a global (which it is,
> in effect). Why doesn't this work?
>
> - "por (%0,%1), %%mm1 \n" // nnz[b] || nnz[bn]
> + "por %1(%0), %%mm1 \n" // nnz[b] || nnz[bn]
> ::"r"(nnz+b_idx),
> - "r"(d_idx)
> + "g"(d_idx)
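"g" permits any general register, memory or immediate operand, so the compiler is free to pick a register and generate something like this:
    por %rax(%rcx), %mm1
That is not valid: the part in front of the parentheses has to be a constant displacement or a symbol, not a register.
You have to use "m" or MANGLE() in this situation.
Also, generating _d_idx(%rax) won't work with BROKEN_RELOCATIONS (x86-64 darwin), because there all global references must be RIP-relative, i.e. _d_idx(%rip).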
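For reference, here is a minimal sketch of the (%0,%1) form from the first patch, where both base and index are forced into registers with "r". It is not FFmpeg code: or_pair_asm, or_pair_ref and or_pair_selfcheck are hypothetical names, it uses plain 32-bit movl/orl instead of the MMX por to stay short, and it assumes a GCC-compatible compiler (on non-x86 targets it just falls back to the C reference).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable reference: OR the 4 bytes at p with the 4 bytes at p + d_idx. */
static uint32_t or_pair_ref(const uint8_t *p, intptr_t d_idx)
{
    uint32_t a, b;
    memcpy(&a, p, sizeof(a));
    memcpy(&b, p + d_idx, sizeof(b));
    return a | b;
}

/* Hypothetical helper (not from FFmpeg): same operation via inline asm,
 * using base+index addressing with both operands constrained to "r". */
static uint32_t or_pair_asm(const uint8_t *p, intptr_t d_idx)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t acc;
    __asm__ (
        "movl (%1), %0     \n\t"  /* acc  = 4 bytes at p         */
        "orl  (%1,%2), %0  \n\t"  /* acc |= 4 bytes at p + d_idx */
        : "=&r"(acc)              /* early-clobber: written before %1/%2 are last read */
        : "r"(p), "r"(d_idx)      /* (base,index) needs two registers, so "r", not "g" */
        : "memory", "cc");        /* reads memory not listed as an operand; orl sets flags */
    return acc;
#else
    return or_pair_ref(p, d_idx); /* assumption: no x86 inline asm available */
#endif
}

/* Returns 1 if the asm version agrees with the reference for the two
 * negative strides (-1 and -8) used here as stand-ins for d_idx. */
static int or_pair_selfcheck(void)
{
    uint8_t nnz[16] = { 0, 1, 0, 0, 0, 0, 2, 0, 4, 0, 0, 0, 0, 0, 0, 8 };
    const uint8_t *p = nnz + 8;

    assert(or_pair_asm(p, -1) == or_pair_ref(p, -1));
    assert(or_pair_asm(p, -8) == or_pair_ref(p, -8));
    return 1;
}
```

The index is passed as intptr_t so that it fills a full-width register and negative values are sign-extended, which matters for a stride like -8 on x86-64.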
If you have a recent clang, try compiling with that; it has a built-in assembler which may give clearer errors.
...and which doesn't build ffmpeg, because it doesn't know about 3dnow.
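On the `%%mm0' error quoted earlier in the thread, a small sketch of the two asm forms may help. It assumes a GCC-compatible compiler; inc_extended and nop_basic are made-up names, and general-purpose registers are used instead of MMX so that nothing needs emms. With operand lists present, % introduces an operand and literal registers need %%; with no colons at all the string goes to the assembler as written, so a doubled % ends up in the output.

```c
#include <assert.h>
#include <stdint.h>

/* Extended asm: operand lists are present, so %0/%1 are substituted and a
 * literal register name has to be written with a doubled %, e.g. %%eax. */
static uint32_t inc_extended(uint32_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t out;
    __asm__ (
        "movl %1, %%eax \n\t"   /* eax = x   */
        "addl $1, %%eax \n\t"   /* eax += 1  */
        "movl %%eax, %0 \n\t"   /* out = eax */
        : "=r"(out)
        : "r"(x)
        : "eax", "cc");         /* eax is used as scratch; addl sets flags */
    return out;
#else
    return x + 1;               /* assumption: no x86 inline asm available */
#endif
}

/* Basic asm: no colons at all, so there is no operand substitution and the
 * register is written with a single %. "xchg %ax, %ax" is a two-byte nop. */
static void nop_basic(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile ("xchg %ax, %ax");
#endif
}
```

Adding an empty ":" to the second statement would turn it into extended asm, and the register would then have to be spelled %%ax again.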