[FFmpeg-devel] [WIP] [PATCH 4/4] x86: dsputilenc: convert hf_noise*_mmx to yasm
Timothy Gu
timothygu99 at gmail.com
Mon Jun 2 05:33:16 CEST 2014
On Jun 1, 2014 6:36 PM, "Michael Niedermayer" <michaelni at gmx.at> wrote:
> > +%if %1 == 16
> > + push pix1q
> > + push hq
> > +%endif
>
> don't use push/pop, they can mess up the yasm magic macros
> you can use PUSH/POP, but better not to use them either; there should be
> enough registers
With the other local variable, there will be 6 registers used, which is IMO
a lot for a function like this. Is there any significant performance
penalty for using PUSH/POP vs. a local variable?
> > + sub hd, 2
> > + pxor m7, m7
> > + pxor m6, m6
> > + HF_NOISE_PART1 %1, 0, 1, 2, 3
> > + add pix1q, lsizeq
> > + HF_NOISE_PART1 %1, 4, 1, 5, 3
> > + HF_NOISE_PART2 0, 2
> > + add pix1q, lsizeq
> > +.loop:
> > + HF_NOISE_PART1 %1, 0, 1, 2, 3
> > + HF_NOISE_PART2 4, 5
> > + add pix1q, lsizeq
> > + HF_NOISE_PART1 %1, 4, 1, 5, 3
> > + HF_NOISE_PART2 0, 2
> > + add pix1q, lsizeq
> > + sub hd, 2
> > + jne .loop
> > +
> > + mova m0, m6
> > + punpcklwd m0, m7
> > + punpckhwd m6, m7
> > + paddd m6, m0
> > + mova m0, m6
> > + psrlq m6, 32
> > + paddd m0, m6
> > +%if %1 == 16
>
> > + movd ebx, m0 ; ebx = result of hf_noise16;
>
> you can't just write into a random register
> declare a local variable in the cglobal macro above and use it instead
OK. But how about the return value at the end? Is eax specifically designed
to be clobbered?
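
For reference, the reduction quoted just before the movd (mova, punpcklwd,
punpckhwd, paddd, psrlq, paddd) can be modelled in scalar C roughly as
below. This is only a sketch under two assumptions: m7 still holds zero at
that point, and the 64-bit MMX register is treated as four 16-bit words.
hsum_words_model is a hypothetical name for illustration, not an FFmpeg
function.

```c
#include <stdint.h>

/* Scalar model of the quoted horizontal reduction.
 * Assumption: m7 is all zeros, so unpacking against it zero-extends
 * each 16-bit word of m6 to 32 bits. Hypothetical helper name. */
static uint32_t hsum_words_model(uint64_t m6)
{
    /* punpcklwd m0, m7: low two words become two dwords */
    uint32_t lo0 = (uint32_t)(m6 & 0xFFFF);
    uint32_t lo1 = (uint32_t)((m6 >> 16) & 0xFFFF);

    /* punpckhwd m6, m7: high two words become two dwords */
    uint32_t hi0 = (uint32_t)((m6 >> 32) & 0xFFFF);
    uint32_t hi1 = (uint32_t)((m6 >> 48) & 0xFFFF);

    /* paddd m6, m0: lane-wise add of the two dword pairs */
    uint32_t d0 = lo0 + hi0;
    uint32_t d1 = lo1 + hi1;

    /* psrlq m6, 32 ; paddd m0, m6: fold the upper dword into the lower.
     * movd then reads that low dword as the result. */
    return d0 + d1;
}
```

So whichever register ends up receiving the movd, the value written is just
the sum of the four word accumulators in m6.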
>
>
>
> > + pop hq ; restore h and pix1
> > + pop pix1q
> > + ; lsize is unchanged (except movsxd, which hf_noise8 is going to do anyway)
> > + add pix1q, 8 ; pix1 = pix1 + 8;
>
> > + call hf_noise8 ; eax = hf_noise8_mmx(pix1, lsize, h);
>
> don't call cglobal functions; if you do, you would have to emulate the
> calling conventions of all ABIs. x86_32 would pass arguments on the
> stack, for example
Should I then plug in a version of `HF_NOISE 8` without cglobal and stuff?
>
> also, looking at the disassembly of the function with gdb and the
> register values when it crashes (if it does), or single-stepping through
> the code with gdb, should help you understand what the problem is, or the
> difference between what you want and what the computer actually does
I spent 2 nights trying to debug this, without any luck. It seems like
[pix1q+2*lsizeq] is unallocated, as it crashes in the first instruction in
the loop, the first time it is executed. I can't understand how this would
happen.
I also compared the disassembly of the new code with the old inline one
line by line, but I can't find anything either.
[...]
Timothy