[FFmpeg-devel] [FFmpeg-cvslog] x86/me_cmp: port mmxext and sse2 sad functions to yasm
James Almer
jamrial at gmail.com
Wed Sep 17 18:14:33 CEST 2014
On 17/09/14 9:07 AM, Michael Niedermayer wrote:
> On Wed, Sep 17, 2014 at 01:18:12PM +0200, Clément Bœsch wrote:
>> On Wed, Sep 17, 2014 at 11:41:32AM +0200, James Almer wrote:
>>> ffmpeg | branch: master | James Almer <jamrial at gmail.com> | Tue Sep 16 21:41:47 2014 -0300| [0456d169c469a79e305813d14c873fe698c8c572] | committer: Michael Niedermayer
>>>
>>> x86/me_cmp: port mmxext and sse2 sad functions to yasm
>>>
>>> Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of
>>> sad16_x2, sad16_y2 and sad16_xy2 (15% to 20% faster than mmxext).
>>> Since the _xy2 versions are not bitexact, they are accordingly marked as
>>> approximate.
>>>
>>> Signed-off-by: James Almer <jamrial at gmail.com>
>>> Signed-off-by: Michael Niedermayer <michaelni at gmx.at>
>>>
>>>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=0456d169c469a79e305813d14c873fe698c8c572
>>> ---
>>>
>>> libavcodec/x86/me_cmp.asm | 330 ++++++++++++++++++++++++++++++++++++++++++
>>> libavcodec/x86/me_cmp_init.c | 203 +++++++-------------------
>>> 2 files changed, 379 insertions(+), 154 deletions(-)
>>>
>>> diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
>>> index b0741f3..27176f4 100644
>>> --- a/libavcodec/x86/me_cmp.asm
>>> +++ b/libavcodec/x86/me_cmp.asm
>>> @@ -23,6 +23,10 @@
>>>
>>> %include "libavutil/x86/x86util.asm"
>>>
>>> +SECTION_RODATA
>>> +
>>> +cextern pb_1
>>> +
>>> SECTION .text
>>>
>>> %macro DIFF_PIXELS_1 4
>>> @@ -465,3 +469,329 @@ cglobal hf_noise%1, 3,3,0, pix1, lsize, h
>>> INIT_MMX mmx
>>> HF_NOISE 8
>>> HF_NOISE 16
>>> +
>>> +;---------------------------------------------------------------------------------------
>>> +;int ff_sad_<opt>(MpegEncContext *v, uint8_t *pix1, uint8_t *pix2, int stride, int h);
>>> +;---------------------------------------------------------------------------------------
>>> +INIT_MMX mmxext
>>> +cglobal sad8, 4, 4, 0, v, pix1, pix2, stride
>>> + movu m2, [pix2q]
>>> + movu m1, [pix2q+strideq]
>>> + psadbw m2, [pix1q]
>>> + psadbw m1, [pix1q+strideq]
>>> + paddw m2, m1
>>> +
>>> +%rep 3
>>> + lea pix1q, [pix1q+strideq*2]
>>> + lea pix2q, [pix2q+strideq*2]
>>> + movu m0, [pix2q]
>>> + movu m1, [pix2q+strideq]
>>> + psadbw m0, [pix1q]
>>> + psadbw m1, [pix1q+strideq]
>>> + paddw m2, m0
>>> + paddw m2, m1
>>> +%endrep
>>> + movd eax, m2
>>> + RET
>>> +
>>
>> Sorry to notice that now but... what happened to the h parameter?
>
> i had missed that when reviewing
>
> fixed
It's not needed. I purposely removed it and used a fixed %rep instead, since h is
supposedly guaranteed to be 8 for sad8.
Check the inline version it replaced.
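
For reference, a minimal scalar sketch of what the unrolled sad8 computes: the sum of absolute differences over an 8x8 block, with the row count hard-coded to 8. This is not FFmpeg code. The name sad8_fixed, the dropped context argument and the assert are assumptions made for illustration; the assert only spells out the "h is always 8" assumption that the asm relies on silently.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference, not the FFmpeg implementation.
 * Computes the SAD of two 8x8 blocks; h is accepted only so the
 * "always 8" assumption can be checked, it is otherwise unused. */
static int sad8_fixed(const uint8_t *pix1, const uint8_t *pix2,
                      int stride, int h)
{
    int sum = 0;

    assert(h == 8); /* assumption taken from the thread, made explicit */
    (void)h;        /* silences the unused warning when NDEBUG is set */

    for (int y = 0; y < 8; y++) {       /* fixed 8 rows, like 1 + %rep 3 row pairs */
        for (int x = 0; x < 8; x++)     /* 8 pixels per row, one psadbw in the asm */
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

If a caller ever passed a different h, a build with assertions enabled would trap here, whereas the asm would just keep processing exactly 8 rows.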