[FFmpeg-devel] [PATCH 2/2] x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1}

Christophe Gisquet christophe.gisquet at gmail.com
Tue May 27 10:09:39 CEST 2014


Hi,

2014-05-27 9:27 GMT+02:00 James Almer <jamrial at gmail.com>:
> -    mov          r2, r1
> -    neg          r2
> -    shl          r2, 4
> -    sub          r0, r2
[...]
> -    movd        eax, m6
> -    and         eax, 0xffff
> +    paddw        m4, m3
> +%if mmsize == 8
> +    add          r0, r1
> +%else
> +    lea          r0, [r0+r1*2]
> +%endif
> +    dec r2
> +    jne .loop

Sorry for not doing my work of sufficiently studying these changes, but...
Are you removing some corner cases handling? The top part is maybe a
better loop control in your code, but the second part you remove looks
like some overflow prevention.


>  INIT_MMX mmx
> +PIX_SUM16 0, 16
> +INIT_XMM sse2
> +PIX_SUM16 6, 8
> +
[...]
> +INIT_MMX mmx
> +PIX_NORM1 0, 16
> +INIT_XMM sse2
> +PIX_NORM1 6, 8

PHADDW may map to an ssse3 instruction. I suspect it's probably not
worth the 3% cycle count decrease, but do you have plans about it? (eg
in a later patch)

> --- a/libavutil/x86/x86util.asm
> +++ b/libavutil/x86/x86util.asm
> @@ -288,7 +288,12 @@
>      paddd   %1, %2
>  %endif
>  %if notcpuflag(xop) || sizeof%1 != 16
> +%if cpuflag(mmxext)
>      PSHUFLW %2, %1, q0032
> +%else ; mmx
> +    mova    %2, %1
> +    psrlq   %2, 32
> +%endif
>      paddd   %1, %2
>  %endif
>  %undef %1

If I'm not mistaken, x264 folks don't care about mmx, so no need to
wait for their opinion on this change. This part OK then.

Not tested but looks good to me otherwise.

Best regards,
-- 
Christophe


More information about the ffmpeg-devel mailing list