[FFmpeg-devel] [PATCH] Add x86-optimized function ac3_or_abs_int16() and use in log2_tab().
Ronald S. Bultje rsbultje
Sat Feb 12 22:24:07 CET 2011

Hi,
On Fri, Feb 11, 2011 at 7:55 PM, Justin Ruggles
<justin.ruggles at gmail.com> wrote:
>  libavcodec/ac3dsp.c         |    9 ++++++
>  libavcodec/ac3dsp.h         |   11 ++++++++
>  libavcodec/ac3enc_fixed.c   |   11 ++-----
>  libavcodec/x86/ac3dsp.asm   |   61 +++++++++++++++++++++++++++++++++++++++++++
>  libavcodec/x86/ac3dsp_mmx.c |   11 ++++++++
>  5 files changed, 95 insertions(+), 8 deletions(-)
[..]
> +    mova [rsp], m4
> +    xor    rax, rax
> +    or      ax, [rsp]
> +    or      ax, [rsp+2]
> +    or      ax, [rsp+4]
> +    or      ax, [rsp+6]
> +%ifidn mmsize, 16
> +    or      ax, [rsp+8]
> +    or      ax, [rsp+10]
> +    or      ax, [rsp+12]
> +    or      ax, [rsp+14]
> +%endif
For the xmm version, you can do the whole reduction in registers:
mova       xmm5, xmm4
punpckhqdq xmm4, xmm4
por        xmm5, xmm4      ; or in lowest 8 bytes
pshuflw    xmm4, xmm5, 0xe
por        xmm5, xmm4      ; or in lowest 4 bytes
pshuflw    xmm4, xmm5, 0x1
por        xmm5, xmm4      ; or in lowest 2 bytes
movd        eax, xmm5      ; result is in ax
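For the mmx version (pshufw, not pshuflw, on mm registers; it needs mmx2):
pshufw      mm5, mm4, 0xe
por         mm4, mm5       ; or in lowest 4 bytes
pshufw      mm5, mm4, 0x1
por         mm4, mm5       ; or in lowest 2 bytes
movd        eax, mm4       ; result is in ax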
This has the advantage that you don't need to touch rsp or go through
memory at all. I can't say for sure that it's faster without
benchmarking it, but it probably is.
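
To make the folding order easier to check, here is a rough scalar model in
plain C of what the xmm sequence does with the eight 16-bit lanes. It is
only a sketch: the function names or_lanes_scalar() and or_lanes_fold() are
made up for illustration and are not part of the patch, and the array
stands in for the register contents.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference: OR the eight lanes one at a time, which is what the quoted
 * store-to-stack sequence amounts to. (Hypothetical helper name.) */
static uint16_t or_lanes_scalar(const uint16_t v[8])
{
    uint16_t acc = 0;
    for (size_t i = 0; i < 8; i++)
        acc |= v[i];
    return acc;
}

/* Folding model of the register-only sequence. (Hypothetical helper name.) */
static uint16_t or_lanes_fold(const uint16_t v[8])
{
    uint16_t t[8];
    for (size_t i = 0; i < 8; i++)
        t[i] = v[i];

    /* punpckhqdq + por: fold the high 8 bytes into the low 8 bytes */
    for (size_t i = 0; i < 4; i++)
        t[i] |= t[i + 4];

    /* pshuflw 0xe + por: fold words 2,3 into words 0,1 */
    t[0] |= t[2];
    t[1] |= t[3];

    /* pshuflw 0x1 + por: fold word 1 into word 0 */
    t[0] |= t[1];

    /* movd eax: the low word now holds the OR of all eight lanes */
    return t[0];
}
```

Both functions should return the same value for any input, so three
shuffle/por pairs replace the eight dependent memory ORs. The mmx case is
the same model with only the last two folds.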
Ronald