[FFmpeg-devel] [PATCH] Add x86-optimized versions of lshift_tab().
Ronald S. Bultje
rsbultje
Sat Feb 12 21:20:25 CET 2011
Hi,
On Sat, Feb 12, 2011 at 2:31 PM, Justin Ruggles
<justin.ruggles at gmail.com> wrote:
> New function name AC3DSPContext.ac3_lshift_int16().
> ---
>  libavcodec/ac3dsp.c         |   11 +++++++++++
>  libavcodec/ac3dsp.h         |   11 +++++++++++
>  libavcodec/ac3enc_fixed.c   |   19 +------------------
>  libavcodec/x86/ac3dsp.asm   |   35 +++++++++++++++++++++++++++++++++++
>  libavcodec/x86/ac3dsp_mmx.c |    7 +++++++
>  5 files changed, 65 insertions(+), 18 deletions(-)
[..]
> + /**
> + * Left-shift each value in an array of int16_t by a specified amount.
> + * @param src input array
> + * constraints: align 16
> + * @param len number of values in the array
> + * constraints: multiple of 32 greater than 0
> + * @param shift left shift amount
> + * constraints: range [0,15]
> + */
> + void (*ac3_lshift_int16)(int16_t *src, int len, unsigned int shift);
See below on this.
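
For reference, the documented contract can be sketched in plain C. This is a hypothetical scalar version, not the C code from the patch (which is elided above); the name ac3_lshift_int16_ref is made up for illustration, and the 16-byte alignment constraint is not checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference for the documented contract.
 * Not taken from the patch; the name is illustrative only. */
static void ac3_lshift_int16_ref(int16_t *src, int len, unsigned int shift)
{
    /* Constraints stated in the doc comment. */
    assert(len > 0 && len % 32 == 0);
    assert(shift <= 15);

    for (int i = 0; i < len; i++) {
        /* Shift the unsigned bit pattern so negative inputs do not hit
         * undefined behaviour, then keep the low 16 bits as psllw does.
         * The final conversion to int16_t is implementation-defined for
         * out-of-range values; common compilers wrap. */
        src[i] = (int16_t)(uint16_t)((uint16_t)src[i] << shift);
    }
}
```

With shift 0 the loop leaves the array unchanged, which is the case the asm skips with its early exit.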
> +cglobal ac3_lshift_int16_%1, 3,3,5, src, offset, shift
> + cmp shiftd, 0
> + je .end
"test shiftd, shiftd" should give smaller binary code than "cmp shiftd, 0"
(it needs no immediate byte), and then use "jz" (that's actually the same
instruction as "je", but jz better describes what it does here).
> + shl offsetq, 1
> + sub offsetq, mmsize*4
> + movd m0, shiftd
> +.loop:
> + mova m1, [srcq+offsetq ]
> + mova m2, [srcq+offsetq+mmsize ]
> + mova m3, [srcq+offsetq+mmsize*2]
> + mova m4, [srcq+offsetq+mmsize*3]
> + psllw m1, m0
> + psllw m2, m0
> + psllw m3, m0
> + psllw m4, m0
> + mova [srcq+offsetq ], m1
> + mova [srcq+offsetq+mmsize ], m2
> + mova [srcq+offsetq+mmsize*2], m3
> + mova [srcq+offsetq+mmsize*3], m4
> + sub offsetq, mmsize*4
> + jge .loop
> +.end:
> + RET
> +%endmacro
> +
> +INIT_MMX
> +AC3_LSHIFT_INT16 mmx
> +INIT_XMM
> +AC3_LSHIFT_INT16 sse2
Doesn't this do 64 per loop iteration for sse2? If so, doesn't that
conflict with the function definition and/or overflow?
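
One way to check this is to replay the loop's offset arithmetic in C. The sketch below is an illustration only: the helper names are hypothetical, and it assumes mmsize is 8 for the mmx version and 16 for the sse2 version, with offset starting out as len as in the quoted code. Each iteration touches mmsize*4 bytes, i.e. mmsize*2 int16_t values.

```c
#include <assert.h>

/* Bytes handled per loop iteration: four registers of mmsize bytes each. */
static int bytes_per_iter(int mmsize)
{
    return mmsize * 4;
}

/* Replays the offset arithmetic of the quoted loop for a given len.
 * Returns the lowest byte offset accessed relative to src (a negative
 * result means an access before the start of the array) and stores the
 * number of iterations in *iterations. */
static long lowest_offset(int len, int mmsize, int *iterations)
{
    long offset = (long)len * 2;            /* shl offsetq, 1        */
    long lowest;

    offset -= bytes_per_iter(mmsize);       /* sub offsetq, mmsize*4 */
    lowest = offset;
    *iterations = 0;
    do {
        if (offset < lowest)
            lowest = offset;                /* loads and stores at srcq+offsetq */
        (*iterations)++;
        offset -= bytes_per_iter(mmsize);   /* sub offsetq, mmsize*4 */
    } while (offset >= 0);                  /* jge .loop             */
    return lowest;
}
```

Under these assumptions, len = 32 gives one iteration for sse2 and two for mmx, both ending exactly at offset 0. A len of 16 with sse2 would start at offset -32, so the multiple-of-32 constraint is what keeps the sse2 loop inside the array.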
Ronald