[FFmpeg-devel] [PATCH 02/10] diracdsp: add dequantization SIMD
James Almer
jamrial at gmail.com
Mon Jun 27 23:38:06 CEST 2016
On 6/27/2016 8:53 AM, Rostislav Pehlivanov wrote:
> I've attached another patch which should work fine now.
> I did this after the put_signed_rect so it does require the first patch,
> but if this patch is okay I'll amend and tidy things before I push.
> For some reason changing dstq to be stored at r4 or r3 broke it and I've no
> idea why. Neither is used after loading m2 and m3. Should work on x86_32
> now, but I'm wondering why I can't save that register.
[...]
> diff --git a/libavcodec/x86/diracdsp.asm b/libavcodec/x86/diracdsp.asm
> index c5cc530..4bc8b2d 100644
> --- a/libavcodec/x86/diracdsp.asm
> +++ b/libavcodec/x86/diracdsp.asm
> @@ -266,9 +266,45 @@ HPEL_FILTER sse2
> ADD_OBMC 32, sse2
> ADD_OBMC 16, sse2
>
> -%if ARCH_X86_64 == 1
> INIT_XMM sse4
>
> +; void dequant_subband_32(uint8_t *src, uint8_t *dst, ptrdiff_t stride, const int qf, const int qs, int tot_v, int tot_h)
> +cglobal dequant_subband_32, 7, 8, 4, src, dst, stride, qf, qs, tot_v, tot_h
x86_32 has 8 GPRs, but you can only use 7, as the last one is reserved
for the stack pointer. Requesting 8 in this cglobal line therefore isn't
possible on x86_32.
> +
> + movd m2, qfd
> + movd m3, qsd
> + SPLATD m2
> + SPLATD m3
> + mov r4, tot_hq
> + mov r7, dstq
> +
> + .loop_v:
> + mov tot_hq, r4
> + mov dstq, r7
> +
> + .loop_h:
> + movu m0, [srcq]
> +
> + pabsd m1, m0
> + pmulld m1, m2
> + paddd m1, m3
> + psrld m1, 2
> + psignd m1, m0
> +
> + movu [dstq], m1
> +
> + add srcq, mmsize
> + add dstq, mmsize
> + sub tot_hd, 4
> + jg .loop_h
> +
> + add r7, strideq
> + dec tot_vd
> + jg .loop_v
> +
> + RET
I'm not sure why you say using r3 instead of r7 here didn't work for
you. r3 (qf) is no longer needed once it has been splatted into m2. I
just tried it (after applying all patches up to 6/10), and at least FATE
still passes on both x86_64 and x86_32.
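For reference, below is a scalar C model of what the SIMD loop above computes per 32-bit coefficient: |c| * qf + qs, a logical shift right by 2 (psrld), then the sign of c reapplied (psignd, which also zeroes lanes where c == 0). It is a sketch under these assumptions: coefficients are int32_t, src is read contiguously (srcq is never rewound between rows), and dst rows are `stride` bytes apart. The names `dequant_coeff` and `dequant_subband_32_ref` are hypothetical. This is not FFmpeg's C implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one SIMD lane of pabsd/pmulld/paddd/psrld/psignd.
 * Arithmetic is done in uint32_t because pmulld/paddd wrap modulo 2^32. */
static int32_t dequant_coeff(int32_t c, int qf, int qs)
{
    uint32_t mag = c < 0 ? 0u - (uint32_t)c : (uint32_t)c;  /* pabsd */
    uint32_t v = (mag * (uint32_t)qf + (uint32_t)qs) >> 2;  /* pmulld, paddd, psrld */
    if (c < 0)
        v = 0u - v;  /* psignd: negate where the source was negative */
    else if (c == 0)
        v = 0;       /* psignd: zero where the source was zero */
    return (int32_t)v;
}

/* Hypothetical reference for the whole function. Assumes tot_v >= 1. */
static void dequant_subband_32_ref(const uint8_t *src, uint8_t *dst,
                                   ptrdiff_t stride, int qf, int qs,
                                   int tot_v, int tot_h)
{
    const int32_t *s = (const int32_t *)src;  /* packed, never rewound */
    for (int y = 0; y < tot_v; y++) {
        int32_t *d = (int32_t *)(dst + y * stride);
        int h = tot_h;
        do {  /* mirrors .loop_h: 4 dwords per xmm, runs at least once */
            for (int k = 0; k < 4; k++)
                d[k] = dequant_coeff(s[k], qf, qs);
            s += 4;
            d += 4;
            h -= 4;
        } while (h > 0);
    }
}
```

Because .loop_h subtracts 4 and repeats while the result is positive, a tot_h that is not a multiple of 4 is effectively rounded up. Both buffers then need padding, and the model reproduces that behaviour.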