[FFmpeg-devel] [PATCH] avcodec/h264: mmxext 4:2:2 chroma deblock/loop filter
James Almer
jamrial at gmail.com
Fri Jan 15 03:55:44 CET 2016
On 1/14/2016 11:05 PM, James Darnley wrote:
> 2.6 times faster
> ---
> I have one question now. Should I make the function name match the existing
> assembly deblock/loop filter functions? I took the current name from the C code
> (as I was originally trying to use a gather instruction, but that didn't offer
> any benefit).
> ---
> libavcodec/x86/h264_deblock.asm | 40 ++++++++++++++++++++++++++++++++++++++++
> libavcodec/x86/h264dsp_init.c | 4 ++++
> 2 files changed, 44 insertions(+)
>
> diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
> index 5151f3c..20f0814 100644
> --- a/libavcodec/x86/h264_deblock.asm
> +++ b/libavcodec/x86/h264_deblock.asm
> @@ -864,7 +864,47 @@ ff_chroma_inter_body_mmxext:
> DEBLOCK_P0_Q0
> ret
>
> +cglobal h264_h_loop_filter_chroma422_8, 5, 7, 8, mmsize + ARCH_X86_64*2*mmsize
This will not work with x86_32 compilers that don't guarantee an aligned stack
(like MSVC), because r6 is needed to store the stack pointer.
> + %if ARCH_X86_64
> + %define buf0 [rsp+16]
> + %define buf1 [rsp+8]
> + %else
> + %define buf0 r0m
> + %define buf1 r2m
> + %endif
> +
> + movd m6, [r4]
Since r4 is free after this point, you can use it instead of r6 to easily solve
the above.