[FFmpeg-devel] [PATCH] vf_interlace: Add SIMD for lowpass filter
James Almer
jamrial at gmail.com
Mon Nov 10 23:04:23 CET 2014
On 10/11/14 6:42 PM, Kieran Kunhya wrote:
Can't test since it doesn't apply cleanly, but here are a few comments anyway.
> diff --git a/libavfilter/x86/vf_interlace.asm b/libavfilter/x86/vf_interlace.asm
> new file mode 100644
> index 0000000..40b10fc
> --- /dev/null
> +++ b/libavfilter/x86/vf_interlace.asm
> @@ -0,0 +1,80 @@
> +;*****************************************************************************
> +;* x86-optimized functions for interlace filter
> +;*
> +;* Copyright (C) 2014 Kieran Kunhya <kierank at obe.tv>
> +;*
> +;* This file is part of Libav.
> +;*
> +;* Libav is free software; you can redistribute it and/or modify
> +;* it under the terms of the GNU General Public License as published by
> +;* the Free Software Foundation; either version 2 of the License, or
> +;* (at your option) any later version.
> +;*
> +;* Libav is distributed in the hope that it will be useful,
> +;* but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +;* GNU General Public License for more details.
> +;*
> +;* You should have received a copy of the GNU General Public License along
> +;* with Libav; if not, write to the Free Software Foundation, Inc.,
> +;* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
> +;******************************************************************************
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION_RODATA
> +
> +pw_1: times 8 dw 1
> +
> +SECTION .text
> +
> +%macro LOWPASS_LINE 0
> +cglobal lowpass_line, 5, 5
You're using m6, so you need to declare 7 xmm regs (cglobal lowpass_line, 5, 5, 7).
Also, naming the arguments would be better than using bare r* registers.
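For reference, here is a scalar C model of what the quoted loop computes, with the roles of r0..r4 spelled out. It is a sketch written for this note, not the patch's C code. The function name and the parameter names (dst, linesize, src, src_above, src_below) are hypothetical labels inferred from how the asm uses each register. Since the asm treats r3 and r4 symmetrically, which one is "above" does not matter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the quoted lowpass_line loop.
 * Assumed register roles (inferred from the asm, not confirmed):
 *   r0 = dst, r1 = linesize (byte count), r2 = src (current line),
 *   r3 = src_above, r4 = src_below. */
static void lowpass_line_model(uint8_t *dst, ptrdiff_t linesize,
                               const uint8_t *src,
                               const uint8_t *src_above,
                               const uint8_t *src_below)
{
    /* "add rN, r1": point every pointer at the end of its line. */
    dst       += linesize;
    src       += linesize;
    src_above += linesize;
    src_below += linesize;

    /* "neg r1": count a negative offset up towards zero, so the loop
     * test is just the sign of the offset after each increment. */
    for (ptrdiff_t i = -linesize; i < 0; i++) {
        /* 2 * cur (psllw 1) + above + below, + 1 (pw_1), >> 2 (psrlw 2).
         * Maximum is (510 + 255 + 255 + 1) >> 2 = 255, so it fits a byte. */
        dst[i] = (uint8_t)((2 * src[i] + src_above[i] + src_below[i] + 1) >> 2);
    }
}
```

The model advances one byte per iteration, whereas the asm advances r1 by mmsize, so it always processes whole 16-byte blocks.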
> + add r0, r1
> + add r2, r1
> + add r3, r1
> + add r4, r1
> + neg r1
> +
> + pxor m6, m6
> +
> +.loop
> + mova m0, [r2+r1]
> + punpcklbw m1, m0, m6
> + punpckhbw m0, m6
> + psllw m0, 1
> + psllw m1, 1
> +
> + mova m2, [r3+r1]
> + punpcklbw m3, m2, m6
> + punpckhbw m2, m6
> +
> + mova m4, [r4+r1]
> + punpcklbw m5, m4, m6
> + punpckhbw m4, m6
> +
> + paddw m1, m3
> + paddw m1, m5
> +
> + paddw m0, m2
> + paddw m0, m4
> +
> + paddw m0, [pw_1]
> + paddw m1, [pw_1]
> +
> + psrlw m0, 2
> + psrlw m1, 2
Can't pavgw be used here?
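Assuming the goal is to drop the pw_1 add and the two-bit shift, it looks like it can. PAVGW computes (a + b + 1) >> 1 per unsigned word, and since floor(floor(x / 2) / 2) = floor(x / 4), pavgw(2 * cur, above + below) followed by psrlw 1 gives exactly (2 * cur + above + below + 1) >> 2. Both operands stay at or below 510, so nothing overflows 16 bits. The scalar check below (hypothetical helper names, not FFmpeg code) compares both forms over every byte triple.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of PAVGW: (a + b + 1) >> 1, unsigned. */
static uint16_t pavgw_lane(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a + b + 1) >> 1);
}

/* What the patch computes per lane: 2*cur + above + below, + pw_1, psrlw 2. */
static uint16_t lowpass_patch(uint8_t cur, uint8_t above, uint8_t below)
{
    uint16_t sum = (uint16_t)((cur << 1) + above + below);
    return (uint16_t)((sum + 1) >> 2);
}

/* Hypothetical pavgw variant: add above + below, PAVGW with 2*cur,
 * then shift right by one more bit. */
static uint16_t lowpass_pavgw(uint8_t cur, uint8_t above, uint8_t below)
{
    uint16_t outer = (uint16_t)(above + below); /* at most 510 */
    return (uint16_t)(pavgw_lane((uint16_t)(cur << 1), outer) >> 1);
}

/* Exhaustively compare both forms over all 2^24 byte triples. */
static bool pavgw_variant_matches(void)
{
    for (unsigned c = 0; c < 256; c++)
        for (unsigned a = 0; a < 256; a++)
            for (unsigned b = 0; b < 256; b++)
                if (lowpass_patch((uint8_t)c, (uint8_t)a, (uint8_t)b) !=
                    lowpass_pavgw((uint8_t)c, (uint8_t)a, (uint8_t)b))
                    return false;
    return true;
}
```

In the asm this would presumably turn the high half into paddw m2, m4 / pavgw m0, m2 / psrlw m0, 1 (and likewise m3, m5, m1 for the low half), saving one instruction per half and the pw_1 constant.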
> +
> + packuswb m1, m0
> + mova [r0+r1], m1
> +
> + add r1, mmsize
> + jl .loop
> +REP_RET
> +%endmacro
> +
> +INIT_XMM sse2
> +LOWPASS_LINE
> +
> +INIT_XMM avx
> +LOWPASS_LINE
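To check the lane bookkeeping in the quoted loop (punpcklbw into m1/m3/m5 for the low eight bytes, punpckhbw into m0/m2/m4 for the high eight, then packuswb m1, m0), here is a C emulation of one 16-byte iteration. It is a model written for this note, not FFmpeg code; the type and function names are hypothetical.

```c
#include <stdint.h>

#define MMSIZE 16 /* xmm width in bytes under INIT_XMM */

typedef struct { uint16_t w[8]; } words8; /* one xmm register viewed as 8 words */

/* punpcklbw x, zero (high = 0) / punpckhbw x, zero (high = 1):
 * zero-extend the low or high 8 bytes of a 16-byte block to words. */
static words8 unpack_half(const uint8_t *bytes, int high)
{
    words8 r;
    for (int i = 0; i < 8; i++)
        r.w[i] = bytes[(high ? 8 : 0) + i];
    return r;
}

/* packuswb reads words as signed and saturates them to 0..255. */
static uint8_t sat_u8(uint16_t w)
{
    int16_t s = (int16_t)w;
    return s < 0 ? 0 : s > 255 ? 255 : (uint8_t)s;
}

/* packuswb lo, hi: lo's words fill bytes 0..7, hi's fill bytes 8..15. */
static void packuswb(uint8_t out[MMSIZE], words8 lo, words8 hi)
{
    for (int i = 0; i < 8; i++) {
        out[i]     = sat_u8(lo.w[i]);
        out[8 + i] = sat_u8(hi.w[i]);
    }
}

/* One iteration of the quoted .loop body on 16 bytes. */
static void lowpass_iteration(uint8_t dst[MMSIZE], const uint8_t *cur,
                              const uint8_t *above, const uint8_t *below)
{
    words8 m1 = unpack_half(cur, 0),   m0 = unpack_half(cur, 1);
    words8 m3 = unpack_half(above, 0), m2 = unpack_half(above, 1);
    words8 m5 = unpack_half(below, 0), m4 = unpack_half(below, 1);
    for (int i = 0; i < 8; i++) {
        /* psllw 1, paddw, paddw [pw_1], psrlw 2 on each half */
        m1.w[i] = (uint16_t)(((m1.w[i] << 1) + m3.w[i] + m5.w[i] + 1) >> 2);
        m0.w[i] = (uint16_t)(((m0.w[i] << 1) + m2.w[i] + m4.w[i] + 1) >> 2);
    }
    packuswb(dst, m1, m0); /* low half first, so byte order is preserved */
}
```

Because packuswb writes its first operand's words to the low eight bytes, packing m1 before m0 restores the original byte order. The AVX instantiation (INIT_XMM avx) keeps mmsize at 16, so the same model applies to it.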