[FFmpeg-devel] [PATCH 2/2] x86/vf_w3fdif: simplify w3fdif_simple_high
Ronald S. Bultje
rsbultje at gmail.com
Sun Oct 11 20:11:26 CEST 2015
Hi,
On Sun, Oct 11, 2015 at 1:17 PM, James Almer <jamrial at gmail.com> wrote:
> On 10/11/2015 4:31 AM, Paul B Mahol wrote:
> > On 10/11/15, James Almer <jamrial at gmail.com> wrote:
> >> Signed-off-by: James Almer <jamrial at gmail.com>
> >> ---
> >> libavfilter/x86/vf_w3fdif.asm | 16 +++++++---------
> >> 1 file changed, 7 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/libavfilter/x86/vf_w3fdif.asm
> b/libavfilter/x86/vf_w3fdif.asm
> >> index f02319b..f2001a4 100644
> >> --- a/libavfilter/x86/vf_w3fdif.asm
> >> +++ b/libavfilter/x86/vf_w3fdif.asm
> >> @@ -103,13 +103,11 @@ REP_RET
> >>
> >> %if ARCH_X86_64
> >>
> >> -cglobal w3fdif_simple_high, 5, 9, 9, 0, work_line, in_lines_cur0,
> >> in_lines_adj0, coef, linesize
> >> +cglobal w3fdif_simple_high, 5, 9, 8, 0, work_line, in_lines_cur0,
> >> in_lines_adj0, coef, linesize
> >> movq m2, [coefq]
> >> DEFINE_ARGS work_line, in_lines_cur0, in_lines_adj0,
> in_lines_cur1,
> >> linesize, offset, in_lines_cur2, in_lines_adj1, in_lines_adj2
> >> - SPLATW m0, m2, 0
> >> - SPLATW m1, m2, 1
> >> + pshufd m0, m2, q0000
> >> SPLATW m2, m2, 2
> >> - SBUTTERFLY wd, 0, 1, 7
> >> pxor m7, m7
> >> mov offsetq, 0
> >> mov in_lines_cur2q, [in_lines_cur0q+gprsize*2]
> >> @@ -124,23 +122,23 @@ cglobal w3fdif_simple_high, 5, 9, 9, 0, work_line,
> >> in_lines_cur0, in_lines_adj0,
> >> movh m4, [in_lines_cur1q+offsetq]
> >> punpcklbw m3, m7
> >> punpcklbw m4, m7
> >> - SBUTTERFLY wd, 3, 4, 8
> >> + SBUTTERFLY wd, 3, 4, 1
> >> pmaddwd m3, m0
> >> - pmaddwd m4, m1
> >> + pmaddwd m4, m0
> >> movh m5, [in_lines_adj0q+offsetq]
> >> movh m6, [in_lines_adj1q+offsetq]
> >> punpcklbw m5, m7
> >> punpcklbw m6, m7
> >> - SBUTTERFLY wd, 5, 6, 8
> >> + SBUTTERFLY wd, 5, 6, 1
> >> pmaddwd m5, m0
> >> - pmaddwd m6, m1
> >> + pmaddwd m6, m0
> >> paddd m3, m5
> >> paddd m4, m6
> >> movh m5, [in_lines_cur2q+offsetq]
> >> movh m6, [in_lines_adj2q+offsetq]
> >> punpcklbw m5, m7
> >> punpcklbw m6, m7
> >> - SBUTTERFLY wd, 5, 6, 8
> >> + SBUTTERFLY wd, 5, 6, 1
> >> pmaddwd m5, m2
> >> pmaddwd m6, m2
> >> paddd m3, m5
> >> --
> >> 2.6.0
> >>
> >> _______________________________________________
> >> ffmpeg-devel mailing list
> >> ffmpeg-devel at ffmpeg.org
> >> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >>
> >
> > Cant this now be used on x32?
>
Add to the data pointers directly (in_lines_cur0q and work_lineq). Then sub
all other curXq/adjXq from cur0q (on 32bit only) before the loop and you
have to adds (on 32bit) instead of one (on 64bit), but one reg less
(offset), making it 7, which means it works.
Ronald
More information about the ffmpeg-devel
mailing list