[FFmpeg-devel] [PATCH/WIP] avcodec/x86: move simple_idct to external assembly
Ronald S. Bultje
rsbultje at gmail.com
Thu May 18 20:13:57 EEST 2017
Hi,
cursory review, I appreciate that this is work in progress so don't read
too much into it.
General points:
- I really like how you're doing a literal translation without trying to
change it, this makes review easier and also reduces the likelihood that
there's performance implications.
- do you think a checkasm test makes sense? That would also make
performance measuring easier.
Code review:
On Fri, May 12, 2017 at 10:27 AM, James Darnley <jdarnley at obe.tv> wrote:
> +; Simple IDCT MMX
> +;
> +; Copyright (c) 2001, 2002 Michael Niedermayer <michaelni at gmx.at>
>
Please add a line that you converted it from inline asm to x264asm syntax,
we've done that in other places also.
> +%macro DC_COND_IDCT 7
> +movq mm0, [blockq + %1] ; R4 R0 r4 r0
> +movq mm1, [blockq + %2] ; R6 R2 r6 r2
> +movq mm2, [blockq + %3] ; R3 R1 r3 r1
> +movq mm3, [blockq + %4] ; R7 R5 r7 r5
>
Please use 4-space indentation.
> + %%9:
> +; :: "r" (block), "r" (temp), "r" (coeffs)
> +; NAMED_CONSTRAINTS_ADD(wm1010,d40000)
> +; : "%eax"
> +%endmacro
>
The inline asm bits (middle 3 lines) can be removed (yay!).
Rest is fine. I am assuming that the binary size will grow slightly because
the idct is now inlined in the put/add (and duplicated between mmx/sse2),
but no performance implication (or possibly slight improvement because of
the inline). If that's correct, I'm OK with this.
Are you planning to write an actual SSE2 version of the IDCT? (No pressure,
just wondering.)
Ronald
More information about the ffmpeg-devel
mailing list