[FFmpeg-devel] [PATCH] [WIP] dnxhdenc: get_pixels_8x4_sym_10bit_sse2
Ronald S. Bultje
rsbultje at gmail.com
Wed Apr 9 14:48:02 CEST 2014
Hi,
On Tue, Apr 8, 2014 at 11:51 PM, Timothy Gu <timothygu99 at gmail.com> wrote:
> On Tue, Apr 8, 2014 at 8:42 PM, Timothy Gu <timothygu99 at gmail.com> wrote:
> > Before:
> > 3383 decicycles in dnxhd_10bit_get_pixels_8x4_sym, 130910 runs, 162 skips
> > After:
> > 750 decicycles in ff_get_pixels_8x4_sym_10bit_sse2, 130999 runs, 73 skips
> >
> > Overall performance impact negligible.
> >
> > Signed-off-by: Timothy Gu <timothygu99 at gmail.com>
> > ---
> > libavcodec/x86/dnxhdenc.asm | 41
> +++++++++++++++++++++++++++++------------
> > libavcodec/x86/dnxhdenc_init.c | 4 ++++
> > 2 files changed, 33 insertions(+), 12 deletions(-)
> >
> > diff --git a/libavcodec/x86/dnxhdenc.asm b/libavcodec/x86/dnxhdenc.asm
> > index 9dd6d51..d42530b 100644
> > --- a/libavcodec/x86/dnxhdenc.asm
> > +++ b/libavcodec/x86/dnxhdenc.asm
> > @@ -26,18 +26,30 @@ section .text
> >
> > ; void get_pixels_8x4_sym_sse2(int16_t *block, const uint8_t *pixels,
> > ; ptrdiff_t line_size)
> > -INIT_XMM sse2
> > -cglobal get_pixels_8x4_sym, 3,3,5, block, pixels, linesize
> > - pxor m4, m4
> > - movq m0, [pixelsq]
> > - add pixelsq, linesizeq
> > - movq m1, [pixelsq]
> > - movq m2, [pixelsq+linesizeq]
> > - movq m3, [pixelsq+linesizeq*2]
> > - punpcklbw m0, m4
> > - punpcklbw m1, m4
> > - punpcklbw m2, m4
> > - punpcklbw m3, m4
> > +
> > +%macro GET_PIXELS 1
> > +%if %1 == 8
> > +cglobal get_pixels_8x4_sym, 3,3,5, block, pixels, linesize
> > +%elif %1 == 16
> > +cglobal get_pixels_8x4_sym_10bit, 3,3,4, block, pixels, linesize
> > +%endif
> > + %if %1 == mmsize/2
> > + pxor m4, m4
> > + %define LOAD movh
> > + %elif %1 == mmsize && %1 == 16
> > + %define LOAD movu
> > + %endif
> > + LOAD m0, [pixelsq]
> > + add pixelsq, linesizeq
> > + LOAD m1, [pixelsq]
> > + LOAD m2, [pixelsq+linesizeq]
> > + LOAD m3, [pixelsq+linesizeq*2]
>
> I probably messed up the loading and linesize here. Can someone give
> me a pointer on how to fix it?
You mean it doesn't work? What's the C code?
Ronald
More information about the ffmpeg-devel
mailing list