[FFmpeg-devel] MMX version for put_no_rnd_h264_chroma_mc8_c

Sat Nov 24 18:39:27 CET 2007

On Mon, Nov 19, 2007 at 10:12:15AM +0100, Guillaume Poirier wrote:
> Hello,
> 
> Christophe GISQUET wrote:
> > Michael Niedermayer wrote:
> >> Less code duplication is preferred unless it would cause a speed loss.
> >> Whether it does, I don't know, but I won't accept duplicated code without
> >> someone providing benchmark scores showing it's faster.
> > 
> > The function with rounding is located in dsputil_h264_template_mmx.c.
> > As a consequence, it's templated, and only one version is of interest.
> > 
> > The attached patch demonstrates what I'm doing to allow skipping the
> > rounding. I haven't yet thought about how to do it more cleanly...
> > Benchmarking the old and new code on an H.264 sequence yields no difference.
> 
> Now the same without useless re-indentation.
> Please try to address these kinds of minor issues. They do make
> reviewing a lot easier.
> 
> Guillaume
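
For reference, the rnd/no-rnd distinction the patch introduces can be modelled in scalar C. This is a minimal sketch, not FFmpeg's own C code: the function names are hypothetical, and it assumes the usual H.264 bilinear chroma weights (which sum to 64) with a bias of 32 when rounding and 28 when not, matching the ff_pw_32/ff_pw_28 choice in the patch. One shared body taking a rnd flag, plus two thin wrappers, is the kind of de-duplication asked for above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar sketch of 8-wide bilinear chroma MC.
 * x, y are 1/8-pel fractional offsets (0..7). The four weights sum to 64,
 * so the sum is normalised with >> 6. The rounding and no-rounding variants
 * differ only in the bias added before the shift: 32 versus 28.
 * Reads 9 columns and h + 1 rows of src. */
static void chroma_mc8_ref(uint8_t *dst, const uint8_t *src, int stride,
                           int h, int x, int y, int rnd)
{
    assert(x >= 0 && x < 8 && y >= 0 && y < 8);
    const int A = (8 - x) * (8 - y);
    const int B = x * (8 - y);
    const int C = (8 - x) * y;
    const int D = x * y;
    const int bias = rnd ? 32 : 28;

    for (int j = 0; j < h; j++) {
        for (int i = 0; i < 8; i++)
            dst[i] = (uint8_t)((A * src[i] + B * src[i + 1] +
                                C * src[i + stride] + D * src[i + stride + 1] +
                                bias) >> 6);
        dst += stride;
        src += stride;
    }
}

/* Thin wrappers (hypothetical names): one shared body, no duplicated code. */
static void put_chroma_mc8_ref(uint8_t *dst, const uint8_t *src, int stride,
                               int h, int x, int y)
{
    chroma_mc8_ref(dst, src, stride, h, x, y, 1);
}

static void put_no_rnd_chroma_mc8_ref(uint8_t *dst, const uint8_t *src,
                                      int stride, int h, int x, int y)
{
    chroma_mc8_ref(dst, src, stride, h, x, y, 0);
}
```

With x = 4, y = 0 and neighbouring pixels 0 and 1, the rounding variant yields 1 and the no-rounding variant yields 0, which is exactly the case the extra parameter exists for.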

> diff --git a/libavcodec/i386/dsputil_h264_template_mmx.c b/libavcodec/i386/dsputil_h264_template_mmx.c
> index e3da92f..fa64212 100644
> --- a/libavcodec/i386/dsputil_h264_template_mmx.c
> +++ b/libavcodec/i386/dsputil_h264_template_mmx.c
> @@ -25,8 +25,10 @@
>   * H264_CHROMA_OP must be defined to empty for put and pavgb/pavgusb for avg
>   * H264_CHROMA_MC8_MV0 must be defined to a (put|avg)_pixels8 function
>   */
> -static void H264_CHROMA_MC8_TMPL(uint8_t *dst/*align 8*/, uint8_t *src/*align 1*/, int stride, int h, int x, int y)
> +static void H264_CHROMA_MC8_TMPL(uint8_t *dst/*align 8*/, uint8_t *src/*align 1*/, int stride, int h, int x, int y, int rnd)
>  {
> +    DECLARE_ALIGNED_8(static const uint64_t, ff_pw_28) = 0x001C001C001C001CULL;
> +    const uint64_t *rnd_reg = (rnd) ? &ff_pw_32 : &ff_pw_28;
>      DECLARE_ALIGNED_8(uint64_t, AA);
>      DECLARE_ALIGNED_8(uint64_t, DD);
>      int i;
> @@ -41,19 +43,20 @@ static void H264_CHROMA_MC8_TMPL(uint8_t *dst/*align 8*/, uint8_t *src/*align 1*
>  
>      if(y==0 || x==0)
>      {
> +        //START_TIMER
>          /* 1 dimensional filter only */

Benchmarking should be performed over the whole function, that is:

    START_TIMER
    myfunc()   (i.e. the code that calls the function)
    STOP_TIMER

Also, the //START_TIMER doesn't belong in the patch.
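
The timer placement described above, sketched in portable C. FFmpeg's own START_TIMER/STOP_TIMER macros are not reproduced here; standard clock() stands in for them, and func_under_test is a hypothetical placeholder. Because the timed region encloses whole calls at the call site, it also counts call overhead and per-call setup such as the new rnd_reg selection at the top of the function, which a timer placed inside one branch would miss.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the function being benchmarked. */
static void func_under_test(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)((src[i] + src[i + 1] + 1) >> 1);
}

static volatile unsigned sink; /* keeps the compiler from discarding the work */

/* Times complete calls from the call site rather than a region inside the
 * function body. Returns the average CPU time per call in nanoseconds. */
static double bench_whole_call(int iterations)
{
    static uint8_t src[65], dst[64];
    assert(iterations > 0);

    clock_t start = clock();               /* where START_TIMER would go */
    for (int it = 0; it < iterations; it++) {
        src[0] = (uint8_t)it;              /* vary the input slightly */
        func_under_test(dst, src, 64);     /* the code that calls the function */
        sink += dst[0];
    }
    clock_t stop = clock();                /* where STOP_TIMER would go */

    return (double)(stop - start) * 1e9 / CLOCKS_PER_SEC / iterations;
}
```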


>          const int dxy = x ? 1 : stride;
>  
>          asm volatile(
> +            "movq %2, %%mm6\n\t"
>              "movd %0, %%mm5\n\t"
>              "movq %1, %%mm4\n\t"
>              "punpcklwd %%mm5, %%mm5\n\t"
>              "punpckldq %%mm5, %%mm5\n\t" /* mm5 = B = x */
> -            "movq %%mm4, %%mm6\n\t"
>              "pxor %%mm7, %%mm7\n\t"
>              "psubw %%mm5, %%mm4\n\t"     /* mm4 = A = 8-x */
> -            "psrlw $1, %%mm6\n\t"        /* mm6 = 4 */
> -            :: "rm"(x+y), "m"(ff_pw_8));
> +            "psrlw $3, %%mm6" /* mm6 = rnd */
> +            :: "rm"(x+y), "m"(ff_pw_8), "m"(*rnd_reg));

The psrlw can be avoided by shifting the constant right beforehand, i.e. by loading 4 and 3 directly instead of 32 and 28.
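
A scalar check of this suggestion, as a sketch: it assumes the 1-D path computes ((8 - d)*a + d*b + bias) >> 3, as the quoted asm does with A = 8 - x and B = x, and it models psrlw in plain C. The names pw_4/pw_3 are hypothetical. Loading the pre-shifted words gives exactly what psrlw $3 produces at run time, and the 1-D results match the full 2-D formula (weights summing to 64, bias 32 or 28, shift 6) for every input, so a no-rounding bias of 3 is exact rather than an approximation.

```c
#include <assert.h>
#include <stdint.h>

/* Four packed 16-bit words, as loaded into an MMX register with movq.
 * pw_32/pw_28 mirror ff_pw_32 and the patch's ff_pw_28; pw_4/pw_3 are
 * hypothetical names for the pre-shifted versions. */
static const uint64_t pw_32 = 0x0020002000200020ULL;
static const uint64_t pw_28 = 0x001C001C001C001CULL;
static const uint64_t pw_4  = 0x0004000400040004ULL; /* 32 >> 3 */
static const uint64_t pw_3  = 0x0003000300030003ULL; /* 28 >> 3 */

/* Scalar model of "psrlw $3": logical right shift of each 16-bit lane by 3. */
static uint64_t psrlw3(uint64_t v)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint64_t w = (v >> (16 * lane)) & 0xFFFF;
        r |= (w >> 3) << (16 * lane);
    }
    return r;
}

/* 1-D path (x == 0 or y == 0): weights 8 - d and d sum to 8, so shift by 3. */
static int filter_1d(int a, int b, int d, int bias)
{
    return ((8 - d) * a + d * b + bias) >> 3;
}

/* General 2-D formula restricted to y == 0: weights sum to 64, shift by 6. */
static int filter_2d_y0(int a, int b, int x, int bias)
{
    return (8 * (8 - x) * a + 8 * x * b + bias) >> 6;
}

/* Checks that the pre-shifted constants equal the run-time shift, and that
 * the 1-D path with bias 4/3 matches the 2-D formula with 32/28 for every
 * pixel pair and offset. Returns 1 on success. */
static int preshift_matches_2d(void)
{
    assert(psrlw3(pw_32) == pw_4);
    assert(psrlw3(pw_28) == pw_3);
    for (int d = 0; d < 8; d++)
        for (int a = 0; a < 256; a++)
            for (int b = 0; b < 256; b++) {
                if (filter_1d(a, b, d, 4) != filter_2d_y0(a, b, d, 32))
                    return 0;
                if (filter_1d(a, b, d, 3) != filter_2d_y0(a, b, d, 28))
                    return 0;
            }
    return 1;
}
```

With the constant pre-shifted, the movq load into mm6 stays and the psrlw instruction disappears.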


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
