[FFmpeg-devel] MMX version for put_no_rnd_h264_chroma_mc8_c
Michael Niedermayer
michaelni
Sat Nov 24 18:39:27 CET 2007
On Mon, Nov 19, 2007 at 10:12:15AM +0100, Guillaume Poirier wrote:
> Hello,
>
> Christophe GISQUET wrote:
> > Michael Niedermayer wrote:
> >> Less code duplication is preferred unless that would cause a speed loss ...
> >> Whether it does, I don't know, but I won't accept duplicating code without
> >> someone providing benchmark scores showing it's faster.
> >
> > The function with rounding is located in dsputil_h264_template_mmx.c
> > As a consequence, it's templated, and only one version is of interest.
> >
> > The attached patch demonstrates what I'm doing to make rounding
> > optional. I haven't yet thought about how to do it more cleanly...
> > Benchmarking the old and new code on an H.264 sequence yields no difference.
>
> Now the same without useless re-indentation.
> Please try to address this kind of minor issue. Doing so makes
> reviewing a lot easier.
>
> Guillaume
> diff --git a/libavcodec/i386/dsputil_h264_template_mmx.c b/libavcodec/i386/dsputil_h264_template_mmx.c
> index e3da92f..fa64212 100644
> --- a/libavcodec/i386/dsputil_h264_template_mmx.c
> +++ b/libavcodec/i386/dsputil_h264_template_mmx.c
> @@ -25,8 +25,10 @@
> * H264_CHROMA_OP must be defined to empty for put and pavgb/pavgusb for avg
> * H264_CHROMA_MC8_MV0 must be defined to a (put|avg)_pixels8 function
> */
> -static void H264_CHROMA_MC8_TMPL(uint8_t *dst/*align 8*/, uint8_t *src/*align 1*/, int stride, int h, int x, int y)
> +static void H264_CHROMA_MC8_TMPL(uint8_t *dst/*align 8*/, uint8_t *src/*align 1*/, int stride, int h, int x, int y, int rnd)
> {
> + DECLARE_ALIGNED_8(static const uint64_t, ff_pw_28) = 0x001C001C001C001CULL;
> + const uint64_t *rnd_reg = (rnd) ? &ff_pw_32 : &ff_pw_28;
> DECLARE_ALIGNED_8(uint64_t, AA);
> DECLARE_ALIGNED_8(uint64_t, DD);
> int i;
> @@ -41,19 +43,20 @@ static void H264_CHROMA_MC8_TMPL(uint8_t *dst/*align 8*/, uint8_t *src/*align 1*
>
> if(y==0 || x==0)
> {
> + //START_TIMER
> /* 1 dimensional filter only */
Benchmarking should be performed over the whole function, that is:
    START_TIMER
    myfunc() (that is, the code which calls the function)
    STOP_TIMER
Also, the //START_TIMER doesn't belong in the patch.
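
To illustrate what timing at the call site looks like, here is a minimal standalone sketch. It does not use FFmpeg's START_TIMER/STOP_TIMER macros; it uses the standard C clock() instead, and myfunc() and bench_myfunc() are hypothetical stand-ins, not code from the patch. The point is only that the timed region wraps the complete call rather than one branch inside the function.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the function under test (not FFmpeg code):
 * a 1-D two-tap filter with weights 5 and 3, rounded, then >> 3. */
static void myfunc(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)((src[i] * 5 + src[i + 1] * 3 + 4) >> 3);
}

/* Time `runs` complete calls of myfunc() from the caller's side and
 * return the average CPU time per call in seconds. */
static double bench_myfunc(int runs)
{
    static uint8_t src[65], dst[64];
    clock_t total = 0;

    assert(runs > 0);
    for (int r = 0; r < runs; r++) {
        clock_t t0 = clock();     /* where START_TIMER would go */
        myfunc(dst, src, 64);     /* the whole call is inside the timed region */
        total += clock() - t0;    /* where STOP_TIMER would go */
    }
    return (double)total / CLOCKS_PER_SEC / runs;
}
```

clock() is far coarser than a cycle counter, so a real measurement would need many runs or a finer timer; the structure of the measurement is what matters here.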
> const int dxy = x ? 1 : stride;
>
> asm volatile(
> + "movq %2, %%mm6\n\t"
> "movd %0, %%mm5\n\t"
> "movq %1, %%mm4\n\t"
> "punpcklwd %%mm5, %%mm5\n\t"
> "punpckldq %%mm5, %%mm5\n\t" /* mm5 = B = x */
> - "movq %%mm4, %%mm6\n\t"
> "pxor %%mm7, %%mm7\n\t"
> "psubw %%mm5, %%mm4\n\t" /* mm4 = A = 8-x */
> - "psrlw $1, %%mm6\n\t" /* mm6 = 4 */
> - :: "rm"(x+y), "m"(ff_pw_8));
> + "psrlw $3, %%mm6" /* mm6 = rnd */
> + :: "rm"(x+y), "m"(ff_pw_8), "m"(*rnd_reg));
The psrlw can be avoided by loading a constant that is already shifted right
(32 >> 3 = 4 with rounding, 28 >> 3 = 3 without).
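
A small sketch of why the shift is redundant, assuming ff_pw_32 holds 32 in each 16-bit word (the value 0x001C... for ff_pw_28 is taken from the patch). psrlw_model(), pw_4_1d, pw_3_1d and pick_rnd_1d() are hypothetical names made up for this example; the 2-D path, which is not shown in the quoted hunk, would presumably still want the unshifted constants.

```c
#include <assert.h>
#include <stdint.h>

/* Model of psrlw on an MMX register: shift each of the four 16-bit
 * lanes of a 64-bit value right by n bits, independently. */
static uint64_t psrlw_model(uint64_t v, unsigned n)
{
    uint64_t out = 0;

    assert(n < 16);
    for (int lane = 0; lane < 4; lane++) {
        uint64_t w = (v >> (16 * lane)) & 0xFFFF;
        out |= (w >> n) << (16 * lane);
    }
    return out;
}

/* Unshifted constants: 32 per word (assumed) and 28 per word (from the patch). */
static const uint64_t pw_32 = 0x0020002000200020ULL;
static const uint64_t pw_28 = 0x001C001C001C001CULL;

/* Hypothetical pre-shifted constants for the 1-D filter: 4 and 3 per word. */
static const uint64_t pw_4_1d = 0x0004000400040004ULL;
static const uint64_t pw_3_1d = 0x0003000300030003ULL;

/* Pick the 1-D rounding constant directly, so no psrlw is needed later. */
static const uint64_t *pick_rnd_1d(int rnd)
{
    return rnd ? &pw_4_1d : &pw_3_1d;
}
```

With this, the asm block could load *pick_rnd_1d(rnd) straight into mm6 and drop the "psrlw $3" line.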
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope