[FFmpeg-devel] Some IWMMXT functions for libavcodec

Michael Niedermayer michaelni
Mon May 12 19:47:58 CEST 2008


On Mon, May 12, 2008 at 08:05:01PM +0400, Dmitry Antipov wrote:
> Hello,
>
> here are some libavcodec DSP stuff I've developed for XScale CPU with Intel 
> WMMX support.
>
> (At http://78.153.153.8/tmp/dspwmmx.c, there is also a small standalone 
> validation & benchmark
> program for these functions).

Please post the benchmark results to the list as well ...


[...]
> +static int vsad_intra16_iwmmxt(void *c, uint8_t *pix, uint8_t *dummy, int stride, int h)
> +{
> +    int s, i;
> +
> +    for (s = 0, i = 1; i < h; i++) {
> +	asm volatile("wldrd wr0, [%1]       \n\t"
> +		     "wldrd wr1, [%2]       \n\t"
> +		     "wsadbz wr1, wr0, wr1  \n\t"
> +		     "wldrd wr0, [%1, #8]   \n\t"
> +		     "wldrd wr2, [%2, #8]   \n\t"
> +		     "wsadbz wr2, wr0, wr2  \n\t"
> +		     "waddw wr1, wr1, wr2   \n\t"
> +		     "textrmsw r1, wr1, #0  \n\t"
> +		     "add %0, %0, r1        \n\t"
> +		     : "+r"(s)
> +		     : "r"(pix), "r"(pix + stride)
> +		     : "r1");
> +	pix += stride;
> +    }

Wrapping a C loop around an asm block like that is inefficient; the loop
belongs inside the asm.
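
When the loop is moved into the asm, a plain C reference helps check that the
result is unchanged. Below is a minimal sketch of what the quoted function
appears to compute: the sum of absolute differences between each 16-pixel row
and the row below it. The name vsad_intra16_ref and the simplified argument
list are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference, not from the patch: vertical SAD over a
 * 16-pixel-wide block, comparing every row with the row below it. */
static int vsad_intra16_ref(const uint8_t *pix, int stride, int h)
{
    int sum = 0;

    assert(h >= 1);
    /* h rows give h - 1 row pairs, hence the loop starts at 1. */
    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs(pix[x] - pix[x + stride]);
        pix += stride;
    }
    return sum;
}
```

Comparing this against the asm version on random input is a cheap way to
validate a restructured loop before benchmarking it.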


[...]
> +static int pix_abs8_y2_iwmmxt(void *v, uint8_t *pix1, uint8_t *pix2, int line_size, int h)
> +{
> +    int s, i;
> +

> +    for (s = 0, i = 0; i < h; i++) {
> +	asm volatile("wldrd wr0, [%2]       \n\t"
> +		     "wldrd wr1, [%3]       \n\t"

I don't know the WMMX instructions either, but I do know that one of
the loads is redundant.
Please keep in mind that a single suboptimal instruction means that
a patch is rejected! So please try to write optimal code even if
there is no one around who knows the instruction set.
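
To illustrate the redundant load: the row read as "below" in one iteration is
the same row read as "above" in the next, so it can be carried over instead of
loaded again. Here is a minimal plain C sketch of that idea, with small arrays
standing in for the wMMX registers. The name pix_abs8_y2_ref is made up, and
the rounding average (a + b + 1) >> 1 is an assumption about what wavg2br
computes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference, not from the patch: SAD of pix1 against the
 * vertical half-pel average of pix2, 8 pixels wide. Reads h + 1 rows
 * of pix2. */
static int pix_abs8_y2_ref(const uint8_t *pix1, const uint8_t *pix2,
                           int line_size, int h)
{
    uint8_t above[8], below[8];   /* stand-ins for two registers */
    int sum = 0;

    assert(h >= 0);
    memcpy(above, pix2, 8);       /* first row: loaded once, before the loop */
    for (int i = 0; i < h; i++) {
        /* The only pix2 load per iteration. */
        memcpy(below, pix2 + line_size, 8);
        for (int x = 0; x < 8; x++) {
            /* Assumed rounding average, then absolute difference. */
            int avg = (above[x] + below[x] + 1) >> 1;
            sum += abs(pix1[x] - avg);
        }
        memcpy(above, below, 8);  /* reuse the row instead of reloading it */
        pix1 += line_size;
        pix2 += line_size;
    }
    return sum;
}
```

In the asm the copy at the end of the loop would be a register move, or could
be avoided entirely by unrolling twice and swapping the roles of the two
registers.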


> +		     "wavg2br wr0, wr0, wr1 \n\t"
> +		     "wldrd wr1, [%1]       \n\t"
> +		     "wsadbz wr1, wr1, wr0  \n\t"

> +		     "textrmsw r1, wr1, #0  \n\t"
> +		     "add %0, %0, r1        \n\t"

Is this faster than a waddw?


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The worst form of inequality is to try to make unequal things equal.
-- Aristotle