[FFmpeg-devel] [PATCH] Some IWMMXT functions for libavcodec #2
Michael Niedermayer
michaelni
Fri May 16 18:43:11 CEST 2008
On Fri, May 16, 2008 at 08:19:44PM +0400, Dmitry Antipov wrote:
> Siarhei Siamashka wrote:
>
> > Half of the data loaded on the second iteration of your loop has been already
> > loaded on the first iteration. It could be reused to improve performance.
> > This reuse can be achieved by unrolling the loop.
>
> Argh, I see. Here two loads are avoided at the cost of having two moves:
>
> asm volatile("mov r1, %3 \n\t"
> "wzero wr0 \n\t"
> "wldrd wr1, [%1] \n\t"
> "wldrd wr2, [%1, #8] \n\t"
> "1: add %1, %1, %2 \n\t"
> "wldrd wr3, [%1] \n\t"
> "wldrd wr4, [%1, #8] \n\t"
> "wsadbz wr1, wr1, wr3 \n\t"
> "wsadbz wr2, wr2, wr4 \n\t"
> "waddw wr0, wr0, wr1 \n\t"
> "waddw wr0, wr0, wr2 \n\t"
> "wmov wr1, wr3 \n\t"
> "wmov wr2, wr4 \n\t"
> "subs r1, r1, #1 \n\t"
> "bne 1b \n\t"
> "textrmsw %0, wr0, #0 \n\t"
> : "=r"(s), "+r"(pix)
> : "r"(stride), "r"(h - 1)
> : "r1");
>
> As for unrolling, I don't believe it's a good idea here since the number of
> iterations of the outer loop isn't known. Here is an unrolled version:
the number of iterations is always even IIRC, but don't hesitate to add an
assert(!(h&1));
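
For checking the asm against something simple, a scalar reference along these lines may help. It is only a sketch: the function name vsad_intra16_ref and the assumption that the routine sums absolute differences between each 16-pixel row and the row below it are inferred from the quoted asm, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference (name and exact semantics are assumptions):
 * sum of absolute differences between every 16-pixel row and the row
 * directly below it, over h rows, i.e. h - 1 row pairs. */
static int vsad_intra16_ref(const uint8_t *pix, int stride, int h)
{
    int sum = 0;

    /* The suggested check: h is expected to be even (and at least 2). */
    assert(h >= 2 && !(h & 1));

    for (int y = 0; y < h - 1; y++) {
        const uint8_t *cur  = pix + y * stride;
        const uint8_t *next = cur + stride;   /* reused as cur on the next pass */

        for (int x = 0; x < 16; x++)
            sum += abs(cur[x] - next[x]);
    }
    return sum;
}
```

Comparing its result with the asm output on a few random blocks should catch off-by-one mistakes in the loop count.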
>
> asm volatile("mov r1, %3 \n\t"
> "wzero wr0 \n\t"
> "wldrd wr1, [%1] \n\t"
> "wldrd wr2, [%1, #8] \n\t"
> "1: add %1, %1, %2 \n\t"
> "wldrd wr3, [%1] \n\t"
> "wldrd wr4, [%1, #8] \n\t"
> "wsadbz wr1, wr1, wr3 \n\t"
> "wsadbz wr2, wr2, wr4 \n\t"
> "waddw wr0, wr0, wr1 \n\t"
> "waddw wr0, wr0, wr2 \n\t"
> "subs r1, r1, #1 \n\t"
> "beq 2f \n\t"
> "add %1, %1, %2 \n\t"
> "wldrd wr5, [%1] \n\t"
> "wldrd wr6, [%1, #8] \n\t"
> "wsadbz wr3, wr3, wr5 \n\t"
> "wsadbz wr4, wr4, wr6 \n\t"
> "waddw wr0, wr0, wr3 \n\t"
> "waddw wr0, wr0, wr4 \n\t"
> "wmov wr1, wr5 \n\t"
> "wmov wr2, wr6 \n\t"
> "subs r1, r1, #1 \n\t"
> "bne 1b \n\t"
> "2: textrmsw %0, wr0, #0 \n\t"
> : "=r"(s), "+r"(pix)
> : "r"(stride), "r"(h - 1)
> : "r1");
well, now you have 2 unneeded wmov in there
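
One way to get rid of them, sketched in portable C rather than IWMMXT asm: once the first half has consumed wr1/wr2, the second half can load the new row straight into them instead of into wr5/wr6, so the newest row is already in place when the loop branches back. In the sketch below the pointers a and b stand in for the wr1/wr2 and wr3/wr4 register pairs; the names row_sad16 and vsad_intra16_unrolled are hypothetical, and the semantics are inferred from the quoted asm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: SAD of two 16-pixel rows. */
static int row_sad16(const uint8_t *p, const uint8_t *q)
{
    int sum = 0;
    for (int x = 0; x < 16; x++)
        sum += abs(p[x] - q[x]);
    return sum;
}

/* Unrolled by two; the roles of a and b alternate, so no copy is needed. */
static int vsad_intra16_unrolled(const uint8_t *pix, int stride, int h)
{
    const uint8_t *a = pix;   /* plays the role of wr1/wr2 */
    const uint8_t *b;         /* plays the role of wr3/wr4 */
    int sum = 0;
    int n = h - 1;            /* row pairs left; odd because h is even */

    assert(h >= 2 && !(h & 1));

    for (;;) {
        b = a + stride;               /* first half: new row goes to b */
        sum += row_sad16(a, b);
        if (--n == 0)
            break;                    /* the only exit when n starts odd */
        a = b + stride;               /* second half: new row goes back to a */
        sum += row_sad16(b, a);
        --n;
    }
    return sum;
}
```

Because h - 1 is odd, the loop always leaves from the middle test, which is also why the second test can become an unconditional branch.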
>
> The granularity of the performance monitoring unit's clock cycle counter
> isn't fine enough to see performance differences between them :-).
run the code 100 times instead of once
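
A minimal sketch of what I mean, using the standard clock() as a stand-in for the PMU cycle counter. The helper names mean_seconds_per_call and bump_counter are made up for illustration; in practice the callback would wrap the asm under test.

```c
#include <assert.h>
#include <time.h>

/* Stand-in workload for illustration only: counts how often it is called. */
static void bump_counter(void *arg)
{
    ++*(int *)arg;
}

/* Hypothetical helper: call fn(arg) `runs` times and return the mean CPU
 * time per call in seconds. Timing the whole batch once keeps the coarse
 * counter granularity from swamping the measurement. */
static double mean_seconds_per_call(void (*fn)(void *), void *arg, int runs)
{
    clock_t start, stop;

    assert(runs > 0);

    start = clock();
    for (int i = 0; i < runs; i++)
        fn(arg);
    stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC / runs;
}
```

If 100 runs still read as zero, raise the count until the total is well above one counter tick.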
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates