[FFmpeg-devel] [PATCH] Some IWMMXT functions for libavcodec #2
Dmitry Antipov
dmantipov
Wed May 21 13:46:34 CEST 2008
Siarhei Siamashka wrote:
> Please add the following implementation of "pix_sum" function to your
> benchmark set and post the results. I strongly suspect that it is a lot
> faster than any of your variants.
I've updated http://78.153.153.8/tmp/pix_sum.c and http://78.153.153.8/tmp/pix_sum.txt
(BTW, it might be offline for now due to some issues with my internet connection).
This is an extract from pix_sum.txt (PMUs are performance monitoring unit clock cycles;
[16], [32], etc. are the pix_sum line sizes):
...
pix_sum_iwmmxt2_last[16]: 4458 PMUs [32407]
pix_sum_iwmmxt2_last[32]: 8864 PMUs [32216]
pix_sum_iwmmxt2_last[64]: 13302 PMUs [32001]
pix_sum_iwmmxt2_last[128]: 17727 PMUs [34186]
pix_sum_iwmmxt2_last[256]: 22169 PMUs [34349]
pix_sum_iwmmxt2_last[512]: 26583 PMUs [35318]
pix_sum_iwmmxt2_last[1024]: 31030 PMUs [34941]
--
pix_sum_iwmmxt2_pipelined[16]: 4458 PMUs [32407]
pix_sum_iwmmxt2_pipelined[32]: 8899 PMUs [32216]
pix_sum_iwmmxt2_pipelined[64]: 13341 PMUs [32001]
pix_sum_iwmmxt2_pipelined[128]: 17780 PMUs [34186]
pix_sum_iwmmxt2_pipelined[256]: 22215 PMUs [34349]
pix_sum_iwmmxt2_pipelined[512]: 26652 PMUs [35318]
pix_sum_iwmmxt2_pipelined[1024]: 31090 PMUs [34941]
...
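So, here is a table (clock cycles in PMUs, taken from the extract above):
line_size  pipelined   last    speedup of last
----------------------------------------------
     16       4458     4458        0.0%
     32       8899     8864        0.39%
     64      13341    13302        0.29%
    128      17780    17727        0.29%
    256      22215    22169        0.2%
    512      26652    26583        0.25%
   1024      31090    31030        0.19%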
These 0.1-0.4% differences are marginal but stable: a few tens of runs give approximately
the same percentages, and your version was never faster.
As for code size, both versions contain 68 instructions.
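For reference, the percentages can be reproduced from the cycle counts in the extract. The sketch below assumes the speedup is computed as the cycle difference relative to the slower (pipelined) count; that formula is inferred from the numbers, and speedup_percent is a hypothetical helper name, not part of pix_sum.c.

```c
#include <stdio.h>

/* Relative speedup in percent: how many fewer cycles `fast` needs,
 * expressed as a fraction of `slow`. The formula is an assumption
 * inferred from the posted numbers. */
static double speedup_percent(unsigned slow, unsigned fast)
{
    return 100.0 * ((double)slow - (double)fast) / (double)slow;
}

/* Print the comparison for the cycle counts quoted in pix_sum.txt. */
static void print_speedups(void)
{
    static const unsigned line_size[] = { 16, 32, 64, 128, 256, 512, 1024 };
    static const unsigned pipelined[] = { 4458, 8899, 13341, 17780, 22215, 26652, 31090 };
    static const unsigned last[]      = { 4458, 8864, 13302, 17727, 22169, 26583, 31030 };
    size_t i;

    for (i = 0; i < sizeof(line_size) / sizeof(line_size[0]); i++)
        printf("[%4u] pipelined=%5u last=%5u speedup=%.2f%%\n",
               line_size[i], pipelined[i], last[i],
               speedup_percent(pipelined[i], last[i]));
}
```

Calling print_speedups() from main() prints one line per line size; the values should agree with the percentages quoted above up to rounding.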
pix_sum_iwmmxt2_last() was:
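/* Advance pix by line_size (pre-indexed with writeback) and load the
 * 16 bytes of the next row into two 64-bit registers. */
#define LOAD(x,y) \
    "wldrd wr" #x ", [%1, %2]!\n\t" \
    "wldrd wr" #y ", [%1, #8] \n\t"

/* Load two rows and accumulate their byte sums into wr0.
 * wr5 is zero, so each SAD against it is a plain sum of bytes. */
#define SUM4(x,y,z,t) \
    LOAD(x,y) LOAD(z,t) \
    "wsadb wr0, wr" #x ", wr5 \n\t" \
    "wsadb wr0, wr" #y ", wr5 \n\t" \
    "wsadb wr0, wr" #z ", wr5 \n\t" \
    "wsadb wr0, wr" #t ", wr5 \n\t"

int pix_sum_iwmmxt2_last(uint8_t *pix, int line_size)
{
    int s;
    /* Rows 0 and 1 are handled up front; wsadbz also clears the accumulator. */
    asm volatile("wldrd wr1, [%1] \n\t"
                 "wzero wr5 \n\t"
                 "wldrd wr2, [%1, #8] \n\t"
                 LOAD(3,4)
                 "wsadbz wr0, wr1, wr5 \n\t"
                 "wsadb wr0, wr2, wr5 \n\t"
                 "wsadb wr0, wr3, wr5 \n\t"
                 "wsadb wr0, wr4, wr5 \n\t"
                 /* Remaining 14 rows, two per SUM4. */
                 SUM4(1, 2, 3, 4)
                 SUM4(1, 2, 3, 4)
                 SUM4(1, 2, 3, 4)
                 SUM4(1, 2, 3, 4)
                 SUM4(1, 2, 3, 4)
                 SUM4(1, 2, 3, 4)
                 SUM4(1, 2, 3, 4)
                 "textrmsw %0, wr0, #0 \n\t"
                 : "=r"(s), "+r"(pix)
                 : "r"(line_size));
    return s;
}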
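To check the output of either asm variant, a plain C version is handy. The sketch below is written from reading the asm above, which appears to sum a block of 16 rows of 16 bytes with a stride of line_size; pix_sum_ref is a hypothetical name and this is not the code from pix_sum.c.

```c
#include <stdint.h>

/* Reference sum of a 16x16 block of bytes.
 * pix points at the top-left byte; line_size is the distance in bytes
 * between the starts of consecutive rows. The 16x16 shape is an
 * assumption taken from the unrolled asm (16 rows, 2 x 8 bytes each). */
static int pix_sum_ref(const uint8_t *pix, int line_size)
{
    int sum = 0;
    int row, col;

    for (row = 0; row < 16; row++) {
        for (col = 0; col < 16; col++)
            sum += pix[col];
        pix += line_size;   /* step to the next row */
    }
    return sum;
}
```

Comparing pix_sum_ref() against the asm on a few random blocks, with several line_size values, should catch an off-by-one row or a wrong offset.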
Dmitry