[FFmpeg-devel] [PATCH] Altivec version of h264_idct_add

David Conrad umovimus
Sun Jun 3 06:00:05 CEST 2007


On Jun 2, 2007, at 10:15 PM, Luca Barbato wrote:

> Loren Merritt wrote:
>>
>> The switch could be changed to a table if it matters.
>
> In theory vec_ste is all we need here; sadly, I cannot manage to get it
> working right for the unaligned cases.

I've never really looked at vec_ste before today, but it seems that
vec_ste always rounds the address down to the element size and stores
the element whose position in the vector matches that address's offset
within its 16-byte line, so to store to an unaligned address you have
to move the data into that element first. The attached patch does this
with a permute and uses it instead of the switch. It requires an
additional 4 permutes and a constant vector in the aligned case, but it
seems to be a bit faster overall on my G4 (about 12 versus 14
dezicycles in the timings below).
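
To make the vec_ste behaviour concrete, here is a minimal portable C
model of it for 32-bit elements. This is only a sketch, not the code in
the attached patch: the names vec4x32, model_vec_ste, rotate_lane0_to,
store_word0 and load_word are made up for illustration, and a plain
byte buffer whose index 0 is assumed to be 16-byte aligned stands in
for memory. On real hardware the rotation would be a single vec_perm;
a control vector from vec_lvsr is one possible way to build it, but
that is an assumption on my part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte vector register: four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } vec4x32;

/* Model of vec_ste for 32-bit elements. "addr" is an index into mem,
 * and mem[0] is assumed to sit on a 16-byte boundary. The address is
 * rounded down to 4 bytes, and the lane that gets written is chosen by
 * the address's offset within its 16-byte line. */
static void model_vec_ste(vec4x32 v, uint8_t *mem, size_t addr)
{
    size_t ea   = addr & ~(size_t)3;   /* word-aligned effective address */
    size_t lane = (ea & 15) / 4;       /* lane selected by the address   */
    memcpy(mem + ea, &v.lane[lane], 4);
}

/* Rotate the vector so that the data in lane 0 ends up in "lane".
 * This plays the role of the permute in front of the store. */
static vec4x32 rotate_lane0_to(vec4x32 v, size_t lane)
{
    vec4x32 r;
    for (size_t i = 0; i < 4; i++)
        r.lane[(i + lane) & 3] = v.lane[i];
    return r;
}

/* Store lane 0 of v at a word-aligned address that need not be
 * 16-byte aligned: move the data into the lane vec_ste will pick. */
static void store_word0(vec4x32 v, uint8_t *mem, size_t addr)
{
    assert((addr & 3) == 0);
    model_vec_ste(rotate_lane0_to(v, (addr & 15) / 4), mem, addr);
}

/* Helper to read a 32-bit word back out of the buffer. */
static uint32_t load_word(const uint8_t *mem, size_t addr)
{
    uint32_t w;
    memcpy(&w, mem + addr, 4);
    return w;
}
```

With v = {0x11111111, 0x22222222, 0x33333333, 0x44444444}, calling
model_vec_ste(v, mem, 8) writes lane 2, not lane 0, while
store_word0(v, mem, 20) puts the lane 0 value at offset 20 because the
rotation first moves it into lane 1.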

230 dezicycles in ff_h264_idct_add_altivec, 1 runs, 0 skips
165 dezicycles in ff_h264_idct_add_altivec, 2 runs, 0 skips
105 dezicycles in ff_h264_idct_add_altivec, 4 runs, 0 skips
75 dezicycles in ff_h264_idct_add_altivec, 8 runs, 0 skips
44 dezicycles in ff_h264_idct_add_altivec, 16 runs, 0 skips
29 dezicycles in ff_h264_idct_add_altivec, 32 runs, 0 skips
20 dezicycles in ff_h264_idct_add_altivec, 64 runs, 0 skips
16 dezicycles in ff_h264_idct_add_altivec, 128 runs, 0 skips
14 dezicycles in ff_h264_idct_add_altivec, 256 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 512 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 1024 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 2048 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 4096 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 8192 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 16384 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 32768 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 65534 runs, 2 skips
13 dezicycles in ff_h264_idct_add_altivec, 131069 runs, 3 skips
13 dezicycles in ff_h264_idct_add_altivec, 262139 runs, 5 skips
14 dezicycles in ff_h264_idct_add_altivec, 524269 runs, 19 skips
14 dezicycles in ff_h264_idct_add_altivec, 1048546 runs, 30 skips

210 dezicycles in ff_h264_idct_add_altivec_ste, 1 runs, 0 skips
165 dezicycles in ff_h264_idct_add_altivec_ste, 2 runs, 0 skips
102 dezicycles in ff_h264_idct_add_altivec_ste, 4 runs, 0 skips
61 dezicycles in ff_h264_idct_add_altivec_ste, 8 runs, 0 skips
37 dezicycles in ff_h264_idct_add_altivec_ste, 16 runs, 0 skips
26 dezicycles in ff_h264_idct_add_altivec_ste, 32 runs, 0 skips
19 dezicycles in ff_h264_idct_add_altivec_ste, 64 runs, 0 skips
15 dezicycles in ff_h264_idct_add_altivec_ste, 128 runs, 0 skips
14 dezicycles in ff_h264_idct_add_altivec_ste, 256 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 512 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 1024 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 2048 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 4096 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 8192 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 16383 runs, 1 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 32766 runs, 2 skips
13 dezicycles in ff_h264_idct_add_altivec_ste, 65533 runs, 3 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 131068 runs, 4 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 262138 runs, 6 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 524274 runs, 14 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 1048544 runs, 32 skips

230 dezicycles in ff_h264_idct_add_altivec, 1 runs, 0 skips
150 dezicycles in ff_h264_idct_add_altivec, 2 runs, 0 skips
97 dezicycles in ff_h264_idct_add_altivec, 4 runs, 0 skips
71 dezicycles in ff_h264_idct_add_altivec, 8 runs, 0 skips
44 dezicycles in ff_h264_idct_add_altivec, 16 runs, 0 skips
30 dezicycles in ff_h264_idct_add_altivec, 32 runs, 0 skips
21 dezicycles in ff_h264_idct_add_altivec, 64 runs, 0 skips
17 dezicycles in ff_h264_idct_add_altivec, 128 runs, 0 skips
15 dezicycles in ff_h264_idct_add_altivec, 256 runs, 0 skips
14 dezicycles in ff_h264_idct_add_altivec, 512 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 1024 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 2048 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 4096 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 8192 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec, 16384 runs, 0 skips
14 dezicycles in ff_h264_idct_add_altivec, 32768 runs, 0 skips
14 dezicycles in ff_h264_idct_add_altivec, 65534 runs, 2 skips
14 dezicycles in ff_h264_idct_add_altivec, 131068 runs, 4 skips
13 dezicycles in ff_h264_idct_add_altivec, 262135 runs, 9 skips
14 dezicycles in ff_h264_idct_add_altivec, 524270 runs, 18 skips
14 dezicycles in ff_h264_idct_add_altivec, 1048544 runs, 32 skips

200 dezicycles in ff_h264_idct_add_altivec_ste, 1 runs, 0 skips
140 dezicycles in ff_h264_idct_add_altivec_ste, 2 runs, 0 skips
95 dezicycles in ff_h264_idct_add_altivec_ste, 4 runs, 0 skips
62 dezicycles in ff_h264_idct_add_altivec_ste, 8 runs, 0 skips
38 dezicycles in ff_h264_idct_add_altivec_ste, 16 runs, 0 skips
27 dezicycles in ff_h264_idct_add_altivec_ste, 32 runs, 0 skips
20 dezicycles in ff_h264_idct_add_altivec_ste, 64 runs, 0 skips
15 dezicycles in ff_h264_idct_add_altivec_ste, 128 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec_ste, 256 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 512 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 1024 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 2048 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 4096 runs, 0 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 8192 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec_ste, 16384 runs, 0 skips
13 dezicycles in ff_h264_idct_add_altivec_ste, 32767 runs, 1 skips
13 dezicycles in ff_h264_idct_add_altivec_ste, 65535 runs, 1 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 131069 runs, 3 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 262138 runs, 6 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 524276 runs, 12 skips
12 dezicycles in ff_h264_idct_add_altivec_ste, 1048553 runs, 23 skips

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: h264_idct_add_altivec_ste.txt
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070603/e9d7ba93/attachment.txt>


