[FFmpeg-devel] [PATCH] modification of the MMX H264 MC chroma functions to support RV40
Mathieu Velten
matmaul
Tue Dec 23 00:40:18 CET 2008
2008/12/22 Michael Niedermayer <michaelni at gmx.at>:
> how much does it slow h.264 down?
> please put START/STOP_TIMER surrounding the call to the MC code
> and tell us the results.
Sorry, I am not familiar with this method, so I will just paste the results.
Could you explain to me what the results mean?
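For anyone else reading the numbers below: each line is a running average, and in every line pasted here runs + skips is a power of two (4095 + 1, 131049 + 23, and so on), so a line is printed each time the total call count doubles. "dezicycles" are tenths of a CPU cycle. The following is only a minimal sketch of that behaviour, not the actual START_TIMER/STOP_TIMER macros; the names TimerStats, timer_record, timer_average_decicycles and timer_report are hypothetical, and the outlier rule (drop a sample at least 8 times the running average) is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of a START/STOP_TIMER-style accumulator.
 * All names are illustrative; this is not the FFmpeg macro itself. */
typedef struct TimerStats {
    uint64_t sum;   /* total cycles over all accepted measurements */
    int      runs;  /* accepted measurements */
    int      skips; /* measurements rejected as outliers */
} TimerStats;

/* Average cost per accepted run, in tenths of a cycle. */
uint64_t timer_average_decicycles(const TimerStats *t)
{
    assert(t->runs > 0);
    return t->sum * 10 / (uint64_t)t->runs;
}

/* Record one measurement. Returns 1 when runs + skips is a power of two,
 * i.e. when a report line would be due. */
int timer_record(TimerStats *t, uint64_t cycles)
{
    /* Assumed outlier rule: after two warm-up samples, skip anything that is
     * at least 8x the running average (e.g. a call hit by a context switch). */
    if (t->runs < 2 || cycles < 8 * t->sum / (uint64_t)t->runs) {
        t->sum += cycles;
        t->runs++;
    } else {
        t->skips++;
    }
    int total = t->runs + t->skips;
    return (total & (total - 1)) == 0;
}

/* Print one line in the same shape as the pasted log. */
void timer_report(const TimerStats *t, const char *name)
{
    printf("%llu dezicycles in %s, %d runs, %d skips\n",
           (unsigned long long)timer_average_decicycles(t),
           name, t->runs, t->skips);
}
```

Read this way, the early lines (1, 2, 4 runs) are dominated by cold caches, and the later lines with many runs are the ones worth comparing.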
for mc8_rnd :
without :
57080 dezicycles in mc8_rnd, 1 runs, 0 skips
31905 dezicycles in mc8_rnd, 2 runs, 0 skips
16360 dezicycles in mc8_rnd, 4 runs, 0 skips
9010 dezicycles in mc8_rnd, 8 runs, 0 skips
4840 dezicycles in mc8_rnd, 16 runs, 0 skips
2752 dezicycles in mc8_rnd, 32 runs, 0 skips
1716 dezicycles in mc8_rnd, 64 runs, 0 skips
1299 dezicycles in mc8_rnd, 128 runs, 0 skips
1031 dezicycles in mc8_rnd, 256 runs, 0 skips
983 dezicycles in mc8_rnd, 512 runs, 0 skips
1092 dezicycles in mc8_rnd, 1024 runs, 0 skips
1002 dezicycles in mc8_rnd, 2048 runs, 0 skips
937 dezicycles in mc8_rnd, 4095 runs, 1 skips
915 dezicycles in mc8_rnd, 8191 runs, 1 skips
986 dezicycles in mc8_rnd, 16381 runs, 3 skips
990 dezicycles in mc8_rnd, 32765 runs, 3 skips
948 dezicycles in mc8_rnd, 65531 runs, 5 skips
989 dezicycles in mc8_rnd, 131049 runs, 23 skips
1366 dezicycles in mc8_rnd, 262012 runs, 132 skips
1578 dezicycles in mc8_rnd, 523947 runs, 341 skips
1720 dezicycles in mc8_rnd, 1047798 runs, 778 skips
1962 dezicycles in mc8_rnd, 2094856 runs, 2296 skips
2100 dezicycles in mc8_rnd, 4188963 runs, 5341 skips
with :
11620 dezicycles in mc8_rnd, 1 runs, 0 skips
8350 dezicycles in mc8_rnd, 2 runs, 0 skips
5497 dezicycles in mc8_rnd, 4 runs, 0 skips
3081 dezicycles in mc8_rnd, 8 runs, 0 skips
1871 dezicycles in mc8_rnd, 16 runs, 0 skips
1266 dezicycles in mc8_rnd, 32 runs, 0 skips
961 dezicycles in mc8_rnd, 64 runs, 0 skips
895 dezicycles in mc8_rnd, 128 runs, 0 skips
857 dezicycles in mc8_rnd, 256 runs, 0 skips
831 dezicycles in mc8_rnd, 512 runs, 0 skips
965 dezicycles in mc8_rnd, 1024 runs, 0 skips
915 dezicycles in mc8_rnd, 2048 runs, 0 skips
887 dezicycles in mc8_rnd, 4095 runs, 1 skips
888 dezicycles in mc8_rnd, 8190 runs, 2 skips
958 dezicycles in mc8_rnd, 16382 runs, 2 skips
945 dezicycles in mc8_rnd, 32765 runs, 3 skips
927 dezicycles in mc8_rnd, 65533 runs, 3 skips
976 dezicycles in mc8_rnd, 131052 runs, 20 skips
1351 dezicycles in mc8_rnd, 262032 runs, 112 skips
1572 dezicycles in mc8_rnd, 523987 runs, 301 skips
1720 dezicycles in mc8_rnd, 1047854 runs, 722 skips
1967 dezicycles in mc8_rnd, 2095038 runs, 2114 skips
2105 dezicycles in mc8_rnd, 4189391 runs, 4913 skips
for mc4 :
with :
9360 dezicycles in mc4, 1 runs, 0 skips
6735 dezicycles in mc4, 2 runs, 0 skips
3962 dezicycles in mc4, 4 runs, 0 skips
3265 dezicycles in mc4, 8 runs, 0 skips
2655 dezicycles in mc4, 16 runs, 0 skips
2317 dezicycles in mc4, 32 runs, 0 skips
2374 dezicycles in mc4, 64 runs, 0 skips
2111 dezicycles in mc4, 128 runs, 0 skips
1831 dezicycles in mc4, 256 runs, 0 skips
1945 dezicycles in mc4, 512 runs, 0 skips
1882 dezicycles in mc4, 1024 runs, 0 skips
1850 dezicycles in mc4, 2048 runs, 0 skips
1738 dezicycles in mc4, 4096 runs, 0 skips
1658 dezicycles in mc4, 8192 runs, 0 skips
1605 dezicycles in mc4, 16384 runs, 0 skips
1597 dezicycles in mc4, 32768 runs, 0 skips
1564 dezicycles in mc4, 65534 runs, 2 skips
1562 dezicycles in mc4, 131069 runs, 3 skips
1551 dezicycles in mc4, 262138 runs, 6 skips
1575 dezicycles in mc4, 524280 runs, 8 skips
1583 dezicycles in mc4, 1048564 runs, 12 skips
without :
11350 dezicycles in mc4, 1 runs, 0 skips
6585 dezicycles in mc4, 2 runs, 0 skips
4682 dezicycles in mc4, 4 runs, 0 skips
5298 dezicycles in mc4, 8 runs, 0 skips
3505 dezicycles in mc4, 16 runs, 0 skips
2822 dezicycles in mc4, 32 runs, 0 skips
2672 dezicycles in mc4, 64 runs, 0 skips
2329 dezicycles in mc4, 128 runs, 0 skips
1953 dezicycles in mc4, 256 runs, 0 skips
1997 dezicycles in mc4, 512 runs, 0 skips
1941 dezicycles in mc4, 1024 runs, 0 skips
1906 dezicycles in mc4, 2048 runs, 0 skips
1763 dezicycles in mc4, 4095 runs, 1 skips
1675 dezicycles in mc4, 8191 runs, 1 skips
1603 dezicycles in mc4, 16382 runs, 2 skips
1589 dezicycles in mc4, 32766 runs, 2 skips
1554 dezicycles in mc4, 65532 runs, 4 skips
1554 dezicycles in mc4, 131058 runs, 14 skips
1539 dezicycles in mc4, 262117 runs, 27 skips
1564 dezicycles in mc4, 524227 runs, 61 skips
1571 dezicycles in mc4, 1048452 runs, 124 skips
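To put the two runs side by side, here is a small sketch that compares the last line of each list above (mc8_rnd: 2100 without, 2105 with; mc4: 1571 without, 1583 with). The helper names are hypothetical and the arithmetic is only a relative difference; it says nothing about measurement noise, and the run counts of the two runs are not identical.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a "dezicycles" reading (tenths of a cycle) to cycles. */
double decicycles_to_cycles(long decicycles)
{
    return decicycles / 10.0;
}

/* Relative change of the patched timing versus the unpatched one, in percent.
 * Positive means the patched code is slower. */
double relative_change_percent(long without_patch, long with_patch)
{
    assert(without_patch > 0);
    return 100.0 * (double)(with_patch - without_patch) / (double)without_patch;
}

/* Print one comparison line for a named function. */
void print_comparison(const char *name, long without_patch, long with_patch)
{
    printf("%s: %.1f -> %.1f cycles (%+.2f%%)\n",
           name,
           decicycles_to_cycles(without_patch),
           decicycles_to_cycles(with_patch),
           relative_change_percent(without_patch, with_patch));
}
```

Calling print_comparison("mc8_rnd", 2100, 2105) and print_comparison("mc4", 1571, 1583) gives a difference of under one percent in both cases.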
>> @@ -45,17 +46,16 @@
>> /* 1 dimensional filter only */
>> const int dxy = x ? 1 : stride;
>>
>> - rnd_reg = rnd ? &ff_pw_4 : &ff_pw_3;
>> -
>> __asm__ volatile(
>> "movd %0, %%mm5\n\t"
>> "movq %1, %%mm4\n\t"
>> - "movq %2, %%mm6\n\t" /* mm6 = rnd */
>> + "movq %2, %%mm6\n\t"
>
>> + "psrlw $3, %%mm6\n\t" /* mm6 = bias >> 3 */
>
> is this a useless instruction that can be merged into the table?
>
I can do it in C (bias_reg = ff_pw_tab[bias>>3]) instead of shifting the
MMX register itself, but I am not sure we would gain any performance.
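To make the two options concrete, here is a scalar sketch showing that shifting the packed bias at run time and looking up an already shifted packed constant produce the same register contents. It is a model in plain C, not the patch code: pack_word4, psrlw_model, pw_tab_model and the table size are hypothetical, and the bias values used with it are illustrative rather than taken from the real rv40_bias table.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate one 16-bit value into all four lanes of a 64-bit word,
 * the layout an MMX register holds for packed words. */
uint64_t pack_word4(uint16_t w)
{
    return (uint64_t)w * 0x0001000100010001ULL;
}

/* Scalar model of "psrlw $n, %%mmX" for n < 16:
 * logical right shift of each 16-bit lane. */
uint64_t psrlw_model(uint64_t v, unsigned n)
{
    uint64_t out = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t w = (uint16_t)(v >> (16 * lane));
        out |= (uint64_t)(uint16_t)(w >> n) << (16 * lane);
    }
    return out;
}

/* Hypothetical table: entry i holds the value i in every lane, so indexing
 * it with bias >> 3 yields the constant that the asm would otherwise have
 * to produce with a shift. */
#define PW_TAB_SIZE 8
uint64_t pw_tab_model[PW_TAB_SIZE];

void init_pw_tab_model(void)
{
    for (int i = 0; i < PW_TAB_SIZE; i++)
        pw_tab_model[i] = pack_word4((uint16_t)i);
}

/* Returns 1 if the run-time shift and the table lookup agree for this bias. */
int shift_matches_lookup(uint16_t bias)
{
    assert((bias >> 3) < PW_TAB_SIZE);
    return psrlw_model(pack_word4(bias), 3) == pw_tab_model[bias >> 3];
}
```

The lookup variant trades one psrlw per call for an index computation in C before the asm block, which is why the gain, if any, is likely to be small.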
> [...]
>> Index: libavcodec/x86/h264dsp_mmx.c
>> ===================================================================
>> --- libavcodec/x86/h264dsp_mmx.c (revision 16270)
>> +++ libavcodec/x86/h264dsp_mmx.c (working copy)
>> @@ -19,6 +19,7 @@
>> */
>>
>> #include "dsputil_mmx.h"
>> +#include "libavcodec/rv40data.h"
>
> duplicating lots of tables ...
>
Then I could just copy the small rv40_bias table I need into dsputil_mmx.c,
or I could create a rv40data.c file and use extern, whichever you prefer.
A cleaner patch is attached, with rv40_bias copied into dsputil_mmx.c.
Mathieu Velten
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rv40_mc_mmx_v2.diff
Type: text/x-diff
Size: 11805 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081223/a0859a86/attachment.diff>