[FFmpeg-devel] [PATCH] avfilter/avf_showcqt: cqt_calc optimization on x86
Muhammad Faiz
mfcc64 at gmail.com
Wed Jun 8 02:28:41 CEST 2016
On Tue, Jun 7, 2016 at 4:18 PM, Muhammad Faiz <mfcc64 at gmail.com> wrote:
> On Tue, Jun 7, 2016 at 10:36 AM, James Almer <jamrial at gmail.com> wrote:
>> On 6/4/2016 4:36 AM, Muhammad Faiz wrote:
>>> benchmark on x86_64
>>> cqt_time:
>>> plain = 3.292 s
>>> SSE = 1.640 s
>>> SSE3 = 1.631 s
>>> AVX = 1.395 s
>>> FMA3 = 1.271 s
>>> FMA4 = not available
>>
>> Try using the START_TIMER and STOP_TIMER macros to wrap the s->cqt_calc
>> call in libavfilter/avf_showcqt.c
>> It will potentially give more accurate results than the current
>> UPDATE_TIME(s->cqt_time) check.
>>
> OK, but I will probably check it privately (without sending a patch).
>
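How to read the log lines that follow: each line is printed by STOP_TIMER whenever runs plus skips reaches a power of two, and the first number is the average cost per call in tenths of a cycle ("decicycles"). A sample far above the running average is treated as an outlier and counted as a skip. The sketch below is a simplified, hypothetical re-implementation of that bookkeeping, based on libavutil/timer.h. The struct and function names are made up for illustration, and the exact thresholds may differ between FFmpeg versions. It assumes the measurement was taken by placing START_TIMER before the s->cqt_calc(...) call in libavfilter/avf_showcqt.c and STOP_TIMER("cqt_calc") after it.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified mirror of the bookkeeping in FFmpeg's
 * STOP_TIMER macro (libavutil/timer.h). Not part of the FFmpeg API. */
typedef struct TimerStats {
    uint64_t tsum;   /* sum of accepted samples, in cycles   */
    int      tcount; /* accepted samples: the "runs" column  */
    int      tskips; /* rejected outliers: the "skips" column */
} TimerStats;

/* Record one measured duration in cycles. A sample is accepted while
 * fewer than 2 samples exist, if it is below 8x the running average,
 * or if it is below 2000 cycles; otherwise it is counted as a skip.
 * Returns 1 when runs + skips is a power of two (a report is due). */
static int timer_add(TimerStats *s, uint64_t cycles)
{
    assert(s != NULL);
    if (s->tcount < 2 ||
        cycles < 8 * s->tsum / (uint64_t)s->tcount ||
        cycles < 2000) {
        s->tsum += cycles;
        s->tcount++;
    } else {
        s->tskips++;
    }
    int total = s->tcount + s->tskips;
    return (total & (total - 1)) == 0;
}

/* Average cost per call in tenths of a cycle ("decicycles"). */
static uint64_t timer_decicycles(const TimerStats *s)
{
    return s->tcount ? s->tsum * 10 / (uint64_t)s->tcount : 0;
}

/* Print one report line shaped like the logs in this thread. */
static void timer_report(const TimerStats *s, const char *id)
{
    printf("%" PRIu64 " decicycles in %s, %d runs, %d skips\n",
           timer_decicycles(s), id, s->tcount, s->tskips);
}
```

For example, fifteen samples of 216000 cycles followed by one sample 100x larger give 15 runs and 1 skip, with timer_decicycles() returning 2160000; a report is due at totals 1, 2, 4, 8 and 16.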
plain:
2339760 decicycles in cqt_calc, 1 runs, 0 skips
2305160 decicycles in cqt_calc, 2 runs, 0 skips
2248260 decicycles in cqt_calc, 4 runs, 0 skips
2211985 decicycles in cqt_calc, 8 runs, 0 skips
2195152 decicycles in cqt_calc, 16 runs, 0 skips
2188133 decicycles in cqt_calc, 32 runs, 0 skips
2182856 decicycles in cqt_calc, 64 runs, 0 skips
2182876 decicycles in cqt_calc, 128 runs, 0 skips
2178021 decicycles in cqt_calc, 256 runs, 0 skips
2178197 decicycles in cqt_calc, 512 runs, 0 skips
2173667 decicycles in cqt_calc, 1024 runs, 0 skips
2175272 decicycles in cqt_calc, 2048 runs, 0 skips
2171456 decicycles in cqt_calc, 4096 runs, 0 skips
2169706 decicycles in cqt_calc, 8192 runs, 0 skips
2169493 decicycles in cqt_calc, 16384 runs, 0 skips
sse:
1432400 decicycles in cqt_calc, 1 runs, 0 skips
1413420 decicycles in cqt_calc, 2 runs, 0 skips
1340840 decicycles in cqt_calc, 4 runs, 0 skips
1240880 decicycles in cqt_calc, 8 runs, 0 skips
1175592 decicycles in cqt_calc, 16 runs, 0 skips
1155657 decicycles in cqt_calc, 32 runs, 0 skips
1157220 decicycles in cqt_calc, 64 runs, 0 skips
1132563 decicycles in cqt_calc, 128 runs, 0 skips
1121175 decicycles in cqt_calc, 256 runs, 0 skips
1112374 decicycles in cqt_calc, 512 runs, 0 skips
1109323 decicycles in cqt_calc, 1024 runs, 0 skips
1102490 decicycles in cqt_calc, 2048 runs, 0 skips
1098801 decicycles in cqt_calc, 4096 runs, 0 skips
1100257 decicycles in cqt_calc, 8192 runs, 0 skips
1101172 decicycles in cqt_calc, 16384 runs, 0 skips
sse3:
1612720 decicycles in cqt_calc, 1 runs, 0 skips
1539780 decicycles in cqt_calc, 2 runs, 0 skips
1398232 decicycles in cqt_calc, 4 runs, 0 skips
1331866 decicycles in cqt_calc, 8 runs, 0 skips
1262878 decicycles in cqt_calc, 16 runs, 0 skips
1538833 decicycles in cqt_calc, 32 runs, 0 skips
1384517 decicycles in cqt_calc, 64 runs, 0 skips
1246595 decicycles in cqt_calc, 128 runs, 0 skips
1178879 decicycles in cqt_calc, 256 runs, 0 skips
1120117 decicycles in cqt_calc, 512 runs, 0 skips
1092902 decicycles in cqt_calc, 1024 runs, 0 skips
1077479 decicycles in cqt_calc, 2048 runs, 0 skips
1069110 decicycles in cqt_calc, 4096 runs, 0 skips
1067095 decicycles in cqt_calc, 8192 runs, 0 skips
1066812 decicycles in cqt_calc, 16383 runs, 1 skips
avx:
1333000 decicycles in cqt_calc, 1 runs, 0 skips
1261940 decicycles in cqt_calc, 2 runs, 0 skips
1082250 decicycles in cqt_calc, 4 runs, 0 skips
1036575 decicycles in cqt_calc, 8 runs, 0 skips
977935 decicycles in cqt_calc, 16 runs, 0 skips
950680 decicycles in cqt_calc, 32 runs, 0 skips
950307 decicycles in cqt_calc, 64 runs, 0 skips
959265 decicycles in cqt_calc, 128 runs, 0 skips
943070 decicycles in cqt_calc, 256 runs, 0 skips
931758 decicycles in cqt_calc, 512 runs, 0 skips
929080 decicycles in cqt_calc, 1023 runs, 1 skips
923407 decicycles in cqt_calc, 2046 runs, 2 skips
918616 decicycles in cqt_calc, 4094 runs, 2 skips
917359 decicycles in cqt_calc, 8189 runs, 3 skips
916981 decicycles in cqt_calc, 16379 runs, 5 skips
fma3:
1050200 decicycles in cqt_calc, 1 runs, 0 skips
1019680 decicycles in cqt_calc, 2 runs, 0 skips
969420 decicycles in cqt_calc, 4 runs, 0 skips
945985 decicycles in cqt_calc, 8 runs, 0 skips
905312 decicycles in cqt_calc, 16 runs, 0 skips
964126 decicycles in cqt_calc, 32 runs, 0 skips
1041993 decicycles in cqt_calc, 64 runs, 0 skips
969205 decicycles in cqt_calc, 128 runs, 0 skips
917490 decicycles in cqt_calc, 256 runs, 0 skips
885880 decicycles in cqt_calc, 512 runs, 0 skips
867781 decicycles in cqt_calc, 1024 runs, 0 skips
852242 decicycles in cqt_calc, 2048 runs, 0 skips
844318 decicycles in cqt_calc, 4096 runs, 0 skips
839100 decicycles in cqt_calc, 8191 runs, 1 skips
836639 decicycles in cqt_calc, 16383 runs, 1 skips
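The final (16384-sample) averages above can be turned into speedups over the plain C version by simple division. A minimal sketch, with the values copied from the last line of each log; note that the sse3, avx and fma3 averages exclude their few skipped outliers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Final averages copied from the logs above, in decicycles per
 * cqt_calc call (last reported line of each variant). */
typedef struct Result {
    const char *name;
    double decicycles;
} Result;

static const Result results[] = {
    { "plain", 2169493 },
    { "sse",   1101172 },
    { "sse3",  1066812 },
    { "avx",    916981 },
    { "fma3",   836639 },
};

/* Speedup of a variant relative to the plain C version. */
static double speedup(double plain, double variant)
{
    assert(variant > 0.0);
    return plain / variant;
}

/* Print each variant's average cost and its speedup over plain C. */
static void print_speedups(void)
{
    size_t n = sizeof(results) / sizeof(results[0]);
    for (size_t i = 0; i < n; i++)
        printf("%-5s %9.0f decicycles  %.2fx\n", results[i].name,
               results[i].decicycles,
               speedup(results[0].decicycles, results[i].decicycles));
}
```

print_speedups() reports approximately 1.97x for SSE, 2.03x for SSE3, 2.37x for AVX and 2.59x for FMA3.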
Thanks