[FFmpeg-devel] libavcodec/blockdsp : add AVX version
James Almer
jamrial at gmail.com
Wed Oct 4 02:09:04 EEST 2017
On 10/3/2017 4:47 PM, Martin Vignali wrote:
> Hello,
>
>
>> I used GCC 7.2. clear_blocks_mmx is slower than c for me as well, but
>> not the rest.
>> Your compiler seems to have done a much better job than mine. Is it
>> Clang? Does it somehow have vectorization enabled perhaps? Because
>> that's not supposed to happen.
>>
>>
> Yes, it's Clang 8.1
>
> I put the clear_blocks_c function in a file and ran
> clang -S -O1 test_asm_gen.c
>
> the asm result is
> .section __TEXT,__text,regular,pure_instructions
> .macosx_version_min 10, 12
> .globl _clear_blocks_c
> .p2align 4, 0x90
> _clear_blocks_c: ## @clear_blocks_c
> .cfi_startproc
> ## BB#0:
> pushq %rbp
> Ltmp0:
> .cfi_def_cfa_offset 16
> Ltmp1:
> .cfi_offset %rbp, -16
> movq %rsp, %rbp
> Ltmp2:
> .cfi_def_cfa_register %rbp
> movl $768, %esi ## imm = 0x300
> callq ___bzero
> popq %rbp
> retq
> .cfi_endproc
>
>
> .subsections_via_symbols
>
> Seems like an optimized function is called for clear_blocks_c
Yeah, the C version uses memset, which clang turned into a 768-byte ___bzero call here. Guess clang's implementation is good.
Patch pushed. Thanks.
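
For reference, a minimal sketch of what the C fallback amounts to, assuming six 8x8 blocks of int16_t coefficients (6 * 64 * 2 = 768 bytes, which matches the movl $768 in the listing above). The function name is illustrative only and is not meant as the exact code in blockdsp.c:

```c
#include <stdint.h>
#include <string.h>

/* Assumption: one macroblock is six 8x8 blocks of 16-bit coefficients. */
#define SKETCH_BLOCK_COUNT 6
#define SKETCH_BLOCK_SIZE  64

/* Hypothetical name; zeroes all six blocks with a single memset.
 * A compiler is free to lower this memset to a libc routine such as
 * bzero, which is what the clang -O1 output above shows. */
void clear_blocks_sketch(int16_t *blocks)
{
    memset(blocks, 0,
           sizeof(int16_t) * SKETCH_BLOCK_COUNT * SKETCH_BLOCK_SIZE);
}
```

Since the whole body is one memset of a constant size, how the "C" version benchmarks depends mostly on the libc routine the compiler ends up calling.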