[FFmpeg-devel] [PATCH] x86: use new gcc atomic built-ins if available
Michael Niedermayer
michaelni at gmx.at
Mon Oct 27 20:33:10 CET 2014
On Sat, Oct 25, 2014 at 10:32:57PM -0300, James Almer wrote:
> __sync built-ins are considered legacy and will be deprecated.
> These new memory model aware built-ins have been available since GCC 4.7.0.
>
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
> https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/_005f_005fatomic-Builtins.html
> This is an RFC for a couple reasons.
>
> The first is the memory model parameter. The documentation mentions that the
> __sync functions match the behavior of the new __atomic functions when the
> latter use the full barrier model (__ATOMIC_SEQ_CST), so I went with it for
> consistency's sake. It may however be a good idea to check if any of the more
> relaxed models available for these new functions can be used instead.
> It's worth mentioning that when I tested, gcc-tsan liked the __atomic load and
> store functions a lot more than __sync_synchronize(), regardless of memory
> model.
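For reference, a minimal sketch of the two styles of integer get/set wrapper being compared here. The sketch_* names are hypothetical and are not the wrappers from the patch; the sketch assumes GCC 4.7 or later (or a compatible compiler) and uses __ATOMIC_SEQ_CST throughout, as in the RFC.

```c
/* Legacy style: a plain access combined with a full barrier. */
static inline int sketch_int_get_sync(volatile int *ptr)
{
    __sync_synchronize();
    return *ptr;
}

static inline void sketch_int_set_sync(volatile int *ptr, int val)
{
    *ptr = val;
    __sync_synchronize();
}

/* Memory model aware style: the access itself is the atomic operation. */
static inline int sketch_int_get_atomic(volatile int *ptr)
{
    return __atomic_load_n(ptr, __ATOMIC_SEQ_CST);
}

static inline void sketch_int_set_atomic(volatile int *ptr, int val)
{
    __atomic_store_n(ptr, val, __ATOMIC_SEQ_CST);
}
```

Whether a weaker model such as __ATOMIC_ACQUIRE for the load and __ATOMIC_RELEASE for the store would be enough depends on how callers use these wrappers, which this sketch does not try to answer.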
>
> The second reason is __atomic_compare_exchange_n(), and how it differs from
> __sync_val_compare_and_swap().
> While the latter returns *ptr as it was before the operation, the former
> doesn't and instead copies *ptr to oldval if the result of the comparison is
> false. This means that returning oldval will match the old behavior without
> having to change the wrapper.
> A disassembly example from libavutil/buffer.o however hints that the __atomic
> function may be slower because it writes oldval.
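To make the difference concrete, here is a minimal sketch of a pointer compare-and-swap wrapper written both ways. The sketch_* names are hypothetical and are not the wrappers from the patch; it assumes GCC 4.7 or later and a strong (non-weak) exchange with __ATOMIC_SEQ_CST for both the success and failure orderings.

```c
#include <stddef.h>

/* Legacy form: the built-in itself returns the value *ptr held before
 * the operation. */
static inline void *sketch_ptr_cas_sync(void *volatile *ptr,
                                        void *oldval, void *newval)
{
    return __sync_val_compare_and_swap(ptr, oldval, newval);
}

/* New form: the built-in returns a success flag. On failure it writes
 * the current *ptr into oldval; on success oldval already equals the
 * previous *ptr. Returning oldval therefore matches the legacy result. */
static inline void *sketch_ptr_cas_atomic(void *volatile *ptr,
                                          void *oldval, void *newval)
{
    __atomic_compare_exchange_n(ptr, &oldval, newval, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return oldval;
}
```

Because the new built-in takes the address of oldval, the compiler may keep that variable in memory, which would be consistent with the extra stack traffic in the second listing.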
>
> __sync_val_compare_and_swap:
>  8e3:  48 89 d8          mov    rax,rbx
>  8e6:  f0 48 0f b1 16    lock cmpxchg QWORD PTR [rsi],rdx
>  8eb:  48 85 c0          test   rax,rax
>
> __atomic_compare_exchange_n:
>  8f0:  48 8d 4c 24 20    lea    rcx,[rsp+0x20]
> [...]
>  90c:  48 89 d8          mov    rax,rbx
>  90f:  48 89 5c 24 20    mov    QWORD PTR [rsp+0x20],rbx
>  914:  f0 48 0f b1 16    lock cmpxchg QWORD PTR [rsi],rdx
>  919:  74 03             je     91e <av_buffer_pool_get+0x3e>
>  91b:  48 89 01          mov    QWORD PTR [rcx],rax
>  91e:  48 8b 44 24 20    mov    rax,QWORD PTR [rsp+0x20]
>  923:  48 85 c0          test   rax,rax
>
> So the question is, do we keep using __sync_val_compare_and_swap as long as
> gcc offers it (which is probably a very long time), or immediately switch to
> __atomic_compare_exchange_n if available?
I'd say we should favor whatever is faster.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The real ebay dictionary, page 2
"100% positive feedback" - "All either got their money back or didnt complain"
"Best seller ever, very honest" - "Seller refunded buyer after failed scam"