[FFmpeg-devel] [PATCH] Stop using _explicit atomic operations where not necessary.

Sat Jun 12 19:04:31 EEST 2021

Anton Khirnov:
> Quoting Andreas Rheinhardt (2021-06-06 19:20:38)
>> Anton Khirnov:
>>> Quoting Andreas Rheinhardt (2021-06-06 11:13:00)
>>>> Anton Khirnov:
>>>>> Memory ordering constraints other than the default (sequentially
>>>>> consistent) can behave in very unintuitive and unexpected ways, and so
>>>>> should be avoided unless there is a strong (typically performance)
>>>>> reason for using them. This holds especially for memory_order_relaxed,
>>>>> which imposes no ordering constraints.
>>>>
>>>> I disagree with this: Adding ordering constraints is wrong, as it can
>>>> provide synchronization between different threads and this can mask
>>>> races. Providing synchronization as a byproduct should be avoided: Users
>>>> should know what they get. And if a user wants an allocation limit in
>>>> thread A to be in effect for allocations in thread B, then the user
>>>> should provide the synchronization him/herself.
>>>
>>> I do not agree. C11 default is sequential consistency, therefore our
>>> public API should provide sequential consistency.
>>>
>>
>> 1. The atomics here are not part of the public API at all: The
>> documentation of none of the functions involved mentions atomics or
>> synchronization etc. at all (which means that if a user wants
>> synchronization between threads, he should take care of it himself). We
>> could change the implementation to not use atomics at all (e.g. by
>> wrapping everything in mutexes, which needn't be sequentially consistent).
>> (Furthermore, in case of libavformat/fifo.c this does not even come
>> close to affecting the public API, given that we are talking about
>> synchronization between the user thread and a thread owned by us; in
>> case of av_cpu_count() the atomics only exist because of an av_log(),
>> there is no reason for a user to believe that synchronization is even
>> involved.)
>> 2. I don't buy the argument:
>> Which memory ordering to use should be decided based upon what one wants
>> and needs; one should not blindly use one because it is the default or
>> recommended by some authority (doing so smells like cargo cult).
> 
> That is an overly idealistic view of the situation. I think very few
> people around here have an indepth understanding of concurrency issues,
> e.g. of the difference between using memory_order_seq_cst and
> memory_order_relaxed. This stuff is complicated and unintuitive,
> according to pretty much everyone who ever dealt with it.
> 
> And you cannot expect that everyone will study this topic in detail just
> because they need to touch an atomic operation. So it seems perfectly
> reasonable to me to defer to a suitable authority (like the people who
> designed c11 atomics) and use their recommended defaults unless there is
> a strong reason to do otherwise.
> 
>> In case
>> of av_get_cpu_flags()+av_force_cpu_flags(), all one wants is that one
>> can just load and write the value; given that there is no restriction
>> loads a valid value (i.e. no torn reads/writes) and one wants that one
>> can access this function by multiple threads simultaneously without
>> races. Atomics provide this even with memory_order_relaxed. In case of
>> av_cpu_count() all that is desired is that av_log() is only called once;
>> an atomic_exchange provides this automatically, even with
>> memory_order_relaxed. The av_max_alloc()+av_malloc/... combination is
>> just like av_get/force_cpu_flags().
>>
>>
>>> My other concern is that people are cargo culting this code around
>>> without understanding what it really does. Again, sequentially
>>> consistent atomics make that safer, since they do what you'd expect them
>>> to.
>>>
>>
>> Are you really advocating that we write our code in such a way that
>> makes it safe for cargo-culters to copy it?
> 
> Yes I am.
> 
> People copy code around, that's just a fact of life. Two separate people
> told me recently that their patches used memory_order_relaxed only
> because they based them off some other code that was already using it.
> 
> So IMO we should strive to write code that is not only safe and correct,
> but also robust against future modifications and being copied around.
> That includes not using potentially-unsafe operations without a good
> reason.

IMO I gave a good reason for the cases considered: None of these
libavutil functions are intended to provide synchronization; they are
actually incapable of doing so and trying to do so makes no sense and
may mask bugs.

> 
> There is also the question of code readability. When I see
> atomic_load(&foo);
> then it's just an atomic load. But when I see
> atomic_load_explicit(&foo, memory_order_bar);
> I also need to think about the reason the author chose to use that
> specific memory order.

Strange, it's the other way around for me: If I see a
atomic_load/store(), I immediately ask myself: "Is the global total
order of modifications really necessary/intended? Does this code work if
one uses any of our compat/atomics that translates an atomic_load to
something weaker?"

> 
> As for your example program, you could also just remove the av_log()
> from av_cpu_count(). Which should be done anyway because av_log(NULL is
> evil.
> 
Another sample program where a race condition can be masked by your
changes; it has nothing to do with av_log(NULL:

#include <pthread.h>
#include <stdio.h>
#include "libavutil/cpu.h"
#include "libavutil/mem.h"

int race;

static void *second_thread(void *arg)
{
    race = 1;
    printf("Second thread CPU flags %x\n", av_get_cpu_flags());
    return NULL;
}

int main()
{
    pthread_t thread;

    pthread_create(&thread, NULL, second_thread, NULL);

    /* Do some work/waste some time */
    for (int i = 0; i < 200; i++)
        av_malloc(i);

    int flags = av_get_cpu_flags();
    printf("%d, flags %x\n", race, flags);
    pthread_join(thread, NULL);
}

(Orthogonally to the discussion about memory_order: The implementation
of av_get_cpu_flags() allows to overwrite cpu_flags that have already
been set, even when the earlier cpu_flags have been set by
av_force_cpu_flags(); in a scenario in which thread A calls
av_get_cpu_flags() and thread B calls av_force_cpu_flags() and later
av_get_cpu_flags(), then thread B might have the legitimate expectation
that the cpu flags it has forced are in effect for its later
av_get_cpu_flags() call if there is no second call to
av_force_cpu_flags(); yet this needn't be so, namely if the write of
thread B happens between the read and the write in thread A. Maybe we
should use atomic_compare_exchange_strong(_explicit) and only overwrite
cpu_flags if it is still -1?)

- Andreas