[FFmpeg-devel] [PATCH] Stop using _explicit atomic operations where not necessary.

Sun Jun 6 20:20:38 EEST 2021

Anton Khirnov:
> Quoting Andreas Rheinhardt (2021-06-06 11:13:00)
>> Anton Khirnov:
>>> Memory ordering constraints other than the default (sequentially
>>> consistent) can behave in very unintuitive and unexpected ways, and so
>>> should be avoided unless there is a strong (typically performance)
>>> reason for using them. This holds especially for memory_order_relaxed,
>>> which imposes no ordering constraints.
>>
>> I disagree with this: Adding ordering constraints is wrong, as it can
>> provide synchronization between different threads and this can mask
>> races. Providing synchronization as a byproduct should be avoided: Users
>> should know what they get. And if a user wants an allocation limit in
>> thread A to be in effect for allocations in thread B, then the user
>> should provide the synchronization him/herself.
> 
> I do not agree. C11 default is sequential consistency, therefore our
> public API should provide sequential consistency.
> 

1. The atomics here are not part of the public API at all: The
documentation of none of the functions involved mentions atomics or
synchronization etc. at all (which means that if a user wants
synchronization between threads, he should take care of it himself). We
could change the implementation to not use atomics at all (e.g. by
wrapping everything in mutexes, which needn't be sequentially consistent).
(Furthermore, in case of libavformat/fifo.c this does not even come
close to affecting the public API, given that we are talking about
synchronization between the user thread and a thread owned by us; in
case of av_cpu_count() the atomics only exist because of an av_log(),
there is no reason for a user to believe that synchronization is even
involved.)
2. I don't buy the argument:
Which memory ordering to use should be decided based upon what one wants
and needs; one should not blindly use one because it is the default or
recommended by some authority (doing so smells like cargo cult). In case
of av_get_cpu_flags()+av_force_cpu_flags(), all one wants is that one
can just load and write the value; given that there is no restriction
loads a valid value (i.e. no torn reads/writes) and one wants that one
can access this function by multiple threads simultaneously without
races. Atomics provide this even with memory_order_relaxed. In case of
av_cpu_count() all that is desired is that av_log() is only called once;
an atomic_exchange provides this automatically, even with
memory_order_relaxed. The av_max_alloc()+av_malloc/... combination is
just like av_get/force_cpu_flags().

> My other concern is that people are cargo culting this code around
> without understanding what it really does. Again, sequentially
> consistent atomics make that safer, since they do what you'd expect them
> to.
> 

Are you really advocating that we write our code in such a way that
makes it safe for cargo-culters to copy it?

>>
>> Here is an example of a program that contains a data race that is masked
>> by this patch: If all allocations happen before av_max_alloc(3), then
>> tsan will complain about a data race; otherwise, it is silent, as the
>> write in av_max_alloc() synchronizes with the read in av_malloc(), so
>> that race = 1 is visible in the main thread later.
> 
> This example is highly artificial and IMO not relevant, since
> - nothing in the libraries should call av_max_malloc() or other
>   functions that modify global state

Here is an example with a function that we actually use:

#include <pthread.h>
#include <stdio.h>
#include "libavutil/cpu.h"
#include "libavutil/mem.h"

int race;

static void *second_thread(void *arg)
{
    race = 1;
    printf("Second thread CPU count %d\n", av_cpu_count());
    return NULL;
}

int main()
{
    pthread_t thread;

    pthread_create(&thread, NULL, second_thread, NULL);

    /* Do some work/waste some time */
    for (int i = 0; i < 150; i++)
        av_malloc(i);

    int count = av_cpu_count();
    printf("%d, count %d\n", race, count);
    pthread_join(thread, NULL);
}

(The difference between memory_order_seq_cst and using pairs of
memory_order_release-memory_order_acquire (or memory_order_acq_rel) is
that there is a single (i.e. global) total order for all
memory_order_seq_cst operations; so using these is a bit like a
modification of global state.)

> - it is not our job to look for races in user programs
> 
True, but making it more difficult for users to find races is not nice.

- Andreas