[FFmpeg-devel] [PATCH] avutil: add av_memcpy() to avoid undefined behavior with NULL, NULL, 0

Sun Jul 7 00:22:59 EEST 2019

lör 2019-07-06 klockan 18:34 +0200 skrev Michael Niedermayer:
> On Sat, Jul 06, 2019 at 02:34:34PM +0200, Tomas Härdin wrote:
> > lör 2019-07-06 klockan 00:08 +0200 skrev Michael Niedermayer:
> > > As we are already off topic, heres an example to test static
> > > analysis, does this trigger undefined behavior by executing the
> > > memcpy
> > > for some user input ?
> > > 
> > > void f(unsigned bigint a) {
> > >     bigint i;
> > >     for (i = 2; (((bigint)1 << a) + 1) % i; i++)
> > >         ;
> > >     if (a > 20 && i > ((bigint)1 << a))
> > >         memcpy(NULL, NULL, 0);
> > > }
> > > 
> > > i know its a lame example but just to remind that static analysis
> > > has
> > > limitations. (your mail sounds a bit like static analysis could
> > > replace
> > > everything ...)
> > 
> > That is acually perfectly legal since the intersection between
> > [NULL,NULL) and [NULL,NULL) is empty and thus they do not overlap.
> > Here's an example that validates:
> > 
> > #include <stdint.h>
> > #include <string.h>
> > 
> > /*@ axiomatic Foo {
> >     axiom Bar: \forall integer a;
> >       0 <= a <= 62 ==>
> >         1 <= (1<<a) <= (1<<62);
> >     }
> >  */
> > 
> > /*@ requires 0 <= a <= 62;
> >     assigns ((char*)NULL)[0..-1];
> >  */
> > void f(uint64_t a) {
> >     int64_t i = 2;
> >     int64_t a2 = (1LL << a) + 1;
> >     /*@ loop invariant 2 <= i <= a2;
> >         loop assigns i;
> >      */
> >     for (; a2 % i; i++)
> >         ;
> >     //@ assert a2 % i == 0;
> >     if (a > 20 && i > ((int64_t)1 << a))
> >         memcpy(NULL, NULL, 0);
> > }
> 
> this code is wrong.
> Imagine this was real code, and to make it fit in the static
> analyzer one changes it like this
> 
> why is it worng ?
> the range should not have a upper bound of 62 in fact there is no
> reason to run this function with input below 1LL<<33 that is not
> 33 that is 1LL<<33

All bignum implementations I've seen have upper bounds. Maybe you have
an infinite tape laying around somewhere?

It is possible to define custom data types and reason about them just
as well as integers. In fact, I've seen elliptic curve implementations
that do just that. It's just that it's a bit of work

> 
> also you can restrict a to powers of 2 if you like
> thats
> for (b = 33; ; b++)
>     f((bigint)1<<b);
>     
> can the analyzer help check this ?

If you have a bigint_shift() function with an appropriate contract,
sure. b will overflow at some point howeveer, which the prover will
catch by default if b is signed. It can also be told to catch unsigned
overflow.

> I am pretty sure its not UB below, and i suspect its not UB for most
> values 
> above but iam really curious and like to know for sure.
> 
> 
> [...]
> 
> > If you change the memcpy() to copy n=1 bytes then the verification
> > fails, as expected. It also fails if let src and dst equal to the
> > same
> > valid memory area, because of overlap.
> 
> are you saying that code which never executes causes failure of the
> analyzer because if it did execute it would be wrong ?

This is equivalent to asking the prover to show that there are no
Fermat primes above (1<<20), which is a tall order. If your code
depends on tricky-to-prove theorems then obviously the prover is not
going to do that job for you.

Whether dead code is removed is a good question nonetheless, let me
check:

  void f(uint64_t a) {
    [...]
    if (a > 20 && i > ((int64_t)1 << a)) {
        char aa[1];
        memcpy(aa, aa, 1);
    }
  }

fails

  void f(uint64_t a) {
    [...]
    if (a > 62 && i > ((int64_t)1 << a)) {
        char aa[1];
        memcpy(aa, aa, 1);
    }
  }

succeeds (because 0 <= a <= 62)

/Tomas