[FFmpeg-devel] [PATCH] checkasm: Generalize crash handling

Rémi Denis-Courmont remi at remlab.net
Fri Dec 22 08:20:21 EET 2023



Le 21 décembre 2023 22:16:09 GMT+02:00, "Rémi Denis-Courmont" <remi at remlab.net> a écrit :
>Le tiistaina 19. joulukuuta 2023, 14.02.00 EET Martin Storsjö a écrit :
>> This replaces the riscv specific handling from
>> 7212466e735aa187d82f51dadbce957fe3da77f0 (which essentially is
>> reverted, together with 286d6742218ba0235c32876b50bf593cb1986353)
>> with a different implementation of the same (plus a bit more), based
>> on the corresponding feature in dav1d's checkasm, supporting both Unix
>> and Windows.
>> 
>> See in particular dav1d commits
>> 0b6ee30eab2400e4f85b735ad29a68a842c34e21 and
>> 0421f787ea592fd2cc74c887f20b8dc31393788b, authored by
>> Henrik Gramner.
>> 
>> The overall approach is the same; set up a signal handler,
>> store the state with setjmp/sigsetjmp, jump out of the crashing
>> function with longjmp/siglongjmp.
>> 
>> The main difference is in what happens when the signal handler
>> is invoked. In the previous implementation, it would resume from
>> right before calling the crashing function, and then skip that call
>> based on the setjmp return value.
>> 
>> In the imported implementation from dav1d, we return to right before
>> the check_func() call, which will skip testing the current function
>> (as the pointer is the same as it was before).
>> 
>> Other differences are:
>> - Support for other signal handling mechanisms (Windows
>>   AddVectoredExceptionHandler)
>> - Using RtlCaptureContext/RtlRestoreContext instead of setjmp/longjmp
>>   on Windows with SEH (which adds the design limitation that it doesn't
>>   return a value like setjmp does)
>> - Only catching signals once per function - if more than one
>>   signal is delivered before signal handling is reenabled, any
>>   signal is handled as it would without our handler
>> - Not using an arch specific signal handler written in assembly
>> ---
>>  tests/checkasm/checkasm.c       | 100 ++++++++++++++++++++++++++------
>>  tests/checkasm/checkasm.h       |  79 ++++++++++++++++++-------
>>  tests/checkasm/riscv/checkasm.S |  12 ----
>>  3 files changed, 140 insertions(+), 51 deletions(-)
>> 
>> diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
>> index 6318d9296b..668034c67f 100644
>> --- a/tests/checkasm/checkasm.c
>> +++ b/tests/checkasm/checkasm.c
>> @@ -23,8 +23,10 @@
>>  #include "config.h"
>>  #include "config_components.h"
>> 
>> -#ifndef _GNU_SOURCE
>> -# define _GNU_SOURCE // for syscall (performance monitoring API),
>> strsignal() +#if CONFIG_LINUX_PERF
>> +# ifndef _GNU_SOURCE
>> +#  define _GNU_SOURCE // for syscall (performance monitoring API)
>> +# endif
>>  #endif
>> 
>>  #include <signal.h>
>> @@ -326,6 +328,7 @@ static struct {
>>      const char *cpu_flag_name;
>>      const char *test_name;
>>      int verbose;
>> +    int catch_signals;
>
>AFAICT, this needs to be volatile sigatomic_t
>
>>  } state;
>> 
>>  /* PRNG state */
>> @@ -627,6 +630,64 @@ static CheckasmFunc *get_func(CheckasmFunc **root,
>> const char *name) return f;
>>  }
>> 
>> +checkasm_context checkasm_context_buf;
>> +
>> +/* Crash handling: attempt to catch crashes and handle them
>> + * gracefully instead of just aborting abruptly. */
>> +#ifdef _WIN32
>> +#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
>> +static LONG NTAPI signal_handler(EXCEPTION_POINTERS *const e) {
>> +    const char *err;
>> +
>> +    if (!state.catch_signals)
>> +        return EXCEPTION_CONTINUE_SEARCH;
>> +
>> +    switch (e->ExceptionRecord->ExceptionCode) {
>> +    case EXCEPTION_FLT_DIVIDE_BY_ZERO:
>> +    case EXCEPTION_INT_DIVIDE_BY_ZERO:
>> +        err = "fatal arithmetic error";
>> +        break;
>> +    case EXCEPTION_ILLEGAL_INSTRUCTION:
>> +    case EXCEPTION_PRIV_INSTRUCTION:
>> +        err = "illegal instruction";
>> +        break;
>> +    case EXCEPTION_ACCESS_VIOLATION:
>> +    case EXCEPTION_ARRAY_BOUNDS_EXCEEDED:
>> +    case EXCEPTION_DATATYPE_MISALIGNMENT:
>> +    case EXCEPTION_STACK_OVERFLOW:
>> +        err = "segmentation fault";
>> +        break;
>> +    case EXCEPTION_IN_PAGE_ERROR:
>> +        err = "bus error";
>> +        break;
>> +    default:
>> +        return EXCEPTION_CONTINUE_SEARCH;
>> +    }
>> +    state.catch_signals = 0;
>> +    checkasm_fail_func("%s", err);
>> +    checkasm_load_context();
>> +    return EXCEPTION_CONTINUE_EXECUTION; /* never reached, but shuts up gcc
>> */ +}
>> +#endif
>> +#else
>> +static void signal_handler(const int s) {
>> +    if (state.catch_signals) {
>> +        state.catch_signals = 0;
>> +        checkasm_fail_func("%s",
>> +                           s == SIGFPE ? "fatal arithmetic error" :
>> +                           s == SIGILL ? "illegal instruction" :
>> +                           s == SIGBUS ? "bus error" :
>> +                                         "segmentation fault");
>
>The current code for the error print-out is both simpler and more versatile, 
>so I don't get this.
>
>> +        checkasm_load_context();
>> +    } else {
>> +        /* fall back to the default signal handler */
>> +        static const struct sigaction default_sa = { .sa_handler = SIG_DFL
>> }; +        sigaction(s, &default_sa, NULL);
>> +        raise(s);
>
>Why raise here? Returning from the handler will reevaluate the same code with 
>the same thread state, and trigger the default signal handler anyway (since 
>you don't modify the user context).
>
>> +    }
>> +}
>> +#endif
>> +
>>  /* Perform tests and benchmarks for the specified cpu flag if supported by
>> the host */ static void check_cpu_flag(const char *name, int flag)
>>  {
>> @@ -737,18 +798,24 @@ int main(int argc, char *argv[])
>>      unsigned int seed = av_get_random_seed();
>>      int i, ret = 0;
>> 
>> +#ifdef _WIN32
>> +#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
>> +    AddVectoredExceptionHandler(0, signal_handler);
>> +#endif
>> +#else
>> +    const struct sigaction sa = {
>> +        .sa_handler = signal_handler,
>> +        .sa_flags = SA_NODEFER,
>
>That does not look very sane to me. If a recursive signal occurs, processing 
>it recursively is NOT a good idea.

Following that, it actually seems safer to automatically reset the handler, using `signal()` or equivalently passing the `SA_RESETHAND` flag. Then the handler can rearm its own self if and *only* if it was able to actually handle the signal by observing a long jump. Resetting to default explicitly is no longer useful then.



More information about the ffmpeg-devel mailing list