[FFmpeg-devel] [libavutil/timestamp.h: Fix loss of precision in timestamps for silencedetect on long files]

Tue Mar 12 01:46:00 EET 2024

> On Monday, March 11, 2024 at 12:50:11 PM PDT, <epirat07 at gmail.com> wrote:
> On 11 Mar 2024, at 15:26, Andreas Rheinhardt wrote:
>> Andreas Rheinhardt:
>>> Allan Cady via ffmpeg-devel:
>>>> From: "Allan Cady" <allancady at yahoo.com>
>>>>
>>>> I propose changing the format to "%.6f", which will
>>>> give microsecond precision for all timestamps, regardless of
>>>> offset. Trailing zeros can be trimmed from the fraction, without
>>>> losing precision. If the length of the fixed-precision formatted
>>>> timestamp exceeds the length of the allocated buffer
>>>> (AV_TS_MAX_STRING_SIZE, currently 32, less one for the
>>>> terminating null), then we can fall back to scientific notation, though
>>>> this seems almost certain to never occur, because 32 characters would
>>>> allow a maximum timestamp value of (32 - 1 - 6 - 1) = 24 characters.
>>>> By my calculation, 10^24 seconds is about six orders of magnitude
>>>> greater than the age of the universe.
>>>>
>>>> The fix proposed here uses the following logic:
>>>>
>>>> 1) Try formatting the number of seconds using "%.6f". This will
>>>> always result in a string with six decimal digits in the fraction,
>>>> possibly including trailing zeros (e.g. "897234.732000").
>>>>
>>>> 2) Check if that string would overflow the buffer. If it would, then
>>>> format it using scientific notation ("%.8g").
>>>>
>>>> 3) If the original fixed-point format fits, then trim any trailing
>>>> zeros and decimal point, and return that result.
>>>>
>>>> Making this change broke two fate tests, filter-metadata-scdet,
>>>> and filter-metadata-silencedetect. To correct this, I've modified
>>>> tests/ref/fate/filter-metadata-scdet and
>>>> tests/ref/fate/filter-metadata-silencedetect to match the
>>>> new output.
>>>>
>>>> Signed-off-by: Allan Cady <allancady at yahoo.com>
>>>> ---
>>>>  libavutil/timestamp.h                        | 53 +++++++++++++++++++-
>>>>  tests/ref/fate/filter-metadata-scdet        | 12 ++---
>>>>  tests/ref/fate/filter-metadata-silencedetect |  2 +-
>>>>  3 files changed, 58 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/libavutil/timestamp.h b/libavutil/timestamp.h
>>>> index 2b37781eba..2f04f9bb2b 100644
>>>> --- a/libavutil/timestamp.h
>>>> +++ b/libavutil/timestamp.h
>>>> @@ -25,6 +25,7 @@
>>>>  #define AVUTIL_TIMESTAMP_H
>>>>
>>>>  #include "avutil.h"
>>>> +#include <stdbool.h>
>>>>
>>>>  #if defined(__cplusplus) && !defined(__STDC_FORMAT_MACROS) && !defined(PRId64)
>>>>  #error missing -D__STDC_FORMAT_MACROS / #define __STDC_FORMAT_MACROS
>>>> @@ -53,6 +54,32 @@ static inline char *av_ts_make_string(char *buf, int64_t ts)
>>>>  */
>>>>  #define av_ts2str(ts) av_ts_make_string((char[AV_TS_MAX_STRING_SIZE]){0}, ts)
>>>>
>>>> +/**
>>>> + * Strip trailing zeros and decimal point from a string. Performed
>>>> + * in-place on input buffer. For local use only by av_ts_make_time_string.
>>>> + *
>>>> + * e.g.:
>>>> + * "752.378000" -> "752.378"
>>>> + *        "4.0" -> "4"
>>>> + *      "97300" -> "97300"
>>>> + */
>>>> +static inline void av_ts_strip_trailing_zeros_and_decimal_point(char *str) {
>>>> +    if (strchr(str, '.'))
>>>> +    {
>>>> +        int i;
>>>> +        for (i = strlen(str) - 1; i >= 0 && str[i] == '0'; i--) {
>>>> +            str[i] = '\0';
>>>> +        }
>>>> +
>>>> +        // Remove decimal point if it's the last character
>>>> +        if (i >= 0 && str[i] == '.') {
>>>> +            str[i] = '\0';
>>>> +        }
>>>> +
>>>> +        // String was modified in place; no need for return value.
>>>> +    }
>>>> +}
>>>> +
>>>>  /**
>>>>  * Fill the provided buffer with a string containing a timestamp time
>>>>  * representation.
>>>> @@ -65,8 +92,30 @@ static inline char *av_ts_make_string(char *buf, int64_t ts)
>>>>  static inline char *av_ts_make_time_string(char *buf, int64_t ts,
>>>>                                            const AVRational *tb)
>>>>  {
>>>> -    if (ts == AV_NOPTS_VALUE) snprintf(buf, AV_TS_MAX_STRING_SIZE, "NOPTS");
>>>> -    else                      snprintf(buf, AV_TS_MAX_STRING_SIZE, "%.6g", av_q2d(*tb) * ts);
>>>> +    if (ts == AV_NOPTS_VALUE)
>>>> +    {
>>>> +        snprintf(buf, AV_TS_MAX_STRING_SIZE, "NOPTS");
>>>> +    }
>>>> +    else
>>>> +    {
>>>> +        const int max_fraction_digits = 6;
>>>> +
>>>> +        // Convert 64-bit timestamp to double, using rational timebase
>>>> +        double seconds = av_q2d(*tb) * ts;
>>>> +
>>>> +        int length = snprintf(NULL, 0, "%.*f", max_fraction_digits, seconds);
>>>> +        if (length <= AV_TS_MAX_STRING_SIZE - 1)
>>>> +        {
>>>> +            snprintf(buf, AV_TS_MAX_STRING_SIZE, "%.*f", max_fraction_digits, seconds);
>>>> +            av_ts_strip_trailing_zeros_and_decimal_point(buf);
>>>> +        }
>>>> +        else
>>>> +        {
>>>> +            snprintf(buf, AV_TS_MAX_STRING_SIZE, "%.8g", seconds);
>>>> +        }
>>>> +
>>>> +    }
>>>> +
>>>>      return buf;
>>>>  }
>>>>
>>>

>>> 1. What makes you believe that all users want the new format and that it
>>> does not cause undesired behaviour for some (maybe a lot) of them?

I definitely do not know what other users would want. I would think
maybe some would like the change, others wouldn't, and most would
never know.

>>> The
>>> number of characters written by the earlier code stayed pretty constant
>>> even when the times became big (in this case, it just printed 8 chars if
>>> ts>=0), yet your code will really make use of the whole buffer.

It's true that my change will increase the potential length of
the output beyond 8 significant digits.

The issue that brought this up: I have some very long audio files (up
to 50 hours), which I wanted to split into smaller pieces. I wrote some
scripts that use silencedetect to find the breaks and then split the
files there, but I discovered that for segments near the end of a file,
silencedetect was returning whole-number timestamps, which caused
undesirable results for me. Thinking functionally, it seems like
timestamps further out in a file ought to have the same precision as
those near the beginning. So this seems to me like a minor oversight in
the original design that might warrant fixing.
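
To make the precision loss concrete, here is a minimal sketch that
contrasts the current "%.6g" format with the proposed "%.6f". The
helper names (format_sig6, format_fixed6, sig6_has_no_fraction) are
made up for this illustration and are not part of FFmpeg; the sketch
also assumes the default "C" locale, where the decimal point is '.'.

```c
#include <stdio.h>
#include <string.h>

/* Illustration only: these helpers are hypothetical, not FFmpeg API. */

/* Current behaviour: six significant digits in total. */
static void format_sig6(char *buf, size_t size, double seconds)
{
    snprintf(buf, size, "%.6g", seconds);
}

/* Proposed behaviour (before zero stripping): six digits after the point. */
static void format_fixed6(char *buf, size_t size, double seconds)
{
    snprintf(buf, size, "%.6f", seconds);
}

/* Returns 1 when the "%.6g" output contains no decimal point at all.
 * Assumes the "C" locale, i.e. '.' as the decimal-point character. */
static int sig6_has_no_fraction(double seconds)
{
    char buf[32];
    format_sig6(buf, sizeof(buf), seconds);
    return strchr(buf, '.') == NULL;
}
```

For a position near the 50-hour mark, such as 179999.25 seconds,
format_sig6 yields "179999" while format_fixed6 yields
"179999.250000". Near the start of a file, 12.25 comes out as "12.25"
with either approach once trailing zeros are trimmed.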

>>> Granted, we could tell our users that they have no right to complain
>>> about this, given that we always had a "right" to use the full buffer,
>>> but I consider this a violation of the principle of least surprise. 

I definitely agree with you there.

>>> Why don't you just change silencedetect or add another function?

I actually started out taking that approach in my submission a few weeks
ago. Marton Balint suggested (in a message on 20 Feb) that we make the
change in av_ts_make_time_string, so I did that for this submission.

I'm open to whatever approach you all consider is best.

>>> 2. For very small timestamps (< 10^-4), the new code will print a lot of
>>> useless leading zeros (after the decimal point). In fact, it can be so
>>> many that the new code has less precision than the old code, despite
>>> using the full buffer.

I don't understand. Leading zeros after the decimal point are far from
useless -- they are part of the value. Maybe what you're saying is that
six digits is more precision than necessary? That may be so. I could
personally do fine with just two digits (hundredths), as long as it's
consistent through the length of the file. 
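
For what it's worth, the small-value case can be reproduced with a
short sketch. The function names below (old_style, new_style) are
hypothetical and only mirror the two formats under discussion;
new_style applies "%.6f" and then trims trailing zeros, assuming '.'
as the decimal point (the default "C" locale).

```c
#include <stdio.h>
#include <string.h>

/* Illustration only: hypothetical helpers, not FFmpeg functions. */

/* Format used by the existing code: six significant digits. */
static void old_style(char *buf, size_t size, double seconds)
{
    snprintf(buf, size, "%.6g", seconds);
}

/* Format proposed in the patch: "%.6f", then trailing zeros (and a
 * dangling decimal point) removed. Assumes '.' is the decimal point. */
static void new_style(char *buf, size_t size, double seconds)
{
    char *point;
    size_t len;

    snprintf(buf, size, "%.6f", seconds);
    point = strchr(buf, '.');
    if (!point)
        return;

    len = strlen(buf);
    while (len > 0 && buf[len - 1] == '0')
        buf[--len] = '\0';          /* drop trailing zeros */
    if (len > 0 && buf[len - 1] == '.')
        buf[--len] = '\0';          /* drop a now-dangling point */
}
```

With a value of 1.23456e-05 seconds, old_style gives "1.23456e-05"
(six significant digits), while new_style gives "0.000012" (only two
significant digits survive). So for values this small, the fixed
format does carry less information than the old one.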

>>> 3. This is way too much code for an inline function.

No disagreement there.

>>> 4. Anyway, your placement of {} on their own lines does not match the
>>> project coding style.

I'm happy to conform with project coding style.

>> In addition to this, there is another issue here: Your
>> av_ts_strip_trailing_zeros_and_decimal_point() presumes that the
>> "decimal-point character" is always '.', but this can be changed via
>> setlocale().

Excellent point, which I hadn't considered. I have no experience with
how locale is handled in C. I would welcome advice on the best way to
handle this.
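
For reference, one possible direction is sketched below, purely as an
assumption and not a vetted fix: look up the locale's decimal-point
string with localeconv() instead of hard-coding '.'. The function name
strip_trailing_zeros_locale is hypothetical, and the sketch only makes
sense for "%f"-style output (it would mangle an exponent such as
"e+10"). Whether the project would rather avoid locale-dependent
output altogether is a separate question.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical variant of the patch's helper, for illustration only.
 * Strips trailing zeros after the decimal-point string of the current
 * locale, and the decimal point itself if nothing follows it.
 * Intended for "%f"-style output only. */
static void strip_trailing_zeros_locale(char *str)
{
    const char *dp = localeconv()->decimal_point; /* "." in the "C" locale */
    size_t dp_len  = strlen(dp);
    char *point, *end;

    if (dp_len == 0)
        return;
    point = strstr(str, dp);
    if (!point)
        return;                     /* no fractional part, nothing to do */

    end = str + strlen(str);
    while (end > point + dp_len && end[-1] == '0')
        *--end = '\0';              /* drop trailing zeros of the fraction */
    if (end == point + dp_len)
        *point = '\0';              /* fraction is empty: drop the point too */
}
```

In the default "C" locale this turns "752.378000" into "752.378" and
"4.0" into "4", and leaves "97300" untouched, matching the examples in
the patch's doc comment.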

> True, though I would consider this a more general bug. We should be
> consistent and not generate files that are locale-dependent and then
> not parseable anymore with a different one… That’s just a huge mess.
> 
> Also in general FFmpeg is completely broken if you use any locale that
> does not use . as decimal separator. (This never shows for most users
> currently as most people use the FFmpeg CLI, which does not respect
> the user's locale)

I'll leave that conversation to the experts here.

Thanks for giving my code a look.