[FFmpeg-devel] [PATCHv4] avutil/common: add av_rint64_clip
Ganesh Ajjanagadde
gajjanag at mit.edu
Sat Nov 14 22:37:22 CET 2015
On Sat, Nov 14, 2015 at 4:35 PM, James Almer <jamrial at gmail.com> wrote:
> On 11/14/2015 6:30 PM, Hendrik Leppkes wrote:
>> On Sat, Nov 14, 2015 at 10:27 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>>> On Sat, Nov 14, 2015 at 4:03 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>>>> On Sat, Nov 14, 2015 at 3:28 PM, Hendrik Leppkes <h.leppkes at gmail.com> wrote:
>>>>> On Sat, Nov 14, 2015 at 3:51 AM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>>>>>> On Fri, Nov 13, 2015 at 7:17 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Fri, Nov 13, 2015 at 6:16 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Fri, Nov 13, 2015 at 4:52 PM, Ronald S. Bultje <rsbultje at gmail.com>
>>>>>>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On Fri, Nov 13, 2015 at 4:28 PM, Ganesh Ajjanagadde <gajjanag at mit.edu>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Nov 13, 2015 at 1:28 PM, Ronald S. Bultje <rsbultje at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 13, 2015 at 12:17 PM, Ganesh Ajjanagadde <
>>>>>>>> gajjanag at mit.edu>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 13, 2015 at 12:10 PM, Ronald S. Bultje <
>>>>>>>> rsbultje at gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi Ganesh,
>>>>>>>>>>>>> On Nov 13, 2015 12:02 PM, "Ganesh Ajjanagadde" <
>>>>>>>>>> gajjanagadde at gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The rationale for this function is reflected in the documentation
>>>>>>>> for
>>>>>>>>>>>>>> it, and is copied here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Clip a double value into the long long amin-amax range.
>>>>>>>>>>>>>> This function is needed because conversion of floating point to
>>>>>>>>>> integers
>>>>>>>>>>>>> when
>>>>>>>>>>>>>> it does not fit in the integer's representation does not
>>>>>>>> necessarily
>>>>>>>>>>>>> saturate
>>>>>>>>>>>>>> correctly (usually converted to a cvttsd2si on x86) which
>>>>>>>> saturates
>>>>>>>>>>>>> numbers
>>>>>>>>>>>>>>> INT64_MAX to INT64_MIN. The standard marks such conversions as
>>>>>>>>>>>> undefined
>>>>>>>>>>>>>> behavior, allowing this sort of mathematically bogus conversions.
>>>>>>>>>> This
>>>>>>>>>>>>> provides
>>>>>>>>>>>>>> a safe alternative that is slower obviously but assures safety and
>>>>>>>>>>>> better
>>>>>>>>>>>>>> mathematical behavior.
>>>>>>>>>>>>>> API:
>>>>>>>>>>>>>> @param a value to clip
>>>>>>>>>>>>>> @param amin minimum value of the clip range
>>>>>>>>>>>>>> @param amax maximum value of the clip range
>>>>>>>>>>>>>> @return clipped value
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Note that a priori if one can guarantee from the calling side that
>>>>>>>>>> the
>>>>>>>>>>>>>> double is in range, it is safe to simply do an explicit/implicit
>>>>>>>>>> cast,
>>>>>>>>>>>>>> and that will be far faster. However, otherwise this function
>>>>>>>> should
>>>>>>>>>> be
>>>>>>>>>>>>>> used.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> avutil minor version is bumped.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Reviewed-by: Ronald S. Bultje <rsbultje at gmail.com>
>>>>>>>>>>>>>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>> libavutil/common.h | 30 ++++++++++++++++++++++++++++++
>>>>>>>>>>>>>> libavutil/version.h | 2 +-
>>>>>>>>>>>>>> 2 files changed, 31 insertions(+), 1 deletion(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/libavutil/common.h b/libavutil/common.h
>>>>>>>>>>>>>> index 6f0f582..f4687ab 100644
>>>>>>>>>>>>>> --- a/libavutil/common.h
>>>>>>>>>>>>>> +++ b/libavutil/common.h
>>>>>>>>>>>>>> @@ -298,6 +298,33 @@ static av_always_inline av_const double
>>>>>>>>>>>>> av_clipd_c(double a, double amin, double
>>>>>>>>>>>>>> else return a;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> +/**
>>>>>>>>>>>>>> + * Clip and convert a double value into the long long amin-amax
>>>>>>>>>> range.
>>>>>>>>>>>>>> + * This function is needed because conversion of floating point
>>>>>>>> to
>>>>>>>>>>>>> integers when
>>>>>>>>>>>>>> + * it does not fit in the integer's representation does not
>>>>>>>>>> necessarily
>>>>>>>>>>>>> saturate
>>>>>>>>>>>>>> + * correctly (usually converted to a cvttsd2si on x86) which
>>>>>>>>>> saturates
>>>>>>>>>>>>> numbers
>>>>>>>>>>>>>> + * > INT64_MAX to INT64_MIN. The standard marks such conversions
>>>>>>>> as
>>>>>>>>>>>>> undefined
>>>>>>>>>>>>>> + * behavior, allowing this sort of mathematically bogus
>>>>>>>> conversions.
>>>>>>>>>>>>> This provides
>>>>>>>>>>>>>> + * a safe alternative that is slower obviously but assures safety
>>>>>>>>>> and
>>>>>>>>>>>>> better
>>>>>>>>>>>>>> + * mathematical behavior.
>>>>>>>>>>>>>> + * @param a value to clip
>>>>>>>>>>>>>> + * @param amin minimum value of the clip range
>>>>>>>>>>>>>> + * @param amax maximum value of the clip range
>>>>>>>>>>>>>> + * @return clipped value
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +static av_always_inline av_const int64_t av_rint64_clip_c(double
>>>>>>>> a,
>>>>>>>>>>>>> int64_t amin, int64_t amax)
>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>> +#if defined(HAVE_AV_CONFIG_H) && defined(ASSERT_LEVEL) &&
>>>>>>>>>> ASSERT_LEVEL
>>>>>>>>>>>>>> = 2
>>>>>>>>>>>>>> + if (amin > amax) abort();
>>>>>>>>>>>>>> +#endif
>>>>>>>>>>>>>> + // INT64_MAX+1,INT64_MIN are exactly representable as IEEE
>>>>>>>>>> doubles
>>>>>>>>>>>>>> + if (a >= 9223372036854775808.0 || llrint(a) >= amax)
>>>>>>>>>>>>>> + return amax;
>>>>>>>>>>>>>> + if (a <= -9223372036854775808.0 || llrint(a) <= amin)
>>>>>>>>>>>>>> + return amin;
>>>>>>>>>>>>>
>>>>>>>>>>>>> Doesn't this allow negative overflows in the max check? I think you
>>>>>>>>>> need
>>>>>>>>>>>>> both overflow checks before the min/max checks. Try the next float
>>>>>>>> val
>>>>>>>>>>>>> smaller than int64_min as input with a small amax, eg 0. I bet it
>>>>>>>>>>>> returns 0
>>>>>>>>>>>>> instead of amin.
>>>>>>>>>>>>
>>>>>>>>>>>> They are needed. As you and others can clearly see, I am quite bad
>>>>>>>>>>>> with this stuff.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hm, so, getting back to my computer, I wanted to test this, and I have
>>>>>>>>>> this
>>>>>>>>>>> problem: llrint() works correctly for me for the "undefined" cases,
>>>>>>>> i.e.,
>>>>>>>>>>> it already does what you're trying to fix in av_rint64_clip_c.
>>>>>>>>>>>
>>>>>>>>>>> llrint(-10223372056756029440.000000) returns -9223372036854775808
>>>>>>>>>>> llrint(10223372056756029440.000000) returns 9223372036854775807
>>>>>>>>>>>
>>>>>>>>>>> So, how do I reproduce that llrint() overflows?
>>>>>>>>>>
>>>>>>>>>> The link I gave originally
>>>>>>>>>>
>>>>>>>> http://blog.frama-c.com/index.php?post/2013/10/09/Overflow-float-integer
>>>>>>>>>> gives an illustration. Maybe the weird behavior happens only on
>>>>>>>>>> 9223372036854775808.0. This happens because INT64_MAX+1 is not
>>>>>>>>>> representable in long long, and hence signed overflow occurs yielding
>>>>>>>>>> INT64_MIN (of course undefined). Here is a minimal test program:
>>>>>>>>>> #include <stdio.h>
>>>>>>>>>> #include <math.h>
>>>>>>>>>>
>>>>>>>>>> int main(void) {
>>>>>>>>>> printf("%lld\n", llrint(9223372036854775808.0));
>>>>>>>>>> return 0;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 9223372036854775807
>>>>>>>>>
>>>>>>>>> Seems apple's libc got one thing right :)
>>>>>>>>
>>>>>>>> I personally am not that charitable, looking more carefully at your
>>>>>>>> asm shows a cmplesd, suggesting slowdown. Here is a source reference:
>>>>>>>> https://opensource.apple.com/source/Libm/Libm-2026/Source/ARM/llrint.c.
>>>>>>>> As usual, Apple dumps many implementations of llrint and it is unclear
>>>>>>>> which is actually being used on OS X at the moment (see e.g other
>>>>>>>> https://opensource.apple.com/source/Libm/Libm-92/i386.subproj/llrint.c),
>>>>>>>> but I digress.
>>>>>>>>
>>>>>>>> They essentially all put special case code like the patch above. Thus
>>>>>>>> their function is inherently slower than the conformant GNU libm
>>>>>>>> implementation. A client may very well want a branch free llrint for
>>>>>>>> speed. Apple offers no performance choice here, forcing a fast llrint
>>>>>>>> to use cvt2dsi inline or equivalent. Don't know if FFmpeg is affected
>>>>>>>> by this slowdown.
>>>>>>>
>>>>>>>
>>>>>>> I think FFmpeg should consider using Apple's version as a x86
>>>>>>> implementation for av_rint64_clip :)
>>>>>>
>>>>>> I don't agree with this: it is a far less readable implementation with
>>>>>> many more lines of code, and worse yet only handles the llrint aspect
>>>>>> and not the clipping. Regardless, belongs to a separate patch/thread.
>>>>>> Pushed. Thanks all for reviews.
>>>>>>
>>>>>
>>>>> This change broke building on VS2012, llrint is apprently not available there.
>>>>> Note that this is a public header, so our compat headers ala
>>>>> avutil/libm.h cannot be included there.
>>>>
>>>> Hmm, so I could create a local avpriv_llrint with some ifdefry - e.g
>>>> for GNU_C, use llrint, else use a slower implementation on the lines
>>>> of Apple's.
>>>> Any cleaner solutions?
>>>
>>> Possibly better idea: use floor(f + 0.5) as a hack (c89) on things
>>> lacking llrint (via a HAVE_LLRINT check). This won't result in
>>> identical output past a sufficiently large power of 2, but is still a
>>> safe API. It is also clearer and smaller. Idea inspired by
>>> avcodec/mpegaudio_tablegen.h (where this hack may be removed).
>>>
>>
>> This code is in a public header, public headers don't have access to
>> config.h, so no HAVE_* checks.
>> You could make it non-inline, then you have all the freedom in the world.
>>
>> - Hendrik
>
> As others have pointed out on IRC, why is this function needed to begin with?
> It's not used anywhere currently.
Not ATM, but see the ffserver_config patch under review with recent
comments by Michael. There are many more such instances where this API
is needed, IIRC one in cmdutils.c.
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
More information about the ffmpeg-devel
mailing list