[FFmpeg-devel] [PATCH] SSE-optimized vector_clipf()

Sun Aug 9 06:00:32 CEST 2009

On Sat, Aug 08, 2009 at 11:55:53PM +0200, Vitor Sessak wrote:
> Michael Niedermayer wrote:
>> On Sat, Aug 08, 2009 at 09:04:14AM +0200, Vitor Sessak wrote:
>>> Michael Niedermayer wrote:
>>>> On Thu, Aug 06, 2009 at 02:55:30AM +0200, Vitor Sessak wrote:
>>>>> Vitor Sessak wrote:
>>>>>> $subj, 10% speedup for twinvq decoding (but should be useful also for 
>>>>>> AMR and wmapro).
>>>>> err, I mean, attached.
>>>>>
>>>>> -Vitor
>>>>>  dsputil.c         |   15 +++++++++++++++
>>>>>  dsputil.h         |    3 ++-
>>>>>  x86/dsputil_mmx.c |   34 ++++++++++++++++++++++++++++++++++
>>>>>  3 files changed, 51 insertions(+), 1 deletion(-)
>>>>> 8a95f5f2f3d267089056d6a571b2e6cc37d1569e  dsp_vector_clipf.diff
>>>>> Index: libavcodec/dsputil.c
>>>>> ===================================================================
>>>>> --- libavcodec/dsputil.c	(revision 19598)
>>>>> +++ libavcodec/dsputil.c	(working copy)
>>>>> @@ -4093,6 +4093,20 @@
>>>>>          dst[i] = src[i] * mul;
>>>>>  }
>>>>>  
>>>>> +void vector_clipf_c(float *dst, float min, float max, int len) {
>>>>> +    int i;
>>>>> +    for (i=0; i < len; i+=8) {
>>>>> +        dst[i    ] = av_clipf(dst[i    ], min, max);
>>>>> +        dst[i + 1] = av_clipf(dst[i + 1], min, max);
>>>>> +        dst[i + 2] = av_clipf(dst[i + 2], min, max);
>>>>> +        dst[i + 3] = av_clipf(dst[i + 3], min, max);
>>>>> +        dst[i + 4] = av_clipf(dst[i + 4], min, max);
>>>>> +        dst[i + 5] = av_clipf(dst[i + 5], min, max);
>>>>> +        dst[i + 6] = av_clipf(dst[i + 6], min, max);
>>>>> +        dst[i + 7] = av_clipf(dst[i + 7], min, max);
>>>>> +    }
>>>>> +}
>>>> this one could be tried by using integer math instead of floats
>>>> (assuming IEEE floats of course)
>>> How could this possibly be faster? It would just clip the sign, then the 
>>> exponent, then the mantissa. It seems like much more work to me, unless 
>>> I'm missing something.
>> We aren't comparing integers by first checking the first bit, then
>> separately the next 8, and then again separately the last 23. Why
>> should we here?
>
> Ok, the exponent is fine, but a special treatment of the sign is needed. I 
> benchmarked the following and it is slower:

Ahh, right, I forgot about the sign issue. The trick works only with
positive numbers, or when one compares the absolute value (shift the
sign out). The latter could still be useful, maybe:
if (|f| > A) {
        f = clip(f, A, B);
}
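
A minimal sketch of the absolute-value variant, to make the idea concrete.
It is not the code from the patch and has not been benchmarked. It assumes
IEEE 754 single-precision floats and min < 0 < max, and it reads A as the
smaller of -min and max, which is only one interpretation of the pseudocode
above. The names float_bits(), clipf_ref() and vector_clipf_absint() are
hypothetical; clipf_ref() stands in for av_clipf(). For non-negative IEEE
floats the bit patterns order the same way as the values, so one unsigned
compare on the sign-masked bits decides whether a float clip is needed.

```c
#include <stdint.h>
#include <string.h>

/* Copy the float's bits into an integer; memcpy avoids aliasing problems. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Plain float clip, standing in for av_clipf() (hypothetical helper). */
static float clipf_ref(float a, float amin, float amax)
{
    if (a < amin) return amin;
    if (a > amax) return amax;
    return a;
}

/* Assumes IEEE 754 binary32 and min < 0 < max (hypothetical function). */
static void vector_clipf_absint(float *dst, float min, float max, int len)
{
    /* Values with |f| <= thr are inside [min, max] and need no clipping. */
    float thr = -min < max ? -min : max;
    uint32_t thr_bits = float_bits(thr);
    int i;

    for (i = 0; i < len; i++) {
        /* Mask the sign bit out; what remains compares like an integer. */
        uint32_t abs_bits = float_bits(dst[i]) & 0x7fffffffU;
        if (abs_bits > thr_bits)
            dst[i] = clipf_ref(dst[i], min, max);
    }
}
```

With min = -1 and max = 1, the values -2, -0.5, 0.25 and 3 come out as
-1, -0.5, 0.25 and 1. Whether the integer pre-check pays off depends on how
often samples actually exceed the threshold.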

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Its not that you shouldnt use gotos but rather that you should write
readable code and code with gotos often but not always is less readable