[FFmpeg-devel] [PATCH] SSE-optimized vector_clipf()
Michael Niedermayer
michaelni
Sun Aug 9 06:00:32 CEST 2009
On Sat, Aug 08, 2009 at 11:55:53PM +0200, Vitor Sessak wrote:
> Michael Niedermayer wrote:
>> On Sat, Aug 08, 2009 at 09:04:14AM +0200, Vitor Sessak wrote:
>>> Michael Niedermayer wrote:
>>>> On Thu, Aug 06, 2009 at 02:55:30AM +0200, Vitor Sessak wrote:
>>>>> Vitor Sessak wrote:
>>>>>> $subj, 10% speedup for twinvq decoding (but should be useful also for
>>>>>> AMR and wmapro).
>>>>> err, I mean, attached.
>>>>>
>>>>> -Vitor
>>>>> dsputil.c | 15 +++++++++++++++
>>>>> dsputil.h | 3 ++-
>>>>> x86/dsputil_mmx.c | 34 ++++++++++++++++++++++++++++++++++
>>>>> 3 files changed, 51 insertions(+), 1 deletion(-)
>>>>> 8a95f5f2f3d267089056d6a571b2e6cc37d1569e dsp_vector_clipf.diff
>>>>> Index: libavcodec/dsputil.c
>>>>> ===================================================================
>>>>> --- libavcodec/dsputil.c (revision 19598)
>>>>> +++ libavcodec/dsputil.c (working copy)
>>>>> @@ -4093,6 +4093,20 @@
>>>>> dst[i] = src[i] * mul;
>>>>> }
>>>>> +void vector_clipf_c(float *dst, float min, float max, int len) {
>>>>> + int i;
>>>>> + for (i=0; i < len; i+=8) {
>>>>> + dst[i ] = av_clipf(dst[i ], min, max);
>>>>> + dst[i + 1] = av_clipf(dst[i + 1], min, max);
>>>>> + dst[i + 2] = av_clipf(dst[i + 2], min, max);
>>>>> + dst[i + 3] = av_clipf(dst[i + 3], min, max);
>>>>> + dst[i + 4] = av_clipf(dst[i + 4], min, max);
>>>>> + dst[i + 5] = av_clipf(dst[i + 5], min, max);
>>>>> + dst[i + 6] = av_clipf(dst[i + 6], min, max);
>>>>> + dst[i + 7] = av_clipf(dst[i + 7], min, max);
>>>>> + }
>>>>> +}
>>>> this one could be tried by using integer math instead of floats
>>>> (assuming IEEE floats of course)
>>> How could this possibly be faster? It would just clip the sign, then the
>>> exponent, then the mantissa. It seems like much more work for me, unless
>>> I'm missing something.
>> we aren't comparing integers by first checking the first bit, then
>> separately the next 8, and then again separately the last 23. Why
>> should we here?
>
> Ok, the exponent is fine, but a special treatment of the sign is needed. I
> benchmarked the following and it is slower:
Ahh, right, I forgot about the sign issue. The trick works only with
positive numbers, or when one compares the absolute value (shift the sign out).
It could still be useful later, maybe:
if (|f| > A) {
    f = clip(f, A, B);
}
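
A minimal sketch of how the guard above might look in C, assuming IEEE-754 single-precision floats and min < 0 < max. The names abs_bits(), clipf_ref() and clipf_guarded() are hypothetical and not part of the patch; clipf_ref() only stands in for av_clipf(). Taking the threshold as the smaller of |min| and |max| is one reading of "A" in the pseudocode, not something the thread spells out. With the sign bit masked off, the bit patterns of non-negative floats order the same way as their values, so a single integer compare decides whether the float clip is needed at all.

```c
#include <stdint.h>
#include <string.h>

/* Bits of |f|: the raw IEEE-754 single pattern with the sign bit masked off. */
static uint32_t abs_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* well-defined type pun */
    return u & 0x7FFFFFFFu;
}

/* Plain float clip, standing in for av_clipf(). */
static float clipf_ref(float f, float min, float max)
{
    if (f < min) return min;
    if (f > max) return max;
    return f;
}

/* Hypothetical guarded clip. Assumes min < 0 < max (an assumption of this
 * sketch). If |f| <= t, where t is the smaller of |min| and |max|, then f
 * already lies in [min, max] and the float clip is skipped. */
static float clipf_guarded(float f, float min, float max)
{
    float t = -min < max ? -min : max;

    if (abs_bits(f) > abs_bits(t))   /* integer compare of magnitudes */
        f = clipf_ref(f, min, max);
    return f;
}
```

Whether this beats the plain float version depends on how often values fall outside the threshold and on the cost of moving data between float and integer registers, so it would need benchmarking like the variant discussed above.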
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Its not that you shouldnt use gotos but rather that you should write
readable code and code with gotos often but not always is less readable