[FFmpeg-devel] [PATCH] SSE-optimized vector_clipf()
Vitor Sessak
vitor1001
Sat Aug 8 23:55:53 CEST 2009
Michael Niedermayer wrote:
> On Sat, Aug 08, 2009 at 09:04:14AM +0200, Vitor Sessak wrote:
>> Michael Niedermayer wrote:
>>> On Thu, Aug 06, 2009 at 02:55:30AM +0200, Vitor Sessak wrote:
>>>> Vitor Sessak wrote:
>>>>> $subj, 10% speedup for twinvq decoding (but should be useful also for
>>>>> AMR and wmapro).
>>>> err, I mean, attached.
>>>>
>>>> -Vitor
>>>> dsputil.c | 15 +++++++++++++++
>>>> dsputil.h | 3 ++-
>>>> x86/dsputil_mmx.c | 34 ++++++++++++++++++++++++++++++++++
>>>> 3 files changed, 51 insertions(+), 1 deletion(-)
>>>> 8a95f5f2f3d267089056d6a571b2e6cc37d1569e dsp_vector_clipf.diff
>>>> Index: libavcodec/dsputil.c
>>>> ===================================================================
>>>> --- libavcodec/dsputil.c (revision 19598)
>>>> +++ libavcodec/dsputil.c (working copy)
>>>> @@ -4093,6 +4093,20 @@
>>>> dst[i] = src[i] * mul;
>>>> }
>>>> +void vector_clipf_c(float *dst, float min, float max, int len) {
>>>> + int i;
>>>> + for (i=0; i < len; i+=8) {
>>>> + dst[i ] = av_clipf(dst[i ], min, max);
>>>> + dst[i + 1] = av_clipf(dst[i + 1], min, max);
>>>> + dst[i + 2] = av_clipf(dst[i + 2], min, max);
>>>> + dst[i + 3] = av_clipf(dst[i + 3], min, max);
>>>> + dst[i + 4] = av_clipf(dst[i + 4], min, max);
>>>> + dst[i + 5] = av_clipf(dst[i + 5], min, max);
>>>> + dst[i + 6] = av_clipf(dst[i + 6], min, max);
>>>> + dst[i + 7] = av_clipf(dst[i + 7], min, max);
>>>> + }
>>>> +}
>>> this one could be tried by using integer math instead of floats
>>> (assuming IEEE floats of course)
>> How could this possibly be faster? It would just clip the sign, then the
>> exponent, then the mantissa. It seems like much more work to me, unless
>> I'm missing something.
>
> we aren't comparing integers by first checking the first bit, then separately
> the next 8, and then again separately the last 23. Why should we here?
Ok, the exponent is fine, but a special treatment of the sign is needed.
I benchmarked the following and it is slower:
static inline float clipf_c_one(float a0,
                                uint32_t amin, uint32_t amax,
                                float aminf, float amaxf)
{
    /* Reinterpret the float's bits as an integer (assumes IEEE floats). */
    uint32_t ai   = *(uint32_t *)&a0;
    uint32_t sign = ai >> 31;
    /* For negative inputs, flip the low 31 bits so that a signed integer
     * compare orders the values the same way a float compare would. */
    uint32_t a    = ai ^ ((sign << 31) - sign);

    if      ((signed)a < (signed)amin) return aminf;
    else if ((signed)a > (signed)amax) return amaxf;
    else                               return a0;
}

static void vector_clipf_c(float *dst, float min, float max, int len) {
    int i;
    uint32_t mini = *(uint32_t *)&min;
    uint32_t maxi = *(uint32_t *)&max;

    /* Apply the same sign fix-up to the bounds, once, outside the loop. */
    mini ^= ((mini >> 31) << 31) - (mini >> 31);
    maxi ^= ((maxi >> 31) << 31) - (maxi >> 31);

    /* Unrolled by 8: this loop only handles len that is a multiple of 8. */
    for (i=0; i < len; i+=8) {
        dst[i    ] = clipf_c_one(dst[i    ], mini, maxi, min, max);
        dst[i + 1] = clipf_c_one(dst[i + 1], mini, maxi, min, max);
        dst[i + 2] = clipf_c_one(dst[i + 2], mini, maxi, min, max);
        dst[i + 3] = clipf_c_one(dst[i + 3], mini, maxi, min, max);
        dst[i + 4] = clipf_c_one(dst[i + 4], mini, maxi, min, max);
        dst[i + 5] = clipf_c_one(dst[i + 5], mini, maxi, min, max);
        dst[i + 6] = clipf_c_one(dst[i + 6], mini, maxi, min, max);
        dst[i + 7] = clipf_c_one(dst[i + 7], mini, maxi, min, max);
    }
}
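
For reference, the sign fix-up used above can be isolated and checked on its
own. The following is only a minimal sketch, not the benchmarked code: the
helper names float_order_key() and clipf_via_int() are made up for
illustration, memcpy() is used instead of the pointer cast to stay clear of
aliasing problems, and it assumes 32-bit IEEE floats with no NaN inputs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a float's bit pattern to an int32_t key whose
 * signed ordering matches the float ordering (NaNs are not handled). */
static int32_t float_order_key(float f)
{
    uint32_t u, sign, mask;
    int32_t key;

    memcpy(&u, &f, sizeof u);      /* copy the bits, no aliasing issues */
    sign = u >> 31;
    mask = (sign << 31) - sign;    /* 0 if non-negative, 0x7FFFFFFF if negative */
    u ^= mask;                     /* flip magnitude bits of negative values */
    memcpy(&key, &u, sizeof key);
    return key;
}

/* Hypothetical scalar clip built on the integer keys. */
static float clipf_via_int(float a, float min, float max)
{
    int32_t k = float_order_key(a);

    if (k < float_order_key(min)) return min;
    if (k > float_order_key(max)) return max;
    return a;
}
```

One difference from a plain float compare: -0.0 gets the key -1 and +0.0 gets
the key 0, so the two zeros are ordered here, while as floats they compare
equal.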
-Vitor