[FFmpeg-devel] [PATCH 2/3] x86: asm for sign/zero_extend()
Måns Rullgård
mans
Sun Feb 21 03:37:17 CET 2010
Michael Niedermayer <michaelni at gmx.at> writes:
> On Sun, Feb 21, 2010 at 12:35:58AM +0000, Mans Rullgard wrote:
>> ---
>> libavcodec/x86/mathops.h | 18 ++++++++++++++++++
>> 1 files changed, 18 insertions(+), 0 deletions(-)
>>
>> diff --git a/libavcodec/x86/mathops.h b/libavcodec/x86/mathops.h
>> index 010cfb7..0c17f35 100644
>> --- a/libavcodec/x86/mathops.h
>> +++ b/libavcodec/x86/mathops.h
>> @@ -97,4 +97,22 @@ static inline uint32_t NEG_USR32(uint32_t a, int8_t s){
>> return a;
>> }
>>
>> +#define sign_extend sign_extend
>> +static inline int sign_extend(int val, unsigned bits)
>> +{
>> + __asm__ ("shll %1, %0 \n\t"
>> + "sarl %1, %0 \n\t"
>> + : "+&r" (val) : "ic" ((uint8_t)-bits));
>> + return val;
>> +}
>> +
>
>> +#define zero_extend zero_extend
>> +static inline unsigned zero_extend(unsigned val, unsigned bits)
>> +{
>> + __asm__ ("shll %1, %0 \n\t"
>> + "shrl %1, %0 \n\t"
>> + : "+&r" (val) : "ic" ((uint8_t)-bits));
>> + return val;
>
> If bits is a constant (which I guess it quite often is),
> then this is quite inefficient.
> val & 0x00007FFF,
> for example, is more efficient in that case.
> Also, it's not certain that 2 shifts are faster than =-1, >>, &
> on all x86 CPUs.
Tell that to whoever wrote the asm for the NEG_*SR32 functions.
Oh, wait... that was you...
--
Måns Rullgård
mans at mansr.com