[FFmpeg-devel] [PATCH] ac3enc: Add x86-optimized function to speed up log2_tab().
Justin Ruggles
justin.ruggles
Mon Feb 14 00:08:03 CET 2011
On 02/13/2011 05:49 PM, Loren Merritt wrote:
>> +cglobal ac3_max_msb_abs_int16_%1, 2,2,5, src, len
>> + pxor m2, m2
>> + pxor m3, m3
>> +.loop:
>> +%ifidn %2, min_max
>> + mova m0, [srcq]
>> + mova m1, [srcq+mmsize]
>> + pminsw m2, m0
>> + pminsw m2, m1
>> + pmaxsw m3, m0
>> + pmaxsw m3, m1
>> +%else ; or_abs
>> +%ifidn %1, mmx
>> + mova m0, [srcq]
>> + mova m1, [srcq+mmsize]
>> + ABS2 m0, m1, m3, m4
>> +%else ; ssse3
>> + ; using memory args is faster for ssse3
>> + pabsw m0, [srcq]
>> + pabsw m1, [srcq+mmsize]
>> +%endif
>> + por m2, m0
>> + por m2, m1
>> +%endif
>> + add srcq, mmsize*2
>> + sub lend, mmsize
>> + ja .loop
>> +%ifidn %2, min_max
>> + ABS2 m2, m3, m0, m1
>> + por m2, m3
>> +%endif
>> +%ifidn mmsize, 16
>> + mova m0, m2
>> + punpckhqdq m0, m0
>
> movhlps
Ah, I thought there was some instruction like that, but I must have
missed it when I searched for it. I'll send a new patch to change this
line since the original patch was already committed.
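
For anyone following along: movhlps copies the high 64 bits of its source register into the low 64 bits of its destination, so it can stand in for the mova + punpckhqdq pair quoted above. Below is a rough sketch of that horizontal OR step using C intrinsics rather than the yasm macros from the patch. It assumes an x86 target with SSE2, and the function name hor_or_epi16 is made up for illustration; it is not what the patch or the follow-up uses.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; pulls in the SSE ones too */
#include <stdint.h>

/* Hypothetical helper: OR together the eight 16-bit lanes of v.
 * Assumes an x86 target with SSE2 available. */
static uint16_t hor_or_epi16(__m128i v)
{
    /* movhlps: move the high 64 bits of v into the low 64 bits of hi.
     * The casts only reinterpret the register and emit no instructions. */
    __m128  vf = _mm_castsi128_ps(v);
    __m128i hi = _mm_castps_si128(_mm_movehl_ps(vf, vf));

    v = _mm_or_si128(v, hi);                    /* lanes 0-3 |= lanes 4-7 */
    v = _mm_or_si128(v, _mm_srli_epi64(v, 32)); /* lanes 0-1 |= lanes 2-3 */
    v = _mm_or_si128(v, _mm_srli_epi64(v, 16)); /* lane 0   |= lane 1    */

    /* Only the low 16 bits hold the full result. */
    return (uint16_t)_mm_cvtsi128_si32(v);
}
```

Whether a compiler actually emits movhlps for _mm_movehl_ps is up to the compiler; in the asm version the instruction is written out directly.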
Thanks,
Justin