[FFmpeg-devel] [PATCH] x86/swr: make int32_to_int32 un/pack_2ch functions SSE
James Almer
jamrial at gmail.com
Wed Jan 14 22:23:54 CET 2015
On 14/01/15 1:59 PM, Michael Niedermayer wrote:
> On Wed, Jan 14, 2015 at 01:53:48AM -0300, James Almer wrote:
>> unpack_2ch is already using sse float ops only, and pack_2ch is a trivial change.
>> Rename both to float_to_float for consistency.
>>
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>> libswresample/x86/audio_convert.asm | 14 ++++++++------
>> libswresample/x86/audio_convert_init.c | 11 +++++++----
>> 2 files changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/libswresample/x86/audio_convert.asm b/libswresample/x86/audio_convert.asm
>> index 1617e0b..c13c26f 100644
>> --- a/libswresample/x86/audio_convert.asm
>> +++ b/libswresample/x86/audio_convert.asm
>> @@ -60,8 +60,8 @@ pack_2ch_%2_to_%1_u_int %+ SUFFIX
>> punpcklwd m0, m2
>> punpckhwd m1, m2
>> %else
>> - punpckldq m0, m2
>> - punpckhdq m1, m2
>> + unpcklps m0, m2
>> + unpckhps m1, m2
>> %endif
>> %6 m0,m1,m2,m3,m4,m5
>> %else
>
> Did you benchmark this?
> I've just checked, and on Pentium M, Core Solo and Core Duo these are
> listed as having only 1/5 the throughput.
> On Sandy Bridge they are still listed with half the throughput of
> their integer counterparts.
> I didn't benchmark it, though.
No, I didn't benchmark. And you're right, even on recent CPUs they seem to
have half the throughput of their integer counterparts.
Do you think it will mean a considerable performance hit? These functions
aren't even that important in audio processing anyway (perf shows they
represent less than 1% of total CPU time when doing pcm -> pcm).
Nonetheless, considering this, maybe the other functions should be changed
to not use SBUTTERFLYPS.
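
For reference, a minimal C sketch of why the swap is safe functionally: the
integer interleave (punpckldq) and the float one (unpcklps) are pure 32-bit
lane shuffles, so they should give bit-identical output and the only open
question is throughput. This is not part of the patch. The function names are
hypothetical, and the scalar fallback is an assumption added so the snippet
also builds where SSE2 is unavailable.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 integer ops; also provides the SSE float ops */
#endif

/* Interleave the low two 32-bit lanes: out = { a0, b0, a1, b1 }.
 * Integer-domain version, corresponding to punpckldq. */
void interleave_lo_int(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi32(va, vb));
#else
    /* Scalar model of the same shuffle (assumed fallback). */
    out[0] = a[0]; out[1] = b[0]; out[2] = a[1]; out[3] = b[1];
#endif
}

/* Same shuffle in the float domain, corresponding to unpcklps.
 * The casts only reinterpret the register; no value conversion happens. */
void interleave_lo_ps(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
#if defined(__SSE2__)
    __m128 va = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)a));
    __m128 vb = _mm_castsi128_ps(_mm_loadu_si128((const __m128i *)b));
    _mm_storeu_si128((__m128i *)out,
                     _mm_castps_si128(_mm_unpacklo_ps(va, vb)));
#else
    /* Scalar model of the same shuffle (assumed fallback). */
    out[0] = a[0]; out[1] = b[0]; out[2] = a[1]; out[3] = b[1];
#endif
}

/* Returns 1 if both versions produce identical bytes for a sample input
 * that includes a bit pattern which would be a NaN if read as a float. */
int interleave_results_match(void)
{
    const int32_t left[4]  = { 1, 0x7f800001, 3, 4 };
    const int32_t right[4] = { -1, -2, -3, -4 };
    int32_t r_int[4], r_ps[4];

    interleave_lo_int(left, right, r_int);
    interleave_lo_ps(left, right, r_ps);
    return memcmp(r_int, r_ps, sizeof(r_int)) == 0;
}
```

A check like this says nothing about speed. That would still need a real
benchmark on the CPUs mentioned above.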