[FFmpeg-devel] Amazing intrinsics improvments in gcc 4
Alexander Strange
astrange
Wed Mar 19 20:01:27 CET 2008
On Mar 19, 2008, at 2:43 PM, Michael Niedermayer wrote:
> On Wed, Mar 19, 2008 at 07:21:14PM +0100, Luca Barbato wrote:
>> Michael Niedermayer wrote:
>>> I thought some people here would be interested as there were
>>> various claims on gcc's abilities and improvements posted here
>>> lately ...
>>
>> ------- Comment #23 From Uros Bizjak 2008-03-19 10:45 -------
>>
>> As said in PR 19161:
>>
>> The LCM infrastructure doesn't support mode switching in the way that
>> would be usable for emms. Additionally, there are MANY problems
>> expected when sharing x87 and MMX registers (i.e. handling of
>> uninitialized x87 registers at the beginning of the function - this is
>> the reason we don't implement x87 register passing ABI).
>>
>> Automatic MMX vectorization is not exactly a much usable feature
>> nowadays (we have SSE that works quite well here). Due to recent
>> changes in MMX register allocation area, excellent code is produced
>> using MMX intrinsics, I'm closing this bug as WONTFIX.
>>
>> Also, auto-vectorization would produce either MMX or SSE code, but
>> not both of them:
>>
>> #define UNITS_PER_SIMD_WORD (TARGET_SSE ? 16 : UNITS_PER_WORD)
>>
>> Seems Uros is fighting your battle and providing some interesting
>> code.
>>
>> Still, the root of the problem is that x86 sucks.
>
> No, the root of the problem is that gcc devels are idiots.
> gcc has no business putting emms anywhere, that's the programmer's job,
> same as with free().
> If I do write SIMD code I do know what I am doing and do know I might
> have to execute emms; I absolutely don't want gcc to guess it behind my
> back.
>
> Also if I explicitly force gcc to use paddw:
> void test(){
> w= __builtin_ia32_paddw(w,w);
> dw= (mmxdw)w;
> }
> -----
> gcc-4.3 -mtune=pentium3 -march=pentium3 -fomit-frame-pointer -S -O3
> generates:
> subl $12, %esp
> movq w, %mm0
> movq %mm0, (%esp)
> paddw %mm0, %mm0
> movq %mm0, w
> movl w, %eax
> movl w+4, %edx
> movl %eax, dw
> movl %edx, dw+4
> addl $12, %esp
> ret
> -----
> compared to
> gcc-3.4 -mtune=pentium3 -march=pentium3 -fomit-frame-pointer -S -O3
> movq w, %mm1
> paddw %mm1, %mm1
> movq %mm1, w
> movq w, %mm0
> movq %mm0, dw
> ret
>
> So where is that "excellent code is produced using MMX intrinsics" ???
It's in gcc 4.4:
gcc version 4.4.0 20080318 (experimental) (GCC)
subl $12, %esp
movq _w, %mm0
paddw %mm0, %mm0
movq %mm0, _w
movq _w, %mm0
movq %mm0, _dw
addl $12, %esp
ret
Actually, it was fixed because someone converted dsputil code into
intrinsics and complained on the mailing list that the result was
terrible.
For the version written with + instead of the builtin, it uses mm0 but
not paddw - isn't that just as unsafe?
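
For anyone who wants to try the + variant themselves, here is a minimal
sketch. The typedefs and the globals are my assumption (the quoted test
case never shows how w, dw and mmxdw are declared), and test_plus and
plus_demo_ok are hypothetical names. It only uses GCC's generic vector
extension, so it should build without any MMX-specific builtin:

```c
#include <string.h>

/* Assumed declarations; the quoted test case does not show them. */
typedef short mmxw  __attribute__ ((vector_size (8)));  /* 4 x 16-bit */
typedef int   mmxdw __attribute__ ((vector_size (8)));  /* 2 x 32-bit */

mmxw  w;
mmxdw dw;

/* The "+" variant: generic vector add instead of __builtin_ia32_paddw. */
void test_plus(void)
{
    w  = w + w;      /* element-wise add of the four 16-bit lanes */
    dw = (mmxdw)w;   /* same 64 bits reinterpreted, no value conversion */
}

/* Small self-check: doubles {1,2,3,4} and reads the lanes back. */
int plus_demo_ok(void)
{
    short out[4];

    w = (mmxw){1, 2, 3, 4};
    test_plus();
    memcpy(out, &dw, sizeof out);
    return out[0] == 2 && out[1] == 4 && out[2] == 6 && out[3] == 8;
}
```

Building it with the flags quoted above (-march=pentium3
-fomit-frame-pointer -S -O3) and reading the .s file is one way to see
which registers and instructions a given gcc version picks, and whether
it leaves MMX state live on return.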