[FFmpeg-devel] [flamefest-start] A little something on MMX/SSE intrinsics
Michael Niedermayer
michaelni
Thu Feb 28 23:53:41 CET 2008
On Thu, Feb 28, 2008 at 10:15:35PM +0100, Luca Barbato wrote:
> Michael Niedermayer wrote:
> >
> > I feel like I am talking to a brick wall. The point is that intrinsics
> > are flawed because they are unpredictable: gcc could generate efficient
> > code from them, but it can just as well (and does in current versions on
> > x86) generate completely dismal code. This does not go away if gcc becomes
> > better at generating code.
>
> gcc isn't predictable even at managing asm blocks, as we have experienced
> on the register-constrained architectures... (yes, x86 again)
As I said at another point in the thread, I prefer a compilation failure
that I can fix over a silent pessimization of code that I do not even know
about.
>
>
> > We write asm/intrinsics because gcc did NOT compile the C code to something
> > efficient in at least some cases. Asm is optimized once and will then always
> > be efficient for the cpu class for which it has been optimized. That is, it
> > is a write-once-and-forget thing. Intrinsics OTOH are at the mercy of the
> > current compiler version and require constant maintenance to ensure that
> > they don't get compiled to something inefficient.
>
> I cannot agree more; in fact, having a set of asm routines for G3, G5,
> CELL and pa-semi would be great. The same could be said of asm for P4 and
> amd64, since they are _quite_ different in the end.
>
> Sparing some pain and using intrinsics to get quite similar results for
> the whole PPC/PPC64 or x86/x86_64 families wouldn't be bad as a starting
> point.
If you plan to ever write the asm(), your efforts with intrinsics were wasted.
If you don't plan to ever write asm(), it's of course a different story ...
[...]
> > But the key advantage asm() has IMO is that the compiler can NOT second-guess
> > what the programmer wanted, it can NOT reorder the instructions behind the
> > programmer's back and it can NOT silently put unneeded load+stores between
> > instructions.
>
> The main issue with intrinsics is that they are more often than not ugly and
> do not deliver what they are supposed to, but that is just an
> implementation detail that could be ironed out with a little cooperation
> between users and developers.
>
> You can get silent load+store or, even better, have the whole outer loop
> pessimized due to bogus constraints in the asm block...
No, you cannot. Proper asm looks like:
function_mmx(){
    asm(
        ...
    );
}
This is called through a function pointer. There is no outer loop that
knows what is done inside the function.
Also, the whole inner loop is inside a single asm(), so there is no way gcc
could mess it up.
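
To make the pattern concrete, here is a minimal sketch, not code from ffmpeg.
All names (add_bytes_fn, add_bytes_c, add_bytes_sse2, select_add_bytes) are
hypothetical, and it assumes gcc-style inline asm on x86_64, using SSE2
instead of MMX so that no emms is needed. Other targets fall back to plain C.
The entire inner loop sits in one asm statement, and callers only ever see a
function pointer:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DSP entry point: dst[i] += src[i] for n bytes. */
typedef void (*add_bytes_fn)(uint8_t *dst, const uint8_t *src, size_t n);

/* Portable reference version, also used for the tail. */
static void add_bytes_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

#if defined(__GNUC__) && defined(__x86_64__) && !defined(__ILP32__)
/* The whole inner loop lives inside a single asm statement, so the
 * compiler cannot reorder it or insert loads/stores between the
 * instructions. Processes 16 bytes per iteration. */
static void add_bytes_sse2(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i   = 0;
    size_t end = n & ~(size_t)15;   /* bytes covered by full 16-byte blocks */

    if (end == 0) {                 /* fewer than 16 bytes: nothing to vectorize */
        add_bytes_c(dst, src, n);
        return;
    }
    __asm__ volatile(
        "1:                          \n\t"
        "movdqu (%[s],%[i]), %%xmm0  \n\t"   /* load 16 bytes of src      */
        "movdqu (%[d],%[i]), %%xmm1  \n\t"   /* load 16 bytes of dst      */
        "paddb  %%xmm0, %%xmm1       \n\t"   /* bytewise wrapping add     */
        "movdqu %%xmm1, (%[d],%[i])  \n\t"   /* store result back to dst  */
        "add    $16, %[i]            \n\t"
        "cmp    %[end], %[i]         \n\t"
        "jb     1b                   \n\t"   /* loop while i < end        */
        : [i] "+&r"(i)
        : [d] "r"(dst), [s] "r"(src), [end] "r"(end)
        : "xmm0", "xmm1", "memory", "cc");

    add_bytes_c(dst + i, src + i, n - i);    /* remaining 0..15 bytes */
}
#endif

/* Picked once at init time; callers only ever see the pointer. */
static add_bytes_fn select_add_bytes(void)
{
#if defined(__GNUC__) && defined(__x86_64__) && !defined(__ILP32__)
    return add_bytes_sse2;          /* SSE2 is baseline on x86_64 */
#else
    return add_bytes_c;
#endif
}
```

A caller would do something like "add_bytes_fn f = select_add_bytes();" once
and then call f(dst, src, n). The call site knows nothing about what happens
inside, so constraints on the asm cannot leak into any outer loop.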
[...]
> > And code quality standards in ffmpeg are high, writing 5% slower code is
> > plain unacceptable.
>
> I could say that the x86 asm routines that happen to work by hack on
> x86_64 are in that range. Still, better that than plain C, isn't it?
I do not think we have much hacked x86 -> x86_64 code that would be slower
than the equivalent intrinsics on x86_64.
If you find some, please report it!
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates