[FFmpeg-devel] [Bug target/14552] compiled trivial vector intrinsic code is inefficient

Sat Mar 22 03:34:31 CET 2008

On Fri, Mar 21, 2008 at 10:34:00AM -0000, ubizjak at gmail dot com wrote:
> 
> 
> ------- Comment #36 from ubizjak at gmail dot com  2008-03-21 10:33 -------
> (In reply to comment #35)
> 
> > Also ffmpeg uses almost entirely asm() instead of intrinsics so this alone is
> > not so much a problem for ffmpeg as it is for others who followed the
> > recommendation of "intrinsics are better than asm".
> > 
> > About trolling, well, I made no attempt to reply politely and diplomatically, no.
> > But "solving" a "problem" in some use case by dropping support for that use
> > case is kinda extreme.
> > 
> > The way I see it is that
> > * It's non-trivial to place emms optimally and automatically
> > * there needs to be an emms between mmx code and fpu code
> > 
> > The solutions to this would be any one of
> > A. let the programmer place emms like it has been in the past
> > B. don't support mmx at all
> > C. don't support x87 fpu at all
> > D. place emms after every bunch of mmx instructions
> > E. solve a quite non-trivial problem and place emms optimally
> > 
> > The solution which has been selected apparently is B. Why was that chosen
> > instead of, let's say, A.?
> > 
> > If I do write SIMD code then I do know that I need an emms on x86. It's
> > trivial for the programmer to place it optimally.
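
Option A above can be sketched roughly as follows. This is only an illustration, not code from the PR or from ffmpeg: the helper names add_s16 and mean_s16 are hypothetical, and the scalar fallback exists only so the sketch builds on targets without MMX. The one point it tries to show is that the programmer issues a single emms (_mm_empty() in intrinsic form) after the whole MMX loop and before any floating-point code runs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for n 16-bit values,
   four lanes per iteration. n must be a multiple of 4. */
static void add_s16(int16_t *dst, const int16_t *a, const int16_t *b, int n)
{
    assert(n % 4 == 0);
#if defined(__MMX__)
    for (int i = 0; i < n; i += 4) {
        __m64 va, vb, vr;
        memcpy(&va, a + i, sizeof va);   /* unaligned-safe 64-bit load */
        memcpy(&vb, b + i, sizeof vb);
        vr = _mm_add_pi16(va, vb);       /* paddw: four wrapping 16-bit adds */
        memcpy(dst + i, &vr, sizeof vr); /* 64-bit store */
    }
    /* One emms for the whole loop, placed by hand: the MMX registers
       alias the x87 stack, so it has to be cleared before FPU code. */
    _mm_empty();
#else
    /* Assumption: scalar fallback for non-MMX targets, not part of the argument. */
    for (int i = 0; i < n; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
#endif
}

/* Hypothetical floating-point consumer that runs after the MMX code. */
static double mean_s16(const int16_t *v, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}
```

Calling add_s16() and then mean_s16() is safe on a 32-bit x87 build because of the single _mm_empty() at the end of the loop. On x86-64 scalar doubles normally go through SSE, so the missing-emms failure mainly shows up in x87 code.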
> 
> I don't know where you get the idea that MMX support was dropped in any way. I

Maybe because the SIMD code in this PR compiled with -mmmx does not use mmx
but very significantly less efficient integer instructions. And you added a
test to gcc which ensures that this case does not use mmx instructions.

This is pretty much the definition of dropping mmx support (for this specific
case).

> won't engage in a discussion about autovectorisation, intrinsics, builtins,
> generic vectorisation, etc, etc with you,

And somehow I am glad about that.

> but please look at PR 21395 for how a
> performance PR should be filed.

> The MMX code in that PR is _far_ from trivial,

Well, that is something I would disagree with.

> but since it is well written using intrinsic instructions, it enables a
> jaw-dropping performance increase that is simply not possible when ASM blocks
> are used.
> 
> Now, I'm sure that you have your numbers ready to back up your claims from
> Comment #33 about performance of generated code, and I challenge you to beat
> performance of gcc-4.4 generated code by hand-crafted assembly using the
> example of PR 21395.

Done.
jaw-dropping intrinsics need    2.034s
stinky hand-written asm needs   1.312s

But you can read the details in PR 21395.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The greatest way to live with honor in this world is to be what we pretend
to be. -- Socrates