[FFmpeg-devel] r9017 breaks WMA decoding on Intel Macs
Trent Piepho
xyzzy
Fri Jun 1 00:39:43 CEST 2007
On Thu, 31 May 2007, Michael Niedermayer wrote:
> On Thu, May 31, 2007 at 01:05:04PM +0200, Michael Niedermayer wrote:
> [...]
> > [...]
> > > >> But IMHO, it's a bit pointless, because
> > > >> whatever the speed figures may look like, we are comparing 1 solution
> > > >> that appears to work by luck, and another that is more reliable. Speed
> > > >> isn't what your patch is after.
> > > >
> > > > There is no luck in the old solution providing :
> > > > - we tell gcc the memory we modify (may be using "memory" clobber).
> > > > - we use a gas supporting the +(%reg) syntax
> > >
> > > I disagree. Newer gas _do_ complain about the syntax, and Trent
> > > already explained the shortcomings of current implementation, no need
> > > for me to restate them here.
> >
> > put a "memory" on the clobber list and trents argument is gone
>
> let me elaborate on this a little more
>
> the "memory" is needed because SSE/MMX writes to more than just the
> first float/FFTSample (yeah thats pretty much the purpose of MMX/SSE)
> gcc is not aware of this, so EVERY solution which uses
> "+m" / "=m" to write needs a "memory" clobber
That's not true. You can simply do something like:
typedef struct { FFTSample x[4]; } FFTSampleSSE;
Then cast the memory operands to FFTSampleSSE.
asm("..." : "=m"(*(FFTSampleSSE*)(tcos+k)) );
> to summarize the possible solutions
> 1. dont support ancient assemblers
> 2. use 123%4 notation (most incorrect syntax possible, and will silently
> generate wrong code on all assemblers if you are unlucky)
> 3. add more "m" operands to avoid the offsets (might be slower, and might
> fail on some gcc versions)
"might"
> 4. write the whole loop in asm
>
> note, ALL solutions need a "memory" clobber (or some other nasty tricks)
If you cast the operands through a struct that's 64 bits or 128 bits wide, as
appropriate, then you don't need the "memory" clobber.
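
To make the struct-cast idea concrete, here is a minimal sketch, not the actual fft code. It assumes FFTSample is float, and the helper name double4 is made up for illustration. The "+m" operand is cast to a 16-byte struct, so gcc knows all four floats are read and written, and no "memory" clobber is listed. The asm path is only taken on x86-64 with a GNU-compatible compiler; elsewhere a plain C loop stands in.

```c
/* Assumption: FFTSample is a 32-bit float. */
typedef float FFTSample;

/* Wrapper type covering the four floats that one SSE register holds. */
typedef struct { FFTSample x[4]; } FFTSampleSSE;

/* Hypothetical helper: doubles four consecutive samples in place. */
static void double4(FFTSample *p)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ (
        "movups %0, %%xmm0      \n\t"  /* load 4 floats (unaligned ok) */
        "addps  %%xmm0, %%xmm0  \n\t"  /* xmm0 = xmm0 + xmm0           */
        "movups %%xmm0, %0      \n\t"  /* store 4 floats back          */
        : "+m"(*(FFTSampleSSE *)p)     /* whole 16-byte block is in/out */
        :
        : "xmm0"                       /* register clobber only        */
    );
#else
    /* Portable fallback with the same effect. */
    for (int i = 0; i < 4; i++)
        p[i] += p[i];
#endif
}
```

Because the operand's type spans 128 bits, gcc should treat p[0] through p[3] as modified and reload them after the asm, while values elsewhere in memory can stay cached in registers. For MMX the same trick would use a 64-bit struct instead.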