[Ffmpeg-devel] Re: [PATCH] SIMD accelerated SNOW decoding
Michael Niedermayer
michaelni
Mon Nov 28 00:30:17 CET 2005
Hi
On Mon, Nov 28, 2005 at 12:12:52AM +0100, Guillaume POIRIER wrote:
> Hi,
>
> On 11/27/05, Guillaume POIRIER <poirierg at gmail.com> wrote:
> > Hi,
> >
> > On 11/27/05, Guillaume POIRIER <poirierg at gmail.com> wrote:
> > > Hi there,
> > >
> > > A long time ago (6 months), yartrebo wrote 2 routines to speed up
> > > SNOW decoding (30-40% faster). They never got committed because neither
> > > of the 2 worked on AMD64.
> > >
> > > 6 months later, I suspect more talented people can look at it.
> > >
> > > Find in attachment the work-in-progress patch yartrebo sent me before
> > > going on summer break (never to return again, it seems).
> > >
> > > See below for the gdb backtrace of one of the routines (both trigger a
> > > segfault). Unfortunately, it doesn't give the exact line the ASM
> > > fails on (maybe because the program never actually reaches the
> > > asm but fails to call it?).
> >
> > Hmm, a closer look at the asm shows a series of IA32-style registers,
> > rather than the REG_xx defines which are used throughout the rest of
> > the ffmpeg code. No wonder it could not work! :)
> >
> > I'll fix that and see what happens.
>
> Well, it was a bit more complicated (to me) than it looked.
> Apparently, the clobber list had to be fixed too.
>
> Please find in attachment a "fixed" version with hardcoded AMD64 regs
> (instead of using the #define REG_a "rax" type of defines, which do not
> seem to work well presently with this code).
>
> I still get a segfault though:
[...]
> 0x00000000006a0d11 <ff_spatial_idwt_buffered_slice+1505>: paddd (%rdx,%rdx,4),%xmm1
This cannot work: it is dereferencing (5*rdx), which is like
*(int*)(5*x)
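
To spell that out, here is a rough C sketch of how an AT&T-syntax memory
operand disp(base,index,scale) turns into an address. The function names
are made up for illustration and are not from the patch; this only models
the address arithmetic, not the actual paddd instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an x86 memory operand written as
 * disp(base,index,scale) in AT&T syntax:
 *     address = base + index * scale + disp
 * The hardware only allows a scale of 1, 2, 4 or 8. */
static uintptr_t effective_address(uintptr_t base, uintptr_t index,
                                   unsigned scale, intptr_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uintptr_t)disp;
}

/* Address read by the operand (%rdx,%rdx,4) when rdx holds x:
 * base = x, index = x, scale = 4, so the result is x + 4*x = 5*x.
 * If x is an index or a stride rather than a pointer, 5*x is not a
 * meaningful address and the load faults. */
static uintptr_t addr_rdx_rdx_4(uintptr_t x)
{
    return effective_address(x, x, 4, 0);
}

/* Hypothetical contrast: an operand such as (%rsi,%rdx,4) with a real
 * pointer in the base register addresses element i of an array of
 * 4-byte integers starting at buf. */
static uintptr_t addr_of_int_element(uintptr_t buf, uintptr_t i)
{
    return effective_address(buf, i, 4, 0);
}
```

For example, addr_rdx_rdx_4(3) gives 15, while
addr_of_int_element(0x1000, 3) gives 0x100c. Only the second form has
a pointer as its base.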
[...]
--
Michael