[FFmpeg-devel] [PATCH] Fix emulated_edge_mc SSE code to not contain SSE2 instructions on x86-32.
Michael Niedermayer
michaelni at gmx.at
Thu Oct 10 14:52:27 CEST 2013
On Thu, Oct 10, 2013 at 07:56:51AM -0400, Ronald S. Bultje wrote:
> Hi,
>
> On Thu, Oct 10, 2013 at 7:35 AM, Michael Niedermayer <michaelni at gmx.at>wrote:
>
> > On Wed, Oct 09, 2013 at 09:24:20PM -0400, Ronald S. Bultje wrote:
> > > ---
> > > libavcodec/x86/videodsp.asm | 64
> > +++++++++++++++++++++++++-----------------
> > > libavcodec/x86/videodsp_init.c | 11 ++++++--
> > > 2 files changed, 47 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/libavcodec/x86/videodsp.asm b/libavcodec/x86/videodsp.asm
> > > index cc2105d..aa865f5 100644
> > > --- a/libavcodec/x86/videodsp.asm
> > > +++ b/libavcodec/x86/videodsp.asm
> > > @@ -100,7 +100,7 @@ cglobal emu_edge_hvar, 5, 6, 1, dst, dst_stride,
> > start_x, n_words, h, w
> > > ; FIXME also write a ssse3 version using pshufb
> > > movzx wd, byte [dstq+start_xq] ; w = read(1)
> > > imul wd, 0x01010101 ; w *= 0x01010101
> > > - movd m0, wd
> > > + movd m0, wd ; FIXME this is sse2,
> > not sse
> > [...]
> > > diff --git a/libavcodec/x86/videodsp_init.c
> > b/libavcodec/x86/videodsp_init.c
> > > index a709d03..b5bab72 100644
> > > --- a/libavcodec/x86/videodsp_init.c
> > > +++ b/libavcodec/x86/videodsp_init.c
> > > @@ -219,9 +219,14 @@ static av_noinline void
> > emulated_edge_mc_sse(uint8_t *buf, ptrdiff_t buf_stride,
> > > int block_w, int block_h,
> > > int src_x, int src_y, int
> > w, int h)
> > > {
> > > - emulated_edge_mc(buf, buf_stride, src, src_stride, block_w, block_h,
> > > - src_x, src_y, w, h, vfixtbl_sse,
> > &ff_emu_edge_vvar_sse,
> > > - hfixtbl_sse, &ff_emu_edge_hvar_sse);
> > > + emulated_edge_mc(buf, buf_stride, src, src_stride, block_w,
> > block_h, src_x,
> > > + src_y, w, h, vfixtbl_sse, &ff_emu_edge_vvar_sse,
> > hfixtbl_sse,
> > > +#if ARCH_X86_64
> > > + &ff_emu_edge_hvar_sse
> > > +#else
> > > + &ff_emu_edge_hvar_mmx
> > > +#endif
> >
> > this is a bit ugly but more important is to fix that bug so
> > applied
> >
> > thanks
>
>
> I'm not really sure how to properly fix this. I mean, I can temporarily
> store on the stack (e.g. mov r0m, wd, followed by movss m0, r0m), but that
> introduces an extra instruction in the loop, so we'd probably want a sse
> and a sse2 version of the function (on x86-32 only of course; x86-64 would
> simply default to the sse2 one), and then have 3 (mmx, sse or sse2) instead
> of the current 2 wrappers in videodsp_init.c (again, only on x86-32; x86-64
> would always use sse2).
>
> In terms of binary size, it's not that bad, since it's only one small
> function that would have 3 implementations (hvar); all others would simply
> use the sse version in the sse2 wrapper. Does that sound ok? Loren, any
> better ideas?
hmm, thinking about it for a moment ...
MMX and SSE make no sense for x86_64 when SSE2 is available, there
are no cpus that would use them
and IIRC SSE was not (much) faster than using twice as many MMX
instructions on pre SSE2 cpus.
so maybe MMX + SSE2 alone would be an option but then maybe i miss
something ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I know you won't believe me, but the highest form of Human Excellence is
to question oneself and others. -- Socrates
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20131010/faa20252/attachment.asc>
More information about the ffmpeg-devel
mailing list