[FFmpeg-devel] [RFC] SSE3/4 implementation of flac_encode_residual_lpc
Bobby Bingham
uhmmmm
Sat May 23 19:40:08 CEST 2009
On Sat, 23 May 2009 07:00:59 -0400
Jason Garrett-Glaser <darkshikari at gmail.com> wrote:
> On Fri, May 22, 2009 at 11:40 PM, Bobby Bingham <uhmmmm at gmail.com>
> wrote:
> > On Sun, 3 May 2009 21:21:19 -0700
> > Jason Garrett-Glaser <darkshikari at gmail.com> wrote:
> >> > "phaddd     %%xmm1, %%xmm0          \n\t"
> >> > "phaddd     %%xmm3, %%xmm2          \n\t"
> >> > "phaddd     %%xmm2, %%xmm0          \n\t"   // xmm0 = [p0, p1, p2, p3]
> >>
> >> Did you not find a better way of doing this without PHADD, given
> >> how slow it is?
> >
> > The best I've come up with so far is this, but I can't compare the
> > speed:
> >
> > "movdqa     %%xmm0, %%xmm4          \n\t"
> > "movdqa     %%xmm2, %%xmm5          \n\t"
> > "punpckldq  %%xmm1, %%xmm0          \n\t"
> > "punpckhdq  %%xmm1, %%xmm4          \n\t"
> > "punpckldq  %%xmm3, %%xmm2          \n\t"
> > "punpckhdq  %%xmm3, %%xmm5          \n\t"
> > "paddd      %%xmm4, %%xmm0          \n\t"
> > "paddd      %%xmm5, %%xmm2          \n\t"
> > "movdqa     %%xmm0, %%xmm1          \n\t"
> > "punpcklqdq %%xmm2, %%xmm0          \n\t"
> > "punpckhqdq %%xmm2, %%xmm1          \n\t"
> > "paddd      %%xmm1, %%xmm0          \n\t"
>
> You really should not be writing assembly without a system to test it
> on.
>
> Various people have shell accounts they can loan you--for example,
> checkers on #x264 can give out shell accounts on Penryn-based Linux
> systems.
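
For readers who want to check the unpack/add sequence quoted above
without reading the asm, here is a minimal sketch of the same data flow
using SSE2 intrinsics. It is not taken from the attached patch; the
names hsum4_sse2 and hsum4 are hypothetical, and it assumes each of the
four input registers holds four 32-bit partial sums.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: returns [sum(a), sum(b), sum(c), sum(d)], where
 * sum(x) adds the four 32-bit lanes of x. Same result as three phaddd
 * instructions, built only from SSE2 unpacks and adds. */
static __m128i hsum4_sse2(__m128i a, __m128i b, __m128i c, __m128i d)
{
    /* punpckldq/punpckhdq + paddd:
     * ab = [a0+a2, b0+b2, a1+a3, b1+b3] */
    __m128i ab = _mm_add_epi32(_mm_unpacklo_epi32(a, b),
                               _mm_unpackhi_epi32(a, b));
    /* cd = [c0+c2, d0+d2, c1+c3, d1+d3] */
    __m128i cd = _mm_add_epi32(_mm_unpacklo_epi32(c, d),
                               _mm_unpackhi_epi32(c, d));
    /* punpcklqdq/punpckhqdq + paddd: add the two 64-bit halves */
    return _mm_add_epi32(_mm_unpacklo_epi64(ab, cd),
                         _mm_unpackhi_epi64(ab, cd));
}

/* Array wrapper so the result can be inspected from plain C. */
static void hsum4(const int32_t a[4], const int32_t b[4],
                  const int32_t c[4], const int32_t d[4], int32_t out[4])
{
    __m128i r = hsum4_sse2(_mm_loadu_si128((const __m128i *)a),
                           _mm_loadu_si128((const __m128i *)b),
                           _mm_loadu_si128((const __m128i *)c),
                           _mm_loadu_si128((const __m128i *)d));
    _mm_storeu_si128((__m128i *)out, r);
}
```

This only confirms that the two sequences compute the same values; it
says nothing about their relative speed, which still has to be measured
on real hardware.
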
In the meantime, here's an SSE2 version that I have tested. I'm not
really happy with falling back to the C version for the cases where
32-bit multiplies are needed, but I haven't found the time yet to
implement that for pre-SSE4 instruction sets.
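
One common way to get a packed 32-bit low multiply without SSE4.1's
pmulld is to build it from two pmuludq operations. The sketch below
shows that general technique with intrinsics; it is not part of the
attached patch, and the names mullo_epi32_sse2 and mullo32 are
hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical SSE2 stand-in for pmulld: low 32 bits of each of the
 * four 32x32 products. The low half of a product is the same for
 * signed and unsigned operands, so pmuludq is sufficient. */
static __m128i mullo_epi32_sse2(__m128i a, __m128i b)
{
    /* pmuludq multiplies lanes 0 and 2, giving two 64-bit products */
    __m128i even = _mm_mul_epu32(a, b);
    /* shift lanes 1 and 3 down into positions 0 and 2, then repeat */
    __m128i odd  = _mm_mul_epu32(_mm_srli_si128(a, 4),
                                 _mm_srli_si128(b, 4));
    /* move the low dword of each product into lanes 0 and 1 */
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
    /* interleave back into the original order: [p0, p1, p2, p3] */
    return _mm_unpacklo_epi32(even, odd);
}

/* Array wrapper so the result can be inspected from plain C. */
static void mullo32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i r = mullo_epi32_sse2(_mm_loadu_si128((const __m128i *)a),
                                 _mm_loadu_si128((const __m128i *)b));
    _mm_storeu_si128((__m128i *)out, r);
}
```

This costs two multiplies plus several shuffles per four products, so
whether it beats the C fallback would need to be benchmarked.
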
Also, with this patch, gcc warns that need32 might be used
uninitialized, even though it is always initialized by the assembly.
Does anyone know how to silence this warning?
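
As a possible starting point, the sketch below shows two usual ways to
keep gcc from flagging a variable that is written only by inline asm:
list it as an output operand so the compiler knows the asm stores to
it, or simply give it an initializer at its declaration. This assumes
the warning arises because the asm writes need32 in a way gcc cannot
see (for example through an input operand); the function
compute_need32 and its body are hypothetical and only the operand
constraints matter.

```c
#include <stdint.h>

/* Hypothetical stand-in for the patch's asm block. */
static int compute_need32(int32_t sample)
{
    int need32 = 0;  /* an explicit initializer alone also silences the warning */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile(
        "movl %1, %0 \n\t"
        : "=r"(need32)        /* output operand: gcc sees the write */
        : "r"(sample != 0)    /* input operand */
    );
#else
    /* portable fallback so the sketch builds on non-x86 targets */
    need32 = sample != 0;
#endif
    return need32;
}
```
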
--
Bobby Bingham
-------------- next part --------------
A non-text attachment was scrubbed...
Name: flac_sse2.patch
Type: text/x-patch
Size: 10256 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090523/b365a8fe/attachment.bin>