[FFmpeg-devel] [PATCH] vf_ssim: x86 simd for ssim_4x4xN and ssim_endN.
Michael Niedermayer
michael at niedermayer.cc
Tue Jul 14 05:06:17 CEST 2015
On Mon, Jul 13, 2015 at 11:39:15PM -0300, James Almer wrote:
> On 12/07/15 8:33 PM, Ronald S. Bultje wrote:
> > +INIT_XMM sse4
> > +cglobal ssim_end_line, 3, 3, 6, sum0, sum1, w
> > + pxor m0, m0
> > +.loop:
> > + mova m1, [sum0q+mmsize*0]
> > + mova m2, [sum0q+mmsize*1]
> > + mova m3, [sum0q+mmsize*2]
> > + mova m4, [sum0q+mmsize*3]
> > + paddd m1, [sum1q+mmsize*0]
> > + paddd m2, [sum1q+mmsize*1]
> > + paddd m3, [sum1q+mmsize*2]
> > + paddd m4, [sum1q+mmsize*3]
> > + paddd m1, m2
> > + paddd m2, m3
> > + paddd m3, m4
> > + paddd m4, [sum0q+mmsize*4]
> > + paddd m4, [sum1q+mmsize*4]
> > + TRANSPOSE4x4D 1, 2, 3, 4, 5
> > +
> > + ; m1 = fs1, m2 = fs2, m3 = fss, m4 = fs12
> > + pslld m3, 6
> > + pslld m4, 6
> > + pmulld m5, m1, m2 ; fs1 * fs2
> > + pmulld m1, m1 ; fs1 * fs1
> > + pmulld m2, m2 ; fs2 * fs2
>
> If these values are guaranteed to always be positive, this could also be
> implemented with pmuludq to get an sse2 version working, although I'm not
> sure it's worth doing. It would take six pmuludq and an awful lot of
> shuffling and unpacking, when the sse4 version is already only ~2x faster
> than the C version.
>
> This was already OK'd (same with the psnr sse2 code), so it should have
> been pushed already.
/me wonders a little bit why no one else has applied it yet, but
applied
thanks
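A scalar C model of the quoted part of ssim_end_line may help when reviewing the asm. It is a sketch under stated assumptions, not FFmpeg code: it assumes sum0 and sum1 each point at (at least) five consecutive int[4] entries laid out as (s1, s2, ss, s12), which is what the loads at mmsize*0 through mmsize*4 and the "m1 = fs1, m2 = fs2, m3 = fss, m4 = fs12" comment suggest. The type and function names are hypothetical. Like the quoted excerpt, the model stops after the three pmulld.

```c
#include <stdint.h>

/* Hypothetical model of one .loop iteration, up to the pmulld step.
 * A "register" is 4 x uint32_t lanes; uint32_t arithmetic wraps mod 2^32
 * like paddd/pslld/pmulld do, which avoids signed-overflow UB in C. */
typedef uint32_t Reg[4];

static void ssim_end_line_step(const uint32_t (*sum0)[4],
                               const uint32_t (*sum1)[4],
                               Reg fs1, Reg fs2, Reg fss, Reg fs12,
                               Reg s1s2, Reg s1s1, Reg s2s2)
{
    Reg m[4];

    /* m[j] = sum0[j] + sum1[j] + sum0[j+1] + sum1[j+1]: the four mova
     * loads, the paddd with sum1, and the pairwise paddd chain, where
     * m4 picks up entry 4 from both sums. */
    for (int j = 0; j < 4; j++)
        for (int k = 0; k < 4; k++)
            m[j][k] = sum0[j][k] + sum1[j][k] + sum0[j + 1][k] + sum1[j + 1][k];

    /* After TRANSPOSE4x4D, register r holds component r of all four
     * outputs, so each lane below is one output position. */
    for (int lane = 0; lane < 4; lane++) {
        fs1[lane]  = m[lane][0];
        fs2[lane]  = m[lane][1];
        fss[lane]  = m[lane][2] << 6;        /* pslld m3, 6 */
        fs12[lane] = m[lane][3] << 6;        /* pslld m4, 6 */
        s1s2[lane] = fs1[lane] * fs2[lane];  /* pmulld m5, m1, m2 */
        s1s1[lane] = fs1[lane] * fs1[lane];  /* pmulld m1, m1 */
        s2s2[lane] = fs2[lane] * fs2[lane];  /* pmulld m2, m2 */
    }
}
```

The model makes the data layout explicit: before the transpose each register holds the four sums for one output position, and the transpose turns that into one register per sum type so the same multiply can be applied to four outputs at once.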
[...]
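The pmuludq idea can be made concrete with a lane-by-lane C model of a common SSE2 substitute for pmulld: one pmuludq on the even dword lanes, one on the odd lanes after a psrlq by 32, then pshufd and punpckldq to collect the low halves. This is a sketch of that general technique, not code from the patch; the Xmm type and helper names are hypothetical, and each helper only mimics the named instruction's effect on four 32-bit lanes. Since the low 32 bits of a 32x32 product are the same for signed and unsigned operands, the sequence reproduces pmulld's result regardless of sign.

```c
#include <stdint.h>

/* Hypothetical 128-bit register: 4 dword lanes, lane 0 lowest. */
typedef struct { uint32_t d[4]; } Xmm;

/* pmuludq: unsigned 32x32->64 multiply of dword lanes 0 and 2. */
static Xmm pmuludq(Xmm a, Xmm b)
{
    uint64_t p0 = (uint64_t)a.d[0] * b.d[0];
    uint64_t p2 = (uint64_t)a.d[2] * b.d[2];
    Xmm r = {{ (uint32_t)p0, (uint32_t)(p0 >> 32),
               (uint32_t)p2, (uint32_t)(p2 >> 32) }};
    return r;
}

/* psrlq x, 32: shift each 64-bit half right by 32 bits. */
static Xmm psrlq32(Xmm a)
{
    Xmm r = {{ a.d[1], 0, a.d[3], 0 }};
    return r;
}

/* pshufd x, x, q0020: move dwords 0 and 2 into lanes 0 and 1. */
static Xmm pshufd_0020(Xmm a)
{
    Xmm r = {{ a.d[0], a.d[2], a.d[0], a.d[0] }};
    return r;
}

/* punpckldq: interleave the low two dwords of a and b. */
static Xmm punpckldq(Xmm a, Xmm b)
{
    Xmm r = {{ a.d[0], b.d[0], a.d[1], b.d[1] }};
    return r;
}

/* pmulld emulated with SSE2-only operations: two pmuludq plus shuffles. */
static Xmm pmulld_sse2(Xmm a, Xmm b)
{
    Xmm even = pmuludq(a, b);                   /* a0*b0, a2*b2 */
    Xmm odd  = pmuludq(psrlq32(a), psrlq32(b)); /* a1*b1, a3*b3 */
    return punpckldq(pshufd_0020(even), pshufd_0020(odd));
}
```

Each emulated pmulld costs two pmuludq, so the three products in the quoted code come to six pmuludq, plus the psrlq/pshufd/punpckldq overhead. For the two squares both operands are identical, so only one psrlq per square would be needed.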
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes