[FFmpeg-devel] [FFmpeg-cvslog] x86/me_cmp: port mmxext and sse2 sad functions to yasm
Michael Niedermayer
michaelni at gmx.at
Wed Sep 17 19:40:04 CEST 2014
On Wed, Sep 17, 2014 at 01:14:33PM -0300, James Almer wrote:
> On 17/09/14 9:07 AM, Michael Niedermayer wrote:
> > On Wed, Sep 17, 2014 at 01:18:12PM +0200, Clément Bœsch wrote:
> >> On Wed, Sep 17, 2014 at 11:41:32AM +0200, James Almer wrote:
> >>> ffmpeg | branch: master | James Almer <jamrial at gmail.com> | Tue Sep 16 21:41:47 2014 -0300| [0456d169c469a79e305813d14c873fe698c8c572] | committer: Michael Niedermayer
> >>>
> >>> x86/me_cmp: port mmxext and sse2 sad functions to yasm
> >>>
> >>> Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of
> >>> sad16_x2, sad16_y2 and sad16_xy2 (15% to 20% faster than mmxext).
> >>> Since the _xy2 versions are not bitexact, they are accordingly marked as
> >>> approximate.
> >>>
> >>> Signed-off-by: James Almer <jamrial at gmail.com>
> >>> Signed-off-by: Michael Niedermayer <michaelni at gmx.at>
> >>>
> >>>> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=0456d169c469a79e305813d14c873fe698c8c572
> >>> ---
> >>>
> >>> libavcodec/x86/me_cmp.asm | 330 ++++++++++++++++++++++++++++++++++++++++++
> >>> libavcodec/x86/me_cmp_init.c | 203 +++++++-------------------
> >>> 2 files changed, 379 insertions(+), 154 deletions(-)
> >>>
> >>> diff --git a/libavcodec/x86/me_cmp.asm b/libavcodec/x86/me_cmp.asm
> >>> index b0741f3..27176f4 100644
> >>> --- a/libavcodec/x86/me_cmp.asm
> >>> +++ b/libavcodec/x86/me_cmp.asm
> >>> @@ -23,6 +23,10 @@
> >>>
> >>> %include "libavutil/x86/x86util.asm"
> >>>
> >>> +SECTION_RODATA
> >>> +
> >>> +cextern pb_1
> >>> +
> >>> SECTION .text
> >>>
> >>> %macro DIFF_PIXELS_1 4
> >>> @@ -465,3 +469,329 @@ cglobal hf_noise%1, 3,3,0, pix1, lsize, h
> >>> INIT_MMX mmx
> >>> HF_NOISE 8
> >>> HF_NOISE 16
> >>> +
> >>> +;---------------------------------------------------------------------------------------
> >>> +;int ff_sad_<opt>(MpegEncContext *v, uint8_t *pix1, uint8_t *pix2, int stride, int h);
> >>> +;---------------------------------------------------------------------------------------
> >>> +INIT_MMX mmxext
> >>> +cglobal sad8, 4, 4, 0, v, pix1, pix2, stride
> >>> + movu m2, [pix2q]
> >>> + movu m1, [pix2q+strideq]
> >>> + psadbw m2, [pix1q]
> >>> + psadbw m1, [pix1q+strideq]
> >>> + paddw m2, m1
> >>> +
> >>> +%rep 3
> >>> + lea pix1q, [pix1q+strideq*2]
> >>> + lea pix2q, [pix2q+strideq*2]
> >>> + movu m0, [pix2q]
> >>> + movu m1, [pix2q+strideq]
> >>> + psadbw m0, [pix1q]
> >>> + psadbw m1, [pix1q+strideq]
> >>> + paddw m2, m0
> >>> + paddw m2, m1
> >>> +%endrep
> >>> + movd eax, m2
> >>> + RET
> >>> +
> >>
> >> Sorry to notice that now but... what happened to the h parameter?
> >
> > i had missed that when reviewing
> >
> > fixed
>
> It's not needed. I purposely removed it and made it a fixed %rep since it's supposedly
> guaranteed to be 8.
> Check the inline version it replaced.
Hmm, we need an 8x4 for interlaced chroma motion estimation,
but maybe we just don't support interlaced chroma ME, I don't remember.
Still, I think it's better if our code can handle that case, so support
for interlaced chroma ME can be added without needing to update the asm.
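
To make the 8x8 versus 8x4 point concrete, here is a minimal plain C sketch of an 8-pixel-wide SAD that honours the row count. The name sad8_ref and its signature are hypothetical and only for illustration; this is not the FFmpeg C reference nor the asm above. With a fixed unroll of 8 rows the function can only serve 8x8 blocks, while taking h as a loop bound lets the same routine cover 8x4.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference routine (not FFmpeg API): sum of absolute
 * differences over a block 8 pixels wide and h rows tall.
 * stride is the distance in bytes between the starts of two rows. */
static int sad8_ref(const uint8_t *pix1, const uint8_t *pix2,
                    ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        /* One row: 8 byte-wise absolute differences, which is what a
         * single 64-bit psadbw accumulates. */
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

Calling sad8_ref(a, b, stride, 8) corresponds to the fixed 8-row case discussed above, and sad8_ref(a, b, stride, 4) to the 8x4 case an interlaced chroma search would need.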
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
While the State exists there can be no freedom; when there is freedom there
will be no State. -- Vladimir Lenin