[FFmpeg-devel] [patch][OpenHEVC]added ASM functions for epel + qpel
Ronald S. Bultje
rsbultje at gmail.com
Fri Mar 7 12:43:07 CET 2014
Hi,
On Thu, Mar 6, 2014 at 10:40 AM, Pierre Edouard Lepere <
Pierre-Edouard.Lepere at insa-rennes.fr> wrote:
> new patch, now all in a single, smaller file !
>
Thanks!
> >> + sub srcq, 1
> >
> >Why? Just subtract one from src when you dereference it ([srcq-1]
> >instead of [srcq]).
>
> because it's more convenient, having filters start at src whether we are
> in h, v or hv.
>
Right, I understand, but we're writing assembly; it isn't exactly meant to
be convenient. I'm fine with leaving it as a FIXME for later, but at least
mark it as such in the code - dropping the sub does save one instruction.
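
To make the two options concrete, here is a rough scalar C sketch (not the
patch's code) of a 4-tap horizontal filter whose taps cover src[x-1] through
src[x+2]. The function names, the test row and the choice of the
{-4, 36, 36, -4} coefficients are only illustrative assumptions. A C compiler
would fold the offset by itself; in hand-written assembly the "sub" is a real
extra instruction.

```c
#include <stdint.h>

/* Variant A: rebase the pointer once (like "sub srcq, 1"), taps at 0..3. */
static void epel_h_rebased(int16_t *dst, const uint8_t *src,
                           const int8_t c[4], int width)
{
    src -= 1;   /* the one extra instruction, executed once per call */
    for (int x = 0; x < width; x++)
        dst[x] = (int16_t)(c[0] * src[x]     + c[1] * src[x + 1] +
                           c[2] * src[x + 2] + c[3] * src[x + 3]);
}

/* Variant B: leave src untouched and fold the -1 into each access,
 * like addressing [srcq-1] directly. */
static void epel_h_folded(int16_t *dst, const uint8_t *src,
                          const int8_t c[4], int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = (int16_t)(c[0] * src[x - 1] + c[1] * src[x] +
                           c[2] * src[x + 1] + c[3] * src[x + 2]);
}

/* Returns 1 when both variants agree on a small hypothetical test row. */
static int epel_variants_agree(void)
{
    static const uint8_t row[11] = { 0, 10, 20, 40, 80, 160, 255, 128, 64, 32, 16 };
    static const int8_t  c[4]    = { -4, 36, 36, -4 };
    int16_t a[8], b[8];

    /* row + 1 leaves one pixel of left margin for the tap at x - 1 */
    epel_h_rebased(a, row + 1, c, 8);
    epel_h_folded(b, row + 1, c, 8);
    for (int i = 0; i < 8; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Both variants read exactly the same samples; the only difference is where
the -1 lives.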
> > + EPEL_LOAD 8, src, 1
> > + EPEL_COMPUTE 8, 2
> > + PEL_STORE2 dst, m0, m1
> > + LOOP_END epel_h_h_2_8, dst, dststride, src, srcstride
> > + RET
>
> OK, so the actual code. For fun, can you show the _actual disassembly_
> that all these macros eventually got us to? I wonder what it actually
> gives.
>
(Still hoping for this one.)
> >I can understand the pmaddwd approach for second pass may be faster for
> >half-registers, since you fill the register up to full width and save one
> >instruction - but did you measure it?
> >
> >Then, for second, you're just spending instructions shuffling. I don't
> >think 2a is faster than 2b, in fact I expect it to be significantly
> >slower.
>
> This was done first with intrinsics, and pmulhw was needed, so it adds
> just too many instructions.
Oh, intermediates aren't downshifted at all, I guess - that sucks. OK, fine
then, I guess.
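
For the record, a rough scalar C model of why that matters, under the
assumption that the first pass keeps full-precision 16-bit intermediates:
the second-pass products then no longer fit in 16 bits, so pmullw alone loses
the upper half (hence the extra pmulhw), while pmaddwd yields 32-bit sums
directly. The helper names and the sample values (a flat region of 255 with
{-4, 36, 36, -4} coefficients) are illustrative assumptions, not taken from
the patch.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of pmaddwd:
 * two signed 16x16 products, summed into 32 bits. */
static int32_t madd_pair(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Scalar model of pmullw: only the low 16 bits of the product survive. */
static uint16_t mul_lo(int16_t a, int16_t b)
{
    return (uint16_t)((int32_t)a * b);
}

/* 4-tap second pass on un-downshifted intermediates t[0..3],
 * built from two pmaddwd-style pair sums. */
static int32_t epel_v_madd(const int16_t t[4], const int8_t c[4])
{
    return madd_pair(t[0], t[1], c[0], c[1]) +
           madd_pair(t[2], t[3], c[2], c[3]);
}
```

With t = {16320, 16320, 16320, 16320} (255 * 64 from a first pass without a
shift), epel_v_madd gives 1044480, far beyond the int16 range, whereas
mul_lo(16320, 36) keeps only 63232 of the true product 587520.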
Ronald