[Ffmpeg-devel] [PATCH] put_mpeg4_qpel16_h_lowpass altivec, take 2

Sun Nov 26 17:38:31 CET 2006

Hi,

On 11/26/06, Brian Foley <bfoley at compsoc.nuigalway.ie> wrote:
> On Mon, Nov 20, 2006 at 02:43:17AM +0100, Michael Niedermayer wrote:

> > > +        for(i=0; i<h; i++) {
> > > +            src1v = vec_perm(vec_ld(0, src1), vec_ld(15, src1), vec_lvsl(0, src1));
> > > +            src2v = vec_perm(vec_ld(0, src2), vec_ld(15, src2), vec_lvsl(0, src2));
> >
> > one of the 2 is in many cases aligned
>
> I'm not really sure the best way to handle this. I could have an aligned
> load in an 'if (((int) src & 0xf) == 0)', but I suspect the branch would
> hurt us quite badly. The other approach is to have 3 other functions
> where we assert src1 or src2 or both (and their strides) are aligned.

Well-predicted branches are very cheap on modern archs. I don't
remember whether it is the 7450 or the 970 that has zero penalty for
a well-predicted branch, but I do know that on the P4 a well-predicted
branch costs nothing at all (quite impressive for an arch that so many
people here don't like ;-)

In any case, since PPC has several predicates and a dedicated branch
unit, branches there tend to cost noticeably less than on the average
CPU.
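
To make the alignment test concrete, here is a minimal sketch in
portable C. It is not AltiVec and not the code from the patch: vec16,
load_aligned(), load_unaligned() and avg16_rows() are hypothetical
stand-ins, and the rounded average is only an assumed example of what
the loop body might compute. The point is that when the stride is a
multiple of 16 the alignment of each source is the same on every row,
so the test can be evaluated once before the loop and the branch inside
the loop always goes the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 16-byte vector register (hypothetical type). */
typedef struct { uint8_t b[16]; } vec16;

/* True if p sits on a 16-byte boundary. uintptr_t is used rather than
 * int so the pointer cast stays valid on 64-bit targets. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 0xf) == 0;
}

/* Stand-in for the two-load-plus-permute unaligned load. */
static vec16 load_unaligned(const uint8_t *p)
{
    vec16 v;
    memcpy(v.b, p, 16);
    return v;
}

/* Stand-in for a single aligned vector load. */
static vec16 load_aligned(const uint8_t *p)
{
    vec16 v;
    assert(is_aligned16(p));
    memcpy(v.b, p, 16);
    return v;
}

/* Rounded average of two 16-byte-wide sources over h rows.
 * Assumption: dst and both sources share the same stride. */
static void avg16_rows(uint8_t *dst, const uint8_t *src1,
                       const uint8_t *src2, ptrdiff_t stride, int h)
{
    /* With a stride that is a multiple of 16, alignment is the same on
     * every row, so these flags are computed once, outside the loop. */
    const int stride_ok = ((uintptr_t)stride & 0xf) == 0;
    const int a1 = stride_ok && is_aligned16(src1);
    const int a2 = stride_ok && is_aligned16(src2);
    int i, j;

    for (i = 0; i < h; i++) {
        /* Loop-invariant conditions: the branch is trivially predictable. */
        vec16 v1 = a1 ? load_aligned(src1) : load_unaligned(src1);
        vec16 v2 = a2 ? load_aligned(src2) : load_unaligned(src2);

        for (j = 0; j < 16; j++)
            dst[j] = (uint8_t)((v1.b[j] + v2.b[j] + 1) >> 1);

        dst  += stride;
        src1 += stride;
        src2 += stride;
    }
}
```

If the stride is not a multiple of 16, the sketch simply falls back to
the unaligned load for every row, which is always correct. Whether this
beats separate specialised functions would have to be benchmarked.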

Guillaume
-- 
An association of men who will not quarrel with one another is a thing
which has never yet existed, from the greatest confederacy of nations
down to a town meeting or a vestry.
 -- Thomas Jefferson
(when interviewed about MPlayer ML flamewars)
http://www.brainyquote.com/quotes/quotes/t/thomasjeff157207.html