[Ffmpeg-devel] [PATCH] put_mpeg4_qpel16_h_lowpass altivec, take 2
Guillaume POIRIER
poirierg
Sun Nov 26 17:38:31 CET 2006
Hi,
On 11/26/06, Brian Foley <bfoley at compsoc.nuigalway.ie> wrote:
> On Mon, Nov 20, 2006 at 02:43:17AM +0100, Michael Niedermayer wrote:
> > > + for(i=0; i<h; i++) {
> > > + src1v = vec_perm(vec_ld(0, src1), vec_ld(15, src1), vec_lvsl(0, src1));
> > > + src2v = vec_perm(vec_ld(0, src2), vec_ld(15, src2), vec_lvsl(0, src2));
> >
> > one of the 2 is in many cases aligned
>
> I'm not really sure the best way to handle this. I could have an aligned
> load in an 'if (((int) src & 0xf) == 0)', but I suspect the branch would
> hurt us quite badly. The other approach is to have 3 other functions
> where we assert src1 or src2 or both (and their strides) are aligned.
Branches that can be well predicted are very cheap on modern archs. I
don't remember which of the 7450 or the 970 offers zero penalty in
case of a well-predicted branch, but at least I know that on P4, a
well-predicted branch costs nothing at all (quite impressive for an
arch that so many people here don't like ;-).
In any case, since PPC has several predicate (condition register)
fields and a dedicated branch unit, branches tend to cost quite a bit
less there than on the average CPU.
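To make the idea concrete, here is a minimal sketch of the
branch-on-alignment approach, written in portable C rather than AltiVec
so it compiles anywhere. The names is_aligned16 and load_row16 are
made up for illustration and are not part of the patch. Both paths do
the same copy here; in real AltiVec code the aligned path would be a
single vec_ld and the unaligned path the vec_ld/vec_ld/vec_perm
sequence quoted above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero when p sits on a 16-byte boundary.
 * uintptr_t is used instead of int so the cast is safe on 64-bit. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 0xf) == 0;
}

/* Hypothetical stand-in for one per-row load of 16 bytes.
 * Returns 1 if the aligned path was taken, 0 otherwise. */
static int load_row16(uint8_t *dst, const uint8_t *src)
{
    if (is_aligned16(src)) {
        /* Aligned path: would be a single vec_ld in AltiVec. */
        memcpy(dst, src, 16);
        return 1;
    }
    /* Unaligned path: would be two vec_ld plus a vec_perm. */
    memcpy(dst, src, 16);
    return 0;
}
```

If the stride is a multiple of 16, the alignment of src is the same on
every row, so the branch goes the same way on each iteration of the
loop and should be easy to predict.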
Guillaume
--
An association of men who will not quarrel with one another is a thing
which has never yet existed, from the greatest confederacy of nations
down to a town meeting or a vestry.
-- Thomas Jefferson
(when interviewed about MPlayer ML flamewars)
http://www.brainyquote.com/quotes/quotes/t/thomasjeff157207.html