[FFmpeg-devel] [PATCH][RFC] Lagarith Decoder.

Wed Aug 12 17:12:25 CEST 2009

Reimar Döffinger <Reimar.Doeffinger at gmx.de> writes:

> On Wed, Aug 12, 2009 at 03:41:01PM +0100, Måns Rullgård wrote:
>> Reimar Döffinger <Reimar.Doeffinger at gmx.de> writes:
>> 
>> > On Wed, Aug 12, 2009 at 02:12:55PM +0200, Michael Niedermayer wrote:
>> >> On Mon, Aug 10, 2009 at 11:42:19PM -0600, Nathan Caldwell wrote:
>> >> > On Sat, Aug 8, 2009 at 6:32 AM, Michael Niedermayer<michaelni at gmx.at> wrote:
>> >> > >> +/* Fast round up to least power of 2 >= to x */
>> >> > >> +static inline uint32_t clp2(uint32_t x)
>> >> > >> +{
>> >> > >> +    x--;
>> >> > >> +    x |= (x >> 1);
>> >> > >> +    x |= (x >> 2);
>> >> > >> +    x |= (x >> 4);
>> >> > >> +    x |= (x >> 8);
>> >> > >> +    x |= (x >> 16);
>> >> > >> +    return x+1;
>> >> > >> +}
>> >> > >
>> >> > > is 1<<av_log2(x) faster?
>> >> > 
>> >> > Might be, but it gives different results, so it's a moot point.
>> >> 
>> >> 2<<av_log2(x-1)
>> >> or whatever
>> >
>> > Well, that all depends on what input range is needed.
>> > E.g. for 0 the documentation does not match the behaviour
>> > for the original function (returns 0 which is not even a
>> > power of 2).
>> > In the worst case, you'd have to do
>> > return x > 1 ? 2 << av_log2(x - 1) : x;
>> > I think, which has a small but still existing chance of
>> > being faster.
>> 
>> That's still easy to optimise, at least for ARM:
>> 
>> subs  r1, r0, #1
>> clz   r1, r1
>> movgt r0, #2
>> rsb   r1, r1, #31
>> lslgt r0, r0, r1
>> 
>> This should be about twice as fast as the shift/or version.
>
> Well, you still have to teach the compiler at least to use clz for
> av_log2, I think you haven't yet ;-)

I can't because it's in common.h, which is installed.  We really
should find a way to fix that.
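
For reference, here is a minimal C sketch of the count-leading-zeros
variant discussed above, next to the shift/or version from the patch.
The names ilog2_clz() and clp2_clz() are hypothetical, and
__builtin_clz() is the GCC/Clang builtin, used here only as a stand-in
for a clz-based av_log2(); this is not what common.h currently does.
The spot check only confirms that both versions return the same values,
including for 0 and for inputs above 2^31; it says nothing about speed.

```c
#include <assert.h>
#include <stdint.h>

/* Shift/or version from the patch: round up to a power of 2. */
static inline uint32_t clp2(uint32_t x)
{
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Hypothetical stand-in for av_log2(): floor(log2(x)), valid for x > 0.
 * Assumes a compiler providing __builtin_clz (GCC, Clang). */
static inline int ilog2_clz(uint32_t x)
{
    return 31 - __builtin_clz(x);
}

/* clz-based variant; x <= 1 is returned unchanged, so 0 maps to 0
 * exactly as in clp2(). The unsigned shift wraps to 0 for x > 2^31. */
static inline uint32_t clp2_clz(uint32_t x)
{
    return x > 1 ? 2u << ilog2_clz(x - 1) : x;
}

/* Spot-check that both versions agree, including the edge cases. */
static void check_clp2(void)
{
    static const uint32_t in[] = {
        0, 1, 2, 3, 4, 5, 255, 256, 257,
        0x80000000u, 0x80000001u, 0xFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof(in) / sizeof(in[0]); i++)
        assert(clp2_clz(in[i]) == clp2(in[i]));
}
```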

-- 
Måns Rullgård
mans at mansr.com