[FFmpeg-devel] [PATCH][RFC] Lagarith Decoder.
Måns Rullgård
Wed Aug 12 17:12:25 CEST 2009
Reimar Döffinger <Reimar.Doeffinger at gmx.de> writes:
> On Wed, Aug 12, 2009 at 03:41:01PM +0100, Måns Rullgård wrote:
>> Reimar Döffinger <Reimar.Doeffinger at gmx.de> writes:
>>
>> > On Wed, Aug 12, 2009 at 02:12:55PM +0200, Michael Niedermayer wrote:
>> >> On Mon, Aug 10, 2009 at 11:42:19PM -0600, Nathan Caldwell wrote:
>> >> > On Sat, Aug 8, 2009 at 6:32 AM, Michael Niedermayer<michaelni at gmx.at> wrote:
>> >> > >> +/* Fast round up to least power of 2 >= to x */
>> >> > >> +static inline uint32_t clp2(uint32_t x)
>> >> > >> +{
>> >> > >> + x--;
>> >> > >> + x |= (x >> 1);
>> >> > >> + x |= (x >> 2);
>> >> > >> + x |= (x >> 4);
>> >> > >> + x |= (x >> 8);
>> >> > >> + x |= (x >> 16);
>> >> > >> + return x+1;
>> >> > >> +}
>> >> > >
>> >> > > is 1<<av_log2(x) faster?
>> >> >
>> >> > Might be, but it gives different results, so it's a moot point.
>> >>
>> >> 2<<av_log2(x-1)
>> >> or whatever
>> >
>> > Well, that all depends on what input range is needed.
>> > E.g. for 0 the documentation does not match the behaviour
>> > for the original function (returns 0 which is not even a
>> > power of 2).
>> > In the worst case, you'd have to do
>> > return x > 1 ? 2 << av_log2(x - 1) : x;
>> > I think, which has a small but still existing chance of
>> > being faster.
>>
>> That's still easy to optimise, at least for ARM:
>>
>> subs r1, r0, #1
>> clz r1, r1
>> movgt r0, #2
>> rsb r1, r1, #31
>> lslgt r0, r0, r1
>>
>> This should be about twice as fast as the shift/or version.
>
> Well, you still have to teach the compiler at least to use clz for
> av_log2, I think you haven't yet ;-)
I can't because it's in common.h, which is installed. We really
should find a way to fix that.
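
For reference, the variants discussed above can be compared with a small
C sketch. It is only an illustration under stated assumptions:
log2_floor() is a hypothetical stand-in for av_log2() (a plain loop, not
FFmpeg's implementation), and the names clp2_shift_or(), clp2_log2() and
check_equivalence() are made up for this example. Note that plain
1<<av_log2(x) rounds down (5 gives 4), which is why it does not match
the original function.

```c
#include <assert.h>
#include <stdint.h>

/* Shift/or version from the quoted patch: least power of 2 >= x.
 * Returns 0 for x == 0 and wraps to 0 for x > 2^31. */
static inline uint32_t clp2_shift_or(uint32_t x)
{
    x--;
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Hypothetical stand-in for av_log2(): floor(log2(v)), 0 for v == 0.
 * A simple loop for illustration, not FFmpeg's implementation. */
static inline int log2_floor(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Variant suggested in the thread; x <= 1 is returned unchanged so
 * that 0 and 1 behave like the shift/or version. The cast keeps the
 * shift unsigned, so inputs above 2^31 wrap to 0 as well. */
static inline uint32_t clp2_log2(uint32_t x)
{
    return x > 1 ? (uint32_t)2 << log2_floor(x - 1) : x;
}

/* Assert that both versions agree for every x in [0, limit). */
static inline void check_equivalence(uint32_t limit)
{
    for (uint32_t x = 0; x < limit; x++)
        assert(clp2_shift_or(x) == clp2_log2(x));
}
```

Calling check_equivalence(1u << 16) from a test program exercises the
edge cases 0 and 1 together with every power-of-2 boundary below 65536.
Whether the second form is actually faster depends on av_log2() being
compiled to a count-leading-zeros instruction, which is the point raised
above.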
--
Måns Rullgård
mans at mansr.com