[FFmpeg-devel] LIBMPEG2_BITSTREAM_READER vs. golomb.h

Mon Jul 14 03:13:03 CEST 2008

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> On Monday 14 July 2008, Måns Rullgård wrote:
> [...]
>> >> This is all annoying because LIBMPEG2_BITSTREAM_READER is slightly
>> >> faster on ARM.
>> >
>> > What about just using ALT_BITSTREAM_READER for ARMv6 and newer (cores
>> > that support unaligned memory accesses)?
>>
>> I tried enabling HAVE_FAST_UNALIGNED, and it didn't make any
>> significant difference.
>>
>> > It could be the fastest bitstream reader when implementing unaligned
>> > 32-bit bigendian load as:
>> >
>> > setend be
>> > ldr ...
>> > setend le
>>
>> ldr; rev is only two instructions.
>
> But it's 6 cycles on ARM11. Because unaligned read has 4 cycles
> latency, and rev instruction has its argument as 'early reg' (+1
> more cycle penalty).  Sequence "ldr"+"rev" is a dependency chain and
> you can't do much about it, it's a bad choice.

That's assuming you can't schedule anything between ldr and rev.  In
most cases, this will be possible.
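
For readers who want the operation spelled out, here is a minimal C sketch of
what the "ldr; rev" pair computes: an unaligned 32-bit load followed by a byte
swap, yielding the big-endian value. The names bswap32_sketch,
host_is_little_endian and load_be32_sketch are hypothetical and chosen for
illustration only; this is not the macro FFmpeg uses. On ARMv6 a compiler may
turn the swap into a single rev, but that depends on the compiler and flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the four bytes of a 32-bit word,
 * i.e. the effect of the ARMv6 "rev" instruction. */
static uint32_t bswap32_sketch(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical helper: runtime check of the host byte order. */
static int host_is_little_endian(void)
{
    const uint16_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Hypothetical helper: read 32 bits in big-endian order from a
 * pointer that need not be aligned. memcpy stands in for the
 * unaligned "ldr"; the swap stands in for "rev". */
static uint32_t load_be32_sketch(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    if (host_is_little_endian())
        v = bswap32_sketch(v);
    return v;
}
```

The load and the swap form the dependency chain discussed above; anything
scheduled between them is work that does not depend on the loaded value.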

> On the other hand, "setend be"/"ldr"/"setend le" sequence is 3
> cycles, with some latency for load result availability.

I'm not following your maths.  You said just above that unaligned ldr
has 4 cycles latency.

> In the worst case it is 5 cycles, which is already better than what
> you suggest. And you still have some freedom reordering instructions
> for getting better results.

I disagree.  Look at the sequences side by side:

cycle  setend version  rev version
1      setend          ldr
2      ldr
3      setend
4
5                      rev
6      use             use

Both take 6 cycles, but the version with rev leaves 3 cycles free for
other instructions, while the setend version has only 2 spare cycles.
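
The spare-cycle counts can be checked with a small C sketch that encodes the
cycle table above. The function spare_cycles and the two arrays are
hypothetical names used only for this illustration, and the cycle numbers are
taken directly from the table, not measured on hardware.

```c
#include <stddef.h>

/* Hypothetical helper: given the cycles (1-based, distinct) in which a
 * sequence issues its own instructions, count how many of the cycles
 * 1..total remain free for unrelated instructions. */
static int spare_cycles(const int *busy, size_t n, int total)
{
    int spare = total;
    for (size_t i = 0; i < n; i++)
        if (busy[i] >= 1 && busy[i] <= total)
            spare--;
    return spare;
}

/* Cycles occupied in each column of the table above. */
static const int setend_busy[] = { 1, 2, 3, 6 };  /* setend, ldr, setend, use */
static const int rev_busy[]    = { 1, 5, 6 };     /* ldr, rev, use */
```

With a total of 6 cycles, spare_cycles gives 2 for the setend column and 3
for the rev column, which are the figures quoted above.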

It is probably possible to find situations where either solution is
faster.  Things like this are never that clear-cut.

-- 
Måns Rullgård
mans at mansr.com