[FFmpeg-devel] LIBMPEG2_BITSTREAM_READER vs. golomb.h
Måns Rullgård
mans
Mon Jul 14 03:13:03 CEST 2008
Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> On Monday 14 July 2008, Måns Rullgård wrote:
> [...]
>> >> This is all annoying because LIBMPEG2_BITSTREAM_READER is slightly
>> >> faster on ARM.
>> >
>> > What about just using ALT_BITSTREAM_READER for ARMv6 and newer (cores
>> > that support unaligned memory accesses)?
>>
>> I tried enabling HAVE_FAST_UNALIGNED, and it didn't make any
>> significant difference.
>>
>> > It could be the fastest bitstream reader when implementing unaligned
>> > 32-bit bigendian load as:
>> >
>> > setend be
>> > ldr ...
>> > setend le
>>
>> ldr; rev is only two instructions.
>
> But it's 6 cycles on ARM11. Because unaligned read has 4 cycles
> latency, and rev instruction has its argument as 'early reg' (+1
> more cycle penalty). Sequence "ldr"+"rev" is a dependency chain and
> you can't do much about it, it's a bad choice.
That's assuming you can't schedule anything between ldr and rev. In
most cases, this will be possible.
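For reference, the operation under discussion is an unaligned 32-bit big-endian load. The following is a minimal C sketch of it, not code from FFmpeg: the names bswap32, load_be32_swap and load_be32_bytes are hypothetical, and load_be32_swap assumes a little-endian host. The memcpy stands in for the unaligned "ldr" and bswap32 for "rev"; whether a compiler actually emits that pair depends on the compiler and target flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit word; this is the job "rev" does. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical helper: unaligned 32-bit big-endian load.
 * ASSUMPTION: the host is little-endian (as ARM is outside "setend be"). */
static uint32_t load_be32_swap(const uint8_t *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof(v));   /* the load; alignment-safe in C */
    return bswap32(v);          /* depends on the load result, like rev */
}

/* Endian-neutral reference version, assembled one byte at a time. */
static uint32_t load_be32_bytes(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

The swap cannot start until the load has completed, which is the dependency chain mentioned in the quoted text; any independent work placed between the two hides part of the load latency.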
> On the other hand, "setend be"/"ldr"/"setend le" sequence is 3
> cycles, with some latency for load result availability.
I'm not following your maths. You said just above that unaligned ldr
has 4 cycles latency.
> In the worst case it is 5 cycles, which is already better than what
> you suggest. And you still have some freedom reordering instructions
> for getting better results.
I disagree. Look at the sequences side by side:
cycle  setend version  rev version
1      setend          ldr
2      ldr
3      setend
4
5                      rev
6      use             use
Both take 6 cycles, the version with rev leaving 3 cycles free for
other instructions, while the setend version only has 2 spare cycles.
It is probably possible to find situations where either solution is
faster. Things like this are never that clear-cut.
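The spare-cycle counts in the comparison above follow from simple arithmetic, sketched here in C. This is a deliberately simplified model, not a description of the real ARM11 pipeline: it assumes one instruction issues per cycle and that the loaded value is first used in a fixed cycle. The function name spare_slots and its parameters are hypothetical.

```c
#include <assert.h>

/* Hypothetical model: cycles are numbered from 1, one instruction issues
 * per cycle, and the loaded value is first used in cycle `use_cycle`.
 * Every cycle before the use that is not taken by an instruction of the
 * load sequence itself is free for unrelated instructions. */
static int spare_slots(int use_cycle, int sequence_insns)
{
    assert(use_cycle >= 1 && sequence_insns >= 0);
    assert(sequence_insns <= use_cycle - 1);
    return (use_cycle - 1) - sequence_insns;
}
```

With the use in cycle 6, the two-instruction ldr/rev sequence gives spare_slots(6, 2) = 3, and the three-instruction setend/ldr/setend sequence gives spare_slots(6, 3) = 2, matching the figures stated above.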
--
Måns Rullgård
mans at mansr.com