[FFmpeg-devel] [PATCH] Faster CABAC H.264 residual decoding

Sun Apr 27 13:24:49 CEST 2008

Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> On Sunday 27 April 2008, Måns Rullgård wrote:
>> matthieu castet <castet.matthieu at free.fr> writes:
>> > Jason Garrett-Glaser wrote:
>> >> On the advice of #ffmpeg-devel I have made a version with uint8_t
>> >> arrays instead of int.
>> >
>> > Don't forget that some CPUs (ARM, for example) don't have native 8-bit
>> > operations. Everything is done in 32 bits, and 8-bit behavior is
>> > emulated with extra operations.
>>
>> ARM has byte load and store instructions.  All ALU operations are
>> 32-bit, except for certain multiplies.  I doubt this is a problem
>> here.
>>
>> The only recent CPU I know of that lacks byte load/store is the first
>> generation of the Alpha.
>
> Probably he just wanted to say that reading bytes has higher latency 
> (+1 cycle extra) than reading ints on at least some ARM cores (ARM9).

Where do you find this information?  The ARM926 data sheet only
mentions the 1-cycle penalty for shifted offsets.

> On the other hand, indexing bytes in an array does not require a shifted
> offset (which may also introduce some kind of penalty).

A left shift by 2 has no penalty on ARMv6.
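
To make the addressing difference concrete, here is a minimal sketch in C.
The table names and contents are made up for illustration and are not taken
from the patch, and the instruction forms in the comments are only what an
ARM compiler would typically emit, not guaranteed output.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical lookup tables, for illustration only. */
static const uint8_t table8[4]  = { 1, 2, 3, 4 };
static const int     table32[4] = { 1, 2, 3, 4 };

/* Byte table: the address is base + i, so the index is used unshifted.
 * Typical ARM form: LDRB r0, [r1, r2] */
static int lookup8(size_t i)
{
    return table8[i];
}

/* Word table: the address is base + (i << 2), so the index is shifted.
 * Typical ARM form: LDR r0, [r1, r2, LSL #2] */
static int lookup32(size_t i)
{
    return table32[i];
}
```

Both functions return the same values; any difference is in the load
instruction and addressing mode, which is what the cycle counts above are
about.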

-- 
Måns Rullgård
mans at mansr.com