[FFmpeg-devel] [PATCH] use AV_RB16 in cabac refill
Måns Rullgård
mans
Sat Mar 27 02:02:11 CET 2010
Alexander Strange <astrange at ithinksw.com> writes:
> 2010/3/25 M?ns Rullg?rd <mans at mansr.com>:
>> Alexander Strange <astrange at ithinksw.com> writes:
>>
>>> On Mar 25, 2010, at 4:08 AM, David Conrad wrote:
>>>
>>>> On Mar 25, 2010, at 3:30 AM, Alexander Strange wrote:
>>>>
>>>>> Measured 1 cycle faster decode_cabac_residual on x86-64. Didn't try anywhere else, but I'd be a little interested in what arm does.
>>>>
>>>> It ought to be 2 instruction less and faster. However, both llvm
>>>> and gcc decide to zero extend from 16 bits twice, and
>>>> (llvm-)gcc-4.2 decides to load bytestream twice.
>>>
>>> Hmm, zero-extending in bswap_16 isn't really surprising, since asm
>>> operands are always extended to int.
>>
>> That depends on how the asm is written.
>>
>>> The only solution there is to write AV_RB16 in asm too.
>>>
>>> --disable-asm is remarkably bad, I think it should be using
>>> (p[0] << 8 | p[1]) instead of __attribute__((packed)) and bswap_16
>>> when FAST_UNALIGNED isn't defined.
>>
>> I don't quite understand that.
>
> If I configure for arm with --disable-asm (using iPhone gcc), this:
What --cpu did you use?
> #include "libavutil/intreadwrite.h"
>
> int test(uint16_t *p);
>
> int test(uint16_t *p)
> {
> return AV_RB16(p);
> }
>
> turns into this:
>
> int test(uint16_t *p)
> {
> return bswap_16((((const union unaligned_16 *) (p))->l));
> }
>
> which compiles to:
>
> _test:
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> @ link register save eliminated.
> ldrb r3, [r0, #0] @ zero_extendqisi2
> ldrb r0, [r0, #1] @ zero_extendqisi2
> @ lr needed for prologue
> orr r3, r3, r0, asl #8
> mov r0, r3, lsr #8
> orr r0, r0, r3, asl #8
> uxth r0, r0
> bx lr
This is gcc being stupid in multiple ways. A clever compiler would
turn that into an LDRH followed by a REV16. Since it used UXTH, it
must have been compiling for ARMv6 or later, so unaligned loads would
be OK. Even without unaligned loads, a decent compiler should be able
to unravel it all and eliminate the redundant shift/or operations.
> because it (apparently) always uses byte loads for packed structures.
> I don't know if anyone cares how well --disable-asm compiles, I was
> just curious.
Guess why I added the asm for those things.
--
M?ns Rullg?rd
mans at mansr.com
More information about the ffmpeg-devel
mailing list