[FFmpeg-devel] [PATCH/RFC] intreadwrite.h rewrite
Måns Rullgård
mans
Mon Apr 6 12:59:25 CEST 2009
Luca Barbato <lu_zero at gentoo.org> writes:
> Måns Rullgård wrote:
>> I would like to propose a rework of intreadwrite.h. This new version
>> supports per-arch implementations of the various macros allowing us to
>> take advantage of special instructions or other properties the
>> compiler does not know about.
>> ARMv6 and later support unaligned loads and stores for single
>> word/halfword but not double/multiple. GCC is ignorant of this and
>> will always use bytewise accesses for unaligned data. Casting to an
>> int32_t pointer is dangerous since a load/store double or multiple
>> instruction might be used (this happens with some code in FFmpeg).
>> Implementing the AV_[RW]* macros with inline asm using only supported
>> instructions gives fast and safe unaligned accesses. This gives an
>> overall speedup of up to 10% in some cases.
>> PPC is normally big endian but has special little endian load/store
>> instructions. Using these avoids a separate byteswap. This makes the
>> vorbis decoder about 5% faster. Not much else uses little-endian
>> read/write extensively. GCC generates horrible PPC code for the
>> default AV_[RW]B64 (which uses a packed struct), so I have overridden
>> it with a plain pointer cast.
>> For other architectures the definitions of these macros should remain
>> unchanged.
>> I'm attaching the complete files instead of a diff since this is
>> largely a rewrite. Sorry for attaching three files with the
>> same name; I'm sure you can work out which is which.
>
> Nice, I assume HAVE_LDBRX comes from a configure check, doesn't it?
Yes, it will be set in configure. The stupid GNU assembler lets it
through no matter what CPU is selected though, so I have to enable it
manually for those that have it. Does any CPU other than the Cell
have this instruction?
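
For readers without the attachments, the override scheme described in
the quoted proposal can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code from the patch: the
SK_* names are hypothetical, the per-arch include is left as a comment,
and the packed/may_alias attributes assume GCC or Clang. An arch header
would define a macro first (say, with inline asm or a byte-reversed
load), and the generic fallbacks below apply only where nothing was
defined.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical SK_* names; the real macros are AV_[RW]*.
 * A per-arch header would be included here and may pre-define any
 * of the macros below, for example:
 *     #include "arm/intreadwrite.h"   (assumed path, illustration only)
 */

#ifndef SK_RN32
/* Generic native-endian unaligned read through a packed struct.
 * packed and may_alias are GCC/Clang extensions. */
struct sk_unaligned_32 { uint32_t l; } __attribute__((packed, may_alias));
#define SK_RN32(p) (((const struct sk_unaligned_32 *)(p))->l)
#endif

#ifndef SK_RL32
/* Generic little-endian read assembled one byte at a time. */
#define SK_RL32(p)                                   \
    ((uint32_t)((const uint8_t *)(p))[0]       |     \
     (uint32_t)((const uint8_t *)(p))[1] <<  8 |     \
     (uint32_t)((const uint8_t *)(p))[2] << 16 |     \
     (uint32_t)((const uint8_t *)(p))[3] << 24)
#endif

#ifndef SK_WL32
/* Generic little-endian write, one byte at a time. */
#define SK_WL32(p, v) do {                           \
        uint8_t *sk_p = (uint8_t *)(p);              \
        uint32_t sk_v = (v);                         \
        sk_p[0] = (uint8_t)(sk_v);                   \
        sk_p[1] = (uint8_t)(sk_v >>  8);             \
        sk_p[2] = (uint8_t)(sk_v >> 16);             \
        sk_p[3] = (uint8_t)(sk_v >> 24);             \
    } while (0)
#endif

/* Self-check at odd (unaligned) offsets; returns 0 on success. */
static int sk_selfcheck(void)
{
    uint8_t buf[8] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77 };
    uint32_t ref;

    if (SK_RL32(buf + 1) != 0x44332211u)
        return 1;

    /* The native-endian read must match a plain memcpy. */
    memcpy(&ref, buf + 1, sizeof(ref));
    if (SK_RN32(buf + 1) != ref)
        return 2;

    SK_WL32(buf + 3, 0xdeadbeefu);
    if (buf[3] != 0xef || buf[4] != 0xbe || buf[5] != 0xad || buf[6] != 0xde)
        return 3;

    return 0;
}
```

The #ifndef guards are the whole point of the layout: callers always
use the same macro name, and whether it expands to generic C or to an
arch-specific instruction is decided by which header defined it first.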
--
Måns Rullgård
mans at mansr.com