[Ffmpeg-devel] Re: [PATCH] Machine endian bytestream functions

Fri Apr 13 23:41:07 CEST 2007

Hello,

Ramiro Polla wrote:
> Hello,
>
> Michael Niedermayer wrote:
>> Hi
>>
>> On Sat, Mar 10, 2007 at 05:15:44PM -0300, Ramiro Polla wrote:
>>  
>>> Hello,
>>>
>>> Reimar Döffinger wrote:
>>>    
>>>> Hello,
>>>> On Sat, Mar 10, 2007 at 11:06:41PM -0300, ramiro at lisha.ufsc.br wrote:
>>>>  
>>>>      
>>>>> Attached patch gives the AV_{R,W}{L,B}xx macros a machine-endian
>>>>> variant for the simple 16- and 32-bit types. Those macros are then
>>>>> #ifdef'd for the correct endianness. 24-bit remains the same, as it
>>>>> would be more complex.
>>>>>            
>>>> They completely ignore alignment issues...
>>>>
>>>>  
>>>>       
>>> You're right.
>>>
>>> Attached patch makes use of machine endianness where unaligned data 
>>> accesses are possible, and faster than what gcc is currently doing.
>>>
>>> I have only tested this on a p4, but the following program should 
>>> detect this on any architecture. Compile bytes.c and main.c with the 
>>> same options FFmpeg gives to libavcodec files, link them, and test 
>>> the speed both for patched and unpatched FFmpeg. bytes.c should be 
>>> changed to 'be' on big-endian architectures.
>>>
>>> Regression tests pass.
>>>
>>> Ramiro Polla
>>>     
>>
>>  
>>> Index: configure
>>> ===================================================================
>>> --- configure    (revision 8316)
>>> +++ configure    (working copy)
>>> @@ -602,6 +602,7 @@
>>>      dlopen
>>>      fast_64bit
>>>      fast_cmov
>>> +    fast_unaligned
>>>      freetype2
>>>      imlib2
>>>      inet_aton
>>> @@ -737,6 +738,7 @@
>>>  mmx="default"
>>>  cmov="no"
>>>  fast_cmov="no"
>>> +fast_unaligned="no"
>>>  armv5te="default"
>>>  armv6="default"
>>>  iwmmxt="default"
>>> @@ -951,9 +953,11 @@
>>>  case "$arch" in
>>>    i386|i486|i586|i686|i86pc|BePC)
>>>      arch="x86_32"
>>> +    enable fast_unaligned
>>>    ;;
>>>    x86_64|amd64)
>>>      arch="x86_32"
>>> +    enable fast_unaligned
>>>      canon_arch="`$cc -dumpmachine | sed -e 's,\([^-]*\)-.*,\1,'`"
>>>      if [ x"$canon_arch" = x"x86_64" -o x"$canon_arch" = x"amd64" ]; then
>>>        if [ -z "`echo $CFLAGS | grep -- -m32`"  ]; then
>>>     
>>
>> Maybe configure should rather have a generic test which checks which
>> version is faster? (It would be much easier to maintain than keeping
>> track of what is faster for which CPU ...)
>>
>>
>>   
>
> Sorry, but I failed to find a simple way for this in configure. Three 
> issues came up:
> 1. Unaligned data accesses will crash on some processors, and I don't 
> think it's a good idea to have configure throw exceptions. (e.g. it 
> would open the "Send report" dialog on Windows).
> 2. Checking the speed for some amount of time would slow down configure.
> 3. Different cpu loads during the configure script would cause 
> unreliable results.
>
> So, for the moment, I'm sending the patch with the same check. (It's 
> just like the fast_64bit or fast_cmov check).
>
>
> cosmetics.diff reorders definitions from endianness to bit-depth.
> functional.diff makes special cases for fast_unaligned.
>
functional.diff changed configure to use the new check_exec_crash 
function. This should detect whether unaligned data access works without 
crashing, and whether it returns the correct, non-rotated values (as I 
read in [1]).
It depends on cosmetics.diff from the previous message.
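
For illustration only, here is a minimal sketch of the kind of check such
a probe could run. It is not the code in the attached patch, and the
names load32_safe, load32_direct and unaligned_read_ok are made up for
this example. The idea is to compare a direct 32-bit load from a
misaligned address against a memcpy-based reference: a platform that
traps will crash the probe, and one that returns rotated data should
produce a mismatch.

```c
#include <stdint.h>
#include <string.h>

/* Reference value: memcpy is valid at any alignment. */
static uint32_t load32_safe(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Direct load through a cast pointer. This is deliberately not portable C:
 * it is the very access whose behaviour the probe wants to observe.
 * volatile keeps the compiler from folding the load away. */
static uint32_t load32_direct(const uint8_t *p)
{
    return *(const volatile uint32_t *)p;
}

/* Returns 1 if direct loads at byte offsets 1..3 match the reference,
 * 0 otherwise. On hardware that traps, the process dies instead. */
static int unaligned_read_ok(void)
{
    /* The union forces the buffer to start on a 32-bit boundary, so
     * offsets 1..3 are guaranteed to be misaligned. */
    static const union { uint32_t align; uint8_t b[8]; } buf =
        { .b = { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef } };
    int off;

    for (off = 1; off < 4; off++)
        if (load32_direct(buf.b + off) != load32_safe(buf.b + off))
            return 0;
    return 1;
}
```

A test program built around this would presumably return
unaligned_read_ok() ? 0 : 1 from main(), so that both a crash and a wrong
value count as "no fast_unaligned". How check_exec_crash actually wires
that up is in the attached diff, not in this sketch.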

Ramiro Polla
[1] http://www.arm.com/support/faqdev/1469.html

-------------- next part --------------
A non-text attachment was scrubbed...
Name: functional_3.diff
Type: text/x-patch
Size: 2704 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070413/1d832efa/attachment.bin>