[FFmpeg-cvslog] r16207 - trunk/libavcodec/h264.c
    compn 
    tempn
       
    Thu Dec 18 18:05:55 CET 2008
    
    
  
On Thu, 18 Dec 2008 16:07:50 -0000 (UTC), M?ns Rullg?rd wrote:
>
>compn wrote:
>> On Thu, 18 Dec 2008 11:48:58 -0000 (UTC), M?ns Rullg?rd wrote:
>>>
>>>Michael Niedermayer wrote:
>>>> On Thu, Dec 18, 2008 at 04:27:34AM +0000, M?ns Rullg?rd wrote:
>>>>> Michael Niedermayer <michaelni at gmx.at> writes:
>>>>>
>>>>> > On Thu, Dec 18, 2008 at 03:57:54AM +0000, M?ns Rullg?rd wrote:
>>>>> >> Michael Niedermayer <michaelni at gmx.at> writes:
>>>>> >>
>>>>> >> > On Thu, Dec 18, 2008 at 02:57:17AM +0000, M?ns Rullg?rd wrote:
>>>>> >> >> michael <subversion at mplayerhq.hu> writes:
>>>>> >> >>
>>>>> >> >> > Author: michael
>>>>> >> >> > Date: Thu Dec 18 03:53:18 2008
>>>>> >> >> > New Revision: 16207
>>>>> >> >> >
>>>>> >> >> > Log:
>>>>> >> >> > Use the new idct functions (except chroma as it was slower in
>>>>> >> >> > benchmarks) cathedral +0.5% speed
>>>>> >> >> > aladin +0.6% speed [note aladin has been cat-ed 10 times to reduce
>>>>> >> >> > the influence of init time]
>>>>> >> >> > Speedup also verified via START/STOP_TIMER (difference was very
>>>>> >> >> > significant for the changed parts)
>>>>> >> >>
>>>>> >> >> How much does this hurt on architectures that don't yet have the new
>>>>> >> >> SIMD functions?
>>>>> >> >
>>>>> >> > there are no really new SIMD functions.
>>>>> >> > I just moved the loops like
>>>>> >> > for(i=0; i<16; i++)
>>>>> >> >     dsp->idct4x4_add(blah blah);
>>>>> >> >
>>>>> >> > into dsputil so they are
>>>>> >> >
>>>>> >> > for(i=0; i<16; i++)
>>>>> >> >     idct4x4_add_simdwhatever(blah blah);
>>>>> >> >
>>>>> >> > that way gcc can inline the function and avoids up to 15 calls
>>>>> >> > through dsp->
>>>>> >> >
>>>>> >> > adding support for this to your favorite architecture is a matter of
>>>>> >> > copy & paste and adjusting the function names.
>>>>> >>
>>>>> >> I can see how it can be done.  I'm asking how much of an impact this
>>>>> >> has on performance until it's been done.  What percentage of the old
>>>>> >> calls are affected?
>>>>> >
>>>>> > depends on the video, if i have to guess and thats just a guess id say
>>>>> > more than 50% of the idct work will go to new code
>>>>>
>>>>> That's enough to give a significant slowdown on ARM.  How about a
>>>>> little coordination next time?
>>>>
>>>> Iam not sure what you suggest?
>>>>
>>>> I could of course send out a warning liks
>>>> "will in an hour (if it passes tests and benchmarks) commit code that will
>>>>  require arch specific optims to be updated, until that update they will
>>>>  be slower. I will update x86 myself"
>>>>
>>>> but i dont think that would have helped you.
>>>
>>>That is more or less what you did, and it did not help me at all.
>>>What I want is that before you commit the change, you send a patch
>>>to the list and wait one day for arch maintainers to catch up, then
>>>apply.
>>
>> what it sounds like to me:
>>
>> "i want you to halt your work for one day, every time you optimize x86.
>> until we test it for speed regressions on other arch"
>
>That is not at all what I said.  Did even pay attention to the change?
>Michael made a change to some SIMD functions, *disabling* them entirely
>on non-x86 until they are updated to a new interface.
ah sorry, i read the commit message but didnt read the diff. that
change swapped idct, but the other arch didnt have the new function.
got it.
please ignore my stupid comments!
-compn
    
    
More information about the ffmpeg-cvslog
mailing list