[Ffmpeg-devel] Re: fastmemcpy in ffmpeg
Gunnar von Boehn
gunnar
Tue Sep 26 12:30:10 CEST 2006
Hi,
Rich Felker wrote:
>>>Just by
>>>adding one prefetch instruction to the normal Linux memcpy you can speed
>>>it up a lot 50%.
>>
>>[..]
>>
>>So why don't you submit such work for inclusion in glibc? That way,
>>everybody profits!
>
>
> No, multimedia apps profit and everyone else loses. fastmemcpy is
> several times slower for tiny copies, which are the only thing that
> _normal_ apps ever do. The only type of memcpy that belongs in libc is
> the ultra-trivial implementation which (on the i386 family) happens to
> also be the fastest implementation that works on all cpu generations.
> Anything like fastmemcpy requires either cpu-specific libc or runtime
> cpudetect, the former of which is probably not acceptable for most
> users and the latter of which will be horribly slow for the common
> cases...
I have to disagree, politely.
- A CPU-optimized version will easily be faster
than the normal version for sizes above 64/128 bytes.
- An optimized version will be about twice as fast
for sizes above 500 bytes / 1 KB.
- The added overhead for every memcpy call is just one " if( size>128 ){ ".
If you arrange this branch so that it defaults (falls through)
to the small-size routine, then this "if" costs
1 clock or less on many CPUs. This overhead is totally
negligible.
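
To make the size check concrete, here is a minimal sketch of the idea. The names dispatch_memcpy, copy_small, copy_large and the 128-byte threshold are only placeholders for illustration, and copy_large simply forwards to the library memcpy where a real version would use prefetching and wide loads/stores. The branch hint relies on __builtin_expect, which is a GCC/Clang extension, so it is wrapped in a macro.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical threshold; the right value depends on the CPU. */
#define LARGE_COPY_THRESHOLD 128

/* Branch hint: GCC/Clang extension, no-op elsewhere. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Trivial byte loop, used for small copies. */
static void *copy_small(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Placeholder for a CPU-optimized bulk copy.
 * Here it only forwards to the library memcpy. */
static void *copy_large(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* One compare-and-branch decides which routine runs.
 * The small-size path is marked as the expected (fall-through) case. */
void *dispatch_memcpy(void *dst, const void *src, size_t n)
{
    if (UNLIKELY(n > LARGE_COPY_THRESHOLD))
        return copy_large(dst, src, n);
    return copy_small(dst, src, n);
}
```

Whether the compiler really lays out the small path as the fall-through depends on the compiler and optimization level, so the generated code is worth checking.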
Please keep in mind that the ultra-trivial implementation is the
fastest implementation only for CPUs without any 2nd-level cache.
It's really slow on CPUs with a 2nd-level cache.
If you want to see examples of very efficient handling of such cases
and of how to install optimized routines at runtime, then please have a look
at the source of Mac OS X.
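
As a rough illustration of installing a routine at runtime (this is a generic function-pointer sketch, not how Mac OS X does it): the first call detects the CPU and replaces the pointer, so later calls pay only for one indirect call. All names here (runtime_memcpy, copy_impl, copy_generic, copy_tuned, cpu_has_sse2) are made up for the example, copy_tuned is only a stand-in that forwards to the library memcpy, and the detection assumes the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports on x86.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Portable fallback: plain byte loop. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for a CPU-specific routine (would use SSE2 etc.). */
static void *copy_tuned(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* CPU detection; assumes GCC/Clang builtins on x86, else reports "no". */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

static void *copy_resolve(void *dst, const void *src, size_t n);

/* Starts out pointing at the resolver; replaced on first use. */
static copy_fn copy_impl = copy_resolve;

/* First call lands here: pick a routine, install it, then do the copy.
 * Not synchronized; a threaded program would need to guard the store. */
static void *copy_resolve(void *dst, const void *src, size_t n)
{
    copy_impl = cpu_has_sse2() ? copy_tuned : copy_generic;
    return copy_impl(dst, src, n);
}

void *runtime_memcpy(void *dst, const void *src, size_t n)
{
    return copy_impl(dst, src, n);
}
```

The detection cost is paid once; after that the per-call overhead is the indirect call through copy_impl.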
I think we should not go into this here, as it's getting off-topic.
Cheers
Gunnar