[FFmpeg-devel] [PATCH] Port extra x264 CPU detection code
Jason Garrett-Glaser
darkshikari
Sat Jan 10 04:11:39 CET 2009
On Fri, Jan 9, 2009 at 10:05 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Fri, Jan 9, 2009 at 9:21 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Wed, Jan 07, 2009 at 11:46:59AM -0500, Jason Garrett-Glaser wrote:
>>> + if( !strcmp((char*)vendor, "GenuineIntel") ){
>>> + int family, model, stepping;
>>> + family = ((eax>>8)&0xf) + ((eax>>20)&0xff);
>>> + model = ((eax>>4)&0xf) + ((eax>>12)&0xf0);
>>> + stepping = eax&0xf;
>>> + /* 6/9 (pentium-m "banias"), 6/13 (pentium-m "dothan"), and 6/14 (core1 "yonah")
>>> + * theoretically support sse2, but it's significantly slower than mmx for
>>> + * basically all functions, so let's just pretend they don't. */
>>> + if( family==6 && (model==9 || model==13 || model==14) ){
>>> + rval &= ~FF_MM_SSE2;
>>> + assert(!(rval&FF_MM_SSSE3));
>>> + }
>>> + }
>>> }
>>>
>>> cpuid(0x80000000, max_ext_level, ebx, ecx, edx);
>>
>> I am not entirely happy about lying about the supported feature set.
>> Though I am not rejecting this; rather, I abstain from approving it.
>> If the others think this is OK, so am I; if not, then not.
>
> Can't you just introduce a flag "SSE2_IS_ACTUALLY_FASTER_THAN_MMX" or
> the opposite and use an if() to mark functions that are slower on
> these particular CPUs with SSE2 than MMX as such and therefore get the
> best of both worlds, including support for the ~1 SSE2 function where
> SSE2 does actually beat MMX on these CPUs?
>
> Just a silly suggestion.
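
The CPU check in the quoted hunk can be exercised on its own. Below is a minimal, hypothetical C sketch that assumes the caller already has the CPUID leaf 1 EAX value and knows whether the vendor string was "GenuineIntel" (`is_genuine_intel` stands in for the patch's `strcmp` on `vendor`). `MM_SSE2` and `MM_SSSE3` are placeholder bit values, not FFmpeg's real `FF_MM_*` constants, and `decode_signature` / `mask_slow_sse2` are illustrative names. Only the decoding arithmetic and the family/model test are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder feature bits for this sketch only; they stand in for the
 * patch's FF_MM_SSE2 / FF_MM_SSSE3 and need not match FFmpeg's values. */
#define MM_SSE2  0x0010
#define MM_SSSE3 0x0080

/* Split CPUID leaf 1 EAX into family/model/stepping with the same
 * arithmetic as the quoted patch: the extended family is added, and the
 * extended model is shifted into the high nibble of the model. */
static void decode_signature(uint32_t eax, int *family, int *model, int *stepping)
{
    *family   = (int)(((eax >> 8) & 0xf) + ((eax >> 20) & 0xff));
    *model    = (int)(((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0));
    *stepping = (int)(eax & 0xf);
}

/* Clear SSE2 on Intel family 6, models 9, 13 and 14 (Pentium M "banias",
 * Pentium M "dothan", Core 1 "yonah"), where the patch reports SSE2 to be
 * slower than MMX for basically all functions. */
static int mask_slow_sse2(int flags, uint32_t eax, int is_genuine_intel)
{
    int family, model, stepping;

    if (!is_genuine_intel)
        return flags;
    decode_signature(eax, &family, &model, &stepping);
    (void)stepping; /* decoded for completeness; this check ignores it */
    if (family == 6 && (model == 9 || model == 13 || model == 14)) {
        flags &= ~MM_SSE2;
        /* These cores lack SSSE3, mirroring the assert in the patch. */
        assert(!(flags & MM_SSSE3));
    }
    return flags;
}
```

For example, `mask_slow_sse2(MM_SSE2, 0x6D6, 1)` (a Dothan-style signature: family 6, model 13, stepping 6) returns 0, while `0x10676` decodes to family 6, model 23 and keeps SSE2.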

I don't think it justifies adding that much code just to deal with
crappy CPUs.

x264 does have a similar system, "SSE2_IS_SLOW" and "SSE2_IS_FAST",
for Athlon64-specific and Core2/Nehalem/Phenom-specific code,
respectively. But adding "SSE2_IS_ACTUALLY_FASTER_THAN_MMX" (or its
opposite) would require adding that flag check to every single block
of SSE2 functions in every CPU initialization routine, whereas
SSE2_IS_SLOW and SSE2_IS_FAST only have to be checked in a couple of
places.

In other words, if you have 50 cases, 49 of which are "X" and 1 of
which is "Y", you don't want to touch the 49 "X" cases to add a new
flag. At most you want to adjust the one "Y" case, or just say "screw
it, we're not bothering over 5 clocks on one function on a few crappy
CPUs; we're just going to disable it."

By the way, the one function that was about 5 clocks faster with SSE2
was sum of squared differences (SSD). I have no idea why; it could
have been code alignment, instruction scheduling, or something else.
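
For reference, SSD is the sum over a block of the squared per-pixel differences, sum of (a - b)^2. A plain-C sketch follows; the function name `ssd_c` and its (pointer, stride, width, height) signature are assumptions for illustration, not the signature of FFmpeg's or x264's SSD routines, whose MMX and SSE2 versions compute the same quantity several pixels at a time.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences over a w x h block of 8-bit pixels.
 * Each row of a and b starts stride_a / stride_b bytes after the previous
 * one, so the block can live inside a larger frame. */
static int ssd_c(const uint8_t *a, int stride_a,
                 const uint8_t *b, int stride_b, int w, int h)
{
    int sum = 0;

    /* Dimensions must be non-negative. */
    assert(w >= 0 && h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int d = a[x] - b[x]; /* in [-255, 255] */
            sum += d * d;
        }
        a += stride_a;
        b += stride_b;
    }
    return sum;
}
```

With 8-bit inputs a 16x16 block peaks at 256 * 255^2 = 16,646,400, which fits comfortably in an `int`.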

To Michael:
The spaces were there because I copied that line from x264. I'll fix
that in the next version of the patch.

Dark Shikari