[FFmpeg-devel] [PATCH] Port extra x264 CPU detection code

Sat Jan 10 04:11:39 CET 2009

On Fri, Jan 9, 2009 at 10:05 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Fri, Jan 9, 2009 at 9:21 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> On Wed, Jan 07, 2009 at 11:46:59AM -0500, Jason Garrett-Glaser wrote:
>>> +        if( !strcmp((char*)vendor, "GenuineIntel") ){
>>> +            int family, model, stepping;
>>> +            family = ((eax>>8)&0xf) + ((eax>>20)&0xff);
>>> +            model  = ((eax>>4)&0xf) + ((eax>>12)&0xf0);
>>> +            stepping = eax&0xf;
>>> +            /* 6/9 (pentium-m "banias"), 6/13 (pentium-m "dothan"), and 6/14 (core1 "yonah")
>>> +             * theoretically support sse2, but it's significantly slower than mmx for
>>> +             * basically all functions, so let's just pretend they don't. */
>>> +            if( family==6 && (model==9 || model==13 || model==14) ){
>>> +                rval &= ~FF_MM_SSE2;
>>> +                assert(!(rval&FF_MM_SSSE3));
>>> +            }
>>> +        }
>>>      }
>>>
>>>      cpuid(0x80000000, max_ext_level, ebx, ecx, edx);
>>
>> I am not entirely happy about lying about the supported feature set.
>> Though I am not rejecting this, rather I abstain from approving it.
>> If the others think this is OK, then so am I; if not, then not.
>
> Can't you just introduce a flag "SSE2_IS_ACTUALLY_FASTER_THAN_MMX" or
> the opposite and use an if() to mark functions that are slower on
> these particular CPUs with SSE2 than MMX as such and therefore get the
> best of both worlds, including support for the ~1 SSE2 function where
> SSE2 does actually beat MMX on these CPUs?
>
> Just a silly suggestion.

I don't feel the gain justifies adding so much code just to deal
with crappy CPUs.

x264 does have flags like "SSE2_IS_SLOW" and "SSE2_IS_FAST", for
Athlon64-specific and Core2/Nehalem/Phenom-specific stuff, but adding
"SSE2_IS_ACTUALLY_FASTER_THAN_MMX" or the opposite would require
adding such a flag to every single CPU initialization for every single
block of SSE2 functions, while SSE2_IS_SLOW and SSE2_IS_FAST only have
to be added in a couple of places.
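
A minimal sketch of what a "couple of places" tuning flag could look like
in a function-pointer init, assuming made-up names throughout: the
SKETCH_CPU_* values, SketchDSP, and the sketch_ssd_* stand-ins are
hypothetical and are not x264's or FFmpeg's real identifiers. The "asm"
versions here simply call the C code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only; not real x264/FFmpeg constants. */
#define SKETCH_CPU_MMX          0x1
#define SKETCH_CPU_SSE2         0x2
#define SKETCH_CPU_SSE2_IS_SLOW 0x4  /* SSE2 present, but not worth using here */

typedef int (*sketch_ssd_fn)(const unsigned char *a, const unsigned char *b, size_t n);

/* Plain C sum of squared differences. */
static int sketch_ssd_c(const unsigned char *a, const unsigned char *b, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        int d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Stand-ins for the assembly versions; they just forward to the C code. */
static int sketch_ssd_mmx(const unsigned char *a, const unsigned char *b, size_t n)
{
    return sketch_ssd_c(a, b, n);
}

static int sketch_ssd_sse2(const unsigned char *a, const unsigned char *b, size_t n)
{
    return sketch_ssd_c(a, b, n);
}

typedef struct SketchDSP {
    sketch_ssd_fn ssd;
} SketchDSP;

/* Pick the best implementation for the given CPU flags. */
static void sketch_dsp_init(SketchDSP *dsp, int cpu)
{
    assert(dsp != NULL);
    dsp->ssd = sketch_ssd_c;
    if (cpu & SKETCH_CPU_MMX)
        dsp->ssd = sketch_ssd_mmx;
    /* The tuning flag is consulted only at this one spot, not in every init. */
    if ((cpu & SKETCH_CPU_SSE2) && !(cpu & SKETCH_CPU_SSE2_IS_SLOW))
        dsp->ssd = sketch_ssd_sse2;
}
```

With SKETCH_CPU_SSE2_IS_SLOW set, the init falls back to the MMX pointer
for this one function and nothing else has to change.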

In other words, if you have 50 cases, 49 of which have "X" and 1 of
which has "Y", you don't want to adjust the 49 "X" cases to add a new
flag--you want to adjust the one Y case, at most, or just say "screw
it, we're not bothering for 5 clocks on one function on a few crappy
CPUs, we're just going to disable it."
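
For reference, the "just disable it" approach from the quoted patch can be
sketched as a standalone function. The family/model decoding and the
6/9, 6/13, 6/14 check follow the patch; the SKETCH_MM_* values and the
function names are hypothetical placeholders, not the real FF_MM_*
constants.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch; the real FF_MM_* values differ. */
#define SKETCH_MM_MMX   0x01
#define SKETCH_MM_SSE2  0x02
#define SKETCH_MM_SSSE3 0x04

/* Family = base family (bits 8-11) plus extended family (bits 20-27). */
static int sketch_cpu_family(uint32_t eax)
{
    return (int)(((eax >> 8) & 0xf) + ((eax >> 20) & 0xff));
}

/* Model = base model (bits 4-7) plus extended model (bits 16-19) shifted up 4. */
static int sketch_cpu_model(uint32_t eax)
{
    return (int)(((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0));
}

/* Clear the SSE2 flag on Intel family 6, models 9, 13 and 14.
 * eax is the value returned in EAX by CPUID leaf 1; the caller is assumed
 * to have already checked that the vendor string is "GenuineIntel". */
static int sketch_mask_slow_sse2(uint32_t eax, int flags)
{
    int family = sketch_cpu_family(eax);
    int model  = sketch_cpu_model(eax);

    if (family == 6 && (model == 9 || model == 13 || model == 14)) {
        flags &= ~SKETCH_MM_SSE2;
        /* None of these CPUs has SSSE3, so it should never be set here. */
        assert(!(flags & SKETCH_MM_SSSE3));
    }
    return flags;
}
```

This is one check in one place; every existing "if SSE2" block keeps
working untouched because those CPUs simply never report SSE2.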

By the way, the one function that was about 5 clocks faster with SSE2
was sum of squared differences.  I have no idea why; it could have been
code alignment, instruction scheduling, or something similar.

To Michael:

The spaces are there because I copied that line from x264.  I'll fix
that in the next posting of the patch.

Dark Shikari