[FFmpeg-devel] [rfc] qualification task: SSE2 IDCT
Alexander Strange
astrange
Mon Mar 31 01:56:46 CEST 2008
On Mar 30, 2008, at 6:49 AM, Michael Niedermayer wrote:
> On Sun, Mar 30, 2008 at 03:37:28AM -0400, Alexander Strange wrote:
>> I didn't have much time this week to do anything but school, but I've
>> written a working SSE2 adaption of simple_idct. It's not done yet, since
>> it's still too slow for me to accept it, but I've run out of obvious
>> low-level optimizations with this approach and don't want to just
>> disappear.
>
> Hmm, I thought you were working on an AP922/945 SSE2 IDCT (which would
> have been easier because there's a paper from Intel which lists SSE2
> code ...)
> Of course I am happy with a simple idct based one as well, though this
> is nontrivial and I doubt a little that this can reach svn in time.
Yes, it seems to be. I didn't want to submit one that was different
from simple_idct or xvid, since it might mess up someone's qpel
encoding. Skal's xvid SSE2 IDCT is claimed* to be faster and more
accurate than AP922, anyway... for now, I'll port it over and see if I
can merge some of it into the input permutation.
>> Current times from dct-test:
>> IDCT SIMPLE-C: 3610.0 kdct/s
>> IDCT SIMPLE-MMX: 12738.6 kdct/s
>> IDCT SIMPLE-SSE2: 9086.8 kdct/s
>>
>> IDCT XVID-MMX: 6837.2 kdct/s
>> IDCT XVID-MMX2: 7819.4 kdct/s
>> IDCT XVID-SKAL-SSE2: 11803.0 kdct/s
>
> The first thing I must say, and I am not sure if you are happy to
> hear this ...
> dct_test is useless for speed testing. You have to use actual videos
> because the distribution of zero and nonzero elements differs.
Of course I wouldn't want to rely on it for benchmarking, but the IDCTs
actually came out in the same order when I checked them against an MPEG4
clip. For a DVD, the xvid-* IDCTs are faster since the data is much
less sparse; I'll have to do some statistics on both of them.
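
One way such sparsity statistics could be gathered is sketched below in C.
It assumes the coefficients fed to the IDCT are available as a flat array
of 8x8 blocks (64 int16_t values per block). The names count_nonzero and
sparsity_histogram are hypothetical helpers for illustration, not existing
FFmpeg functions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of nonzero coefficients in one 8x8 block. */
static int count_nonzero(const int16_t *block)
{
    int n = 0;
    for (int i = 0; i < 64; i++)
        n += block[i] != 0;
    return n;
}

/*
 * Hypothetical helper: hist[k] receives the number of blocks that have
 * exactly k nonzero coefficients (k = 0..64). coeffs is assumed to hold
 * nb_blocks consecutive blocks of 64 coefficients each.
 */
static void sparsity_histogram(const int16_t *coeffs, int nb_blocks,
                               int hist[65])
{
    memset(hist, 0, 65 * sizeof(hist[0]));
    for (int b = 0; b < nb_blocks; b++)
        hist[count_nonzero(coeffs + 64 * b)]++;
}
```

Comparing the histograms from an MPEG4 clip and a DVD would show how much
denser the DVD blocks are, which is the property dct-test input does not
necessarily reproduce.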
* http://web.archive.org/web/20051102221740/http://skal.planet-d.net/coding/dct.html
> [...]