[FFmpeg-devel] [PATCH] ARM: NEON optimised simple_idct

Måns Rullgård mans
Mon Aug 25 23:10:03 CEST 2008


Michael Niedermayer <michaelni at gmx.at> writes:

> On Mon, Aug 25, 2008 at 09:04:27PM +0100, Måns Rullgård wrote:
>> Michael Niedermayer <michaelni at gmx.at> writes:
>> 
>> > On Mon, Aug 25, 2008 at 07:47:16PM +0100, Måns Rullgård wrote:
>> >> Michael Niedermayer <michaelni at gmx.at> writes:
>> > [...]
>> >> > 2. depending on the pattern of nonzero / all-zero rows, one of 8
>> >> > optimized column transforms is used.  This may be a bad idea, though,
>> >> > for a CPU with a small code cache ...
>> >> >
>> >> > also, maybe it would make sense to look at i386/idct_sse2_xvid.c,
>> >> > which uses SSE2 (128-bit registers); this one uses only 16-bit operations
>> >> > for the column transform, so it may be faster when the tricks of the simple
>> >> > idct aren't applicable
>> >> 
>> >> Do you expect any sane person to be able to read that?  
>> >
>> > well, a little insanity may be needed
>> >
>> >> That's also
>> >> not bitexact, right?
>> >
>> > it is supposed to be bitexact, and I cannot remember a case where any
>> > input led to different output. Also, the MMX one is used in the
>> > regression tests and they match between MMX and non-x86 CPUs ...
>> 
>> All the different IDCT variants (int, simple, simplemmx, libmpeg2mmx,
>> xvidmmx, faani) give different output on my machine with current
>> FFmpeg.  Which one is correct?
>
> all
>
> and if you really have a case where simple and simplemmx return different
> output for the same, correctly permuted input, then I am very interested
> in that.

I see differences with most files.  Can you suggest an easy way to
extract the coefficients for the offending block?

-- 
Måns Rullgård
mans at mansr.com
