[FFmpeg-devel] [rfc] qualification task: SSE2 IDCT
Pascal Massimino
pascal.massimino
Wed Apr 2 11:01:18 CEST 2008
On Sun, Mar 30, 2008 at 4:53 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sun, Mar 30, 2008 at 02:25:15PM +0100, Balatoni Denes wrote:
> [...]
> > Also if Alexander gets skal to donate his code under LGPL, will it satisfy the
> > qualification task requirement, as skal's idct iirc is in fact very similar
> > to app note 922/945? :)
>
> Is skal's idct faster/slower/same speed/unknown relative to AP*?
>
> My original idea for this qualification was to have an AP922/945 SSE2 idct
> which combines all optimizations from all existing such IDCTs. So the
> question about any single one being OK is not answerable. The code has to
> be compared to see if there are any further improvements possible.
> Also the output must be binary identical to an existing IDCT
> (to minimize the issues with idct drift between the ever growing number
> of idcts)
> It's a bit of a mystery to me why Alex apparently thought this was an easy
> task.
>
> Now I'm perfectly fine with an SSE2 version of the simple idct. This would
> at least skip the binary identical output problem and the work for me of
> comparing it against other IDCTs. OTOH it's harder, as there is no existing
> SSE2 code to base one's work on ...
>
> There's also AMD, who have promised to implement 2 things for us; they
> have been working (for months) on an SSE float AAN DCT. I think they might
> be happy if the second task were an AP945/922 SSE2 IDCT, as they already
> have some code for that.
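The "binary identical to an existing IDCT" requirement can be checked mechanically. The C sketch below is only an illustration, not FFmpeg code: `idct_double`, `idct_round`, `idct_trunc`, `idct_mismatches` and `idct_mismatches_random` are hypothetical names. It runs two IDCTs on the same coefficient block and counts output samples that differ; a bit-exact replacement should report 0 on every block tested. The two sample IDCTs are deliberately built from one slow double-precision reference and differ only in their final rounding (nearest vs. truncation toward zero), to show that a rounding change alone is enough to produce mismatches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* cos(k*pi/16) for k = 0..8, hard-coded so the sketch needs no libm. */
static const double COS16[9] = {
    1.0,                 0.98078528040323043, 0.92387953251128674,
    0.83146961230254524, 0.70710678118654752, 0.55557023301960218,
    0.38268343236508978, 0.19509032201612826, 0.0
};

/* cos(m*pi/16) for any m >= 0, folded onto the table above. */
static double cos16(int m)
{
    m %= 32;                          /* period 2*pi: 32 steps of pi/16 */
    if (m > 16) m = 32 - m;           /* cos(2*pi - t) == cos(t)        */
    if (m > 8) return -COS16[16 - m]; /* cos(pi - t) == -cos(t)         */
    return COS16[m];
}

/* Textbook 8x8 inverse DCT in double precision, O(N^4): a slow reference. */
static void idct_double(const int16_t in[64], double out[64])
{
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++) {
            double sum = 0.0;
            for (int v = 0; v < 8; v++)
                for (int u = 0; u < 8; u++) {
                    double cu = u ? 1.0 : COS16[4]; /* C(0) = 1/sqrt(2) */
                    double cv = v ? 1.0 : COS16[4];
                    sum += cu * cv * in[v * 8 + u]
                         * cos16((2 * x + 1) * u) * cos16((2 * y + 1) * v);
                }
            out[y * 8 + x] = sum / 4.0;
        }
}

typedef void (*idct_fn)(const int16_t in[64], int16_t out[64]);

/* Hypothetical IDCT "A": reference result rounded to nearest. */
static void idct_round(const int16_t in[64], int16_t out[64])
{
    double t[64];
    idct_double(in, t);
    for (int i = 0; i < 64; i++)
        out[i] = (int16_t)(t[i] < 0 ? -(int)(0.5 - t[i]) : (int)(t[i] + 0.5));
}

/* Hypothetical IDCT "B": same reference, truncated toward zero instead. */
static void idct_trunc(const int16_t in[64], int16_t out[64])
{
    double t[64];
    idct_double(in, t);
    for (int i = 0; i < 64; i++)
        out[i] = (int16_t)(int)t[i];
}

/* Number of output samples on which two IDCTs disagree for one block. */
static int idct_mismatches(idct_fn a, idct_fn b, const int16_t in[64])
{
    int16_t oa[64], ob[64];
    int n = 0;
    a(in, oa);
    b(in, ob);
    for (int i = 0; i < 64; i++)
        n += oa[i] != ob[i];
    return n;
}

/* Total mismatches over random blocks with coefficients in [-range, range]. */
static long idct_mismatches_random(idct_fn a, idct_fn b, int blocks,
                                   int range, unsigned seed)
{
    int16_t in[64];
    long total = 0;
    assert(range >= 0 && range <= 2048); /* keep inputs in int16_t range */
    srand(seed);
    for (int k = 0; k < blocks; k++) {
        for (int i = 0; i < 64; i++)
            in[i] = (int16_t)(rand() % (2 * range + 1) - range);
        total += idct_mismatches(a, b, in);
    }
    return total;
}
```

For example, a block holding only a DC coefficient of 13 reconstructs to 1.625 at every sample in the reference, so `idct_round` outputs 2 everywhere, `idct_trunc` outputs 1 everywhere, and `idct_mismatches` returns 64, even though both are within one unit of the exact result.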
I think it's important not to introduce a "new" idct with a different
error landscape than the ones already around (even if it is IEEE-1180
compliant). We already have the famous Walken-idct and the
simple-idct. A new one would cause another round of idct-mismatch
problems (that's why there's only my fdct in xvid, for instance, and not
the idct). That being said, as far as I recall, you can turn
skl_dct_sse.asm into a Walken-exact (bitwise) idct by using the following
rounding constants as replacements:
Idct_Rnd0: dd 65536, 65536
Idct_Rnd1: dd 3597, 3597
Idct_Rnd2: dd 2260, 2260
Idct_Rnd3: dd 1203, 1203
Idct_Rnd4: dd 0, 0
Idct_Rnd5: dd 120, 120
Idct_Rnd6: dd 512, 512
Idct_Rnd7: dd 512, 512
and of course:
Idct_Sparse_Rnd0: times 4 dw (65536>>11)
Idct_Sparse_Rnd1: times 4 dw ( 3597>>11)
Idct_Sparse_Rnd2: times 4 dw ( 2260>>11)
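A small C sketch of how the two sets of constants relate, under stated assumptions: the row pass is assumed to descale its 32-bit accumulators by 11 bits, which is inferred only from the `>>11` in the Idct_Sparse_Rnd lines. `ROW_SHIFT`, `IDCT_RND`, `sparse_rounder` and `row_descale` are hypothetical names, not symbols from skl_dct_sse.asm. Each sparse word is the matching dword rounder shifted right by 11, and a rounder is added to the accumulator before the shift, which is why swapping rounders changes the output bits.

```c
#include <assert.h>
#include <stdint.h>

/* Row-pass rounders Idct_Rnd0..Idct_Rnd7, as listed in the mail. */
static const int32_t IDCT_RND[8] = { 65536, 3597, 2260, 1203, 0, 120, 512, 512 };

/* Assumed row-pass descale shift, inferred from the ">>11" in the
 * Idct_Sparse_Rnd definitions rather than taken from the asm source. */
enum { ROW_SHIFT = 11 };

/* 16-bit word for the sparse path, mirroring "times 4 dw (rnd>>11)". */
static int16_t sparse_rounder(int32_t rnd)
{
    assert(rnd >= 0); /* all listed rounders are non-negative */
    return (int16_t)(rnd >> ROW_SHIFT);
}

/* Fixed-point descale: add the rounder, then shift right by ROW_SHIFT.
 * Assumes >> on a negative int32_t is an arithmetic shift (like psrad);
 * C leaves that implementation-defined. */
static int32_t row_descale(int32_t acc, int32_t rnd)
{
    return (acc + rnd) >> ROW_SHIFT;
}
```

With these values, 65536>>11 is 32, while 3597>>11 and 2260>>11 are both 1. Idct_Rnd3 through Idct_Rnd7 are all below 2048 and would shift to 0, which may be why only three sparse constants are listed. The rounder also visibly matters: `row_descale(1000, IDCT_RND[4])` gives 0, whereas `row_descale(1000, IDCT_RND[1])` gives 2.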
skal