[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try #2
Michael Niedermayer
michaelni
Thu Aug 23 00:47:28 CEST 2007
Hi
On Wed, Aug 22, 2007 at 08:53:21PM +0200, Balatoni Denes wrote:
[...]
>
> > > Also, although it's the same algorithm as simple_idct, it might be less
> > > accurate on rare occasions, because of the primitive slow split 8+8 bit
> > > multiply operation in VIS. I am hoping the idct won't overflow, too.
>
> > what happens with the regression tests if it's forced to be used where the
> > normal C simple idct normally is?
> >
> > and what does dct-test.c output for the idct?
>
> I won't try the regression test, as it would be very complicated - ffmpeg
> doesn't support Solaris 8 (I don't have access to anything else), so I am
> using an older hacked version of mplayer.
ok
>
> But I did try dct-test, here is the output:
>
> ------------
> ednebal mwux119> ./dct-test -i 0
> ffmpeg DCT/IDCT test
>
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0
> IDCT REF-DBL: err_inf=0 err2=0.00000000 syserr=0.00000000 maxout=260 blockSumErr=0
> IDCT REF-DBL: 50.2 kdct/s
>
> -4 -24 -1 -15 4 24 -29 -47
> -35 -13 21 -2 24 8 -9 19
> 30 -17 -38 7 -18 43 -28 -5
> 31 -31 -13 -9 23 5 16 11
> -16 20 -43 18 4 -11 -9 0
> -3 -6 -9 -3 -5 4 -25 11
> -43 10 -6 -7 24 22 -12 23
> 4 -23 -11 0 10 11 13 24
> IDCT INT: err_inf=1 err2=0.01606250 syserr=0.00235000 maxout=260 blockSumErr=6
> IDCT INT: 629.0 kdct/s
>
> -1 13 15 -2 30 9 10 -11
> -22 -27 9 11 15 -1 31 8
> 2 -2 4 -21 0 9 -21 -13
> 21 -17 -10 12 4 1 9 -1
> -8 4 -5 -6 20 -3 12 0
> 21 -15 9 18 -9 19 -14 3
> 1 22 -10 -6 11 -6 6 1
> 21 18 -7 -11 2 17 16 -8
> IDCT SIMPLE-C: err_inf=1 err2=0.00840078 syserr=0.00155000 maxout=260 blockSumErr=5
> IDCT SIMPLE-C: 463.2 kdct/s
>
> 651 363 821 1416 1116 686 64 492
> 247 380 736 237 441 323 534 420
> 1094 716 359 676 442 484 371 426
> 726 296 437 302 399 380 413 411
> 762 622 357 548 476 429 350 463
> 451 386 407 382 437 386 415 391
> 384 579 421 523 482 492 403 454
> 304 455 360 399 344 407 396 375
> IDCT SIMPLE-VIS: err_inf=1 err2=0.06097266 syserr=0.07080000 maxout=259 blockSumErr=12
> IDCT SIMPLE-VIS: 1024.6 kdct/s
>
> 2810 1725 3883 3476 1445 2977 1362 1943
> 1263 1244 3402 2061 3266 2497 2936 2741
> 3763 3555 1640 2765 1864 2615 2039 2391
> 3648 1897 2893 2304 2447 2391 2482 2613
> 1796 3132 2055 2615 2115 2426 2246 2373
> 2690 2483 2701 2527 2544 2556 2554 2520
> 1361 2738 2237 2556 2252 2520 2360 2349
> 1882 2594 2395 2605 2279 2759 2274 2578
> IDCT MLIBidct: err_inf=3 err2=0.38151094 syserr=0.19415000 maxout=260 blockSumErr=40
> IDCT MLIBidct: 2380.8 kdct/s
> ------------------
>
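A quick way to read the 8x8 matrices in the output above: they appear to be per-coefficient error sums accumulated over all test blocks, and the reported syserr is the largest absolute entry divided by the number of blocks. The sketch below checks that reading. The block count of 20000 is an assumption inferred from the numbers, not something stated in this thread.

```python
# Assumption: dct-test ran 20000 random blocks per IDCT. This value is
# inferred from the reported numbers and is not stated in the thread.
NB_ITS = 20000

# Largest absolute entry of each 8x8 matrix in the quoted output.
max_entry = {
    "INT": 47,
    "SIMPLE-C": 31,
    "SIMPLE-VIS": 1416,
    "MLIBidct": 3883,
}

# Dividing by the block count should reproduce the reported syserr values.
syserr = {name: value / NB_ITS for name, value in max_entry.items()}

for name, value in syserr.items():
    print(f"{name:10s} syserr={value:.8f}")
```

Note also that the SIMPLE-VIS and MLIBidct matrices contain only positive entries, which suggests a consistent rounding bias rather than random noise.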
> So it is less accurate than simple_idct, but definitely more accurate than the
> mlib version. I think I can also confirm that the inaccuracy comes from the
> split two-instruction multiply+rounding: I have an earlier version
> of simple_idct_vis that uses 32 bits for the calculations (and hence processes
> only two rows in parallel, so it is half as fast, only a bit faster than the C
> version), and it has the same good accuracy as simple_idct. As the main
> algorithm is the same, I think the conclusion that the two-instruction
> multiply+rounding is to blame is correct.
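To illustrate why a multiply split into two 8x16 halves can lose accuracy, here is a minimal sketch. It is a simplified model of a 16x16 fixed-point multiply built from two separately rounded partial products, not the exact semantics of the VIS instructions; the function names and test ranges are made up for the example.

```python
def mul_exact(a, b):
    """(a * b) / 2**16, rounded to nearest in a single step."""
    return (a * b + 0x8000) >> 16


def mul_split(a, b):
    """Same product from two 8x16 partial products, each rounded on its own."""
    hi = a >> 8        # signed upper 8 bits of a
    lo = a & 0xFF      # unsigned lower 8 bits of a
    p_hi = (hi * b + 0x80) >> 8      # (hi * b) / 2**8, rounded
    p_lo = (lo * b + 0x8000) >> 16   # (lo * b) / 2**16, rounded
    return p_hi + p_lo


# Compare both variants over a range of small operands.
worst = 0
mismatches = 0
for a in range(-1024, 1024):
    for b in range(-2048, 2048, 13):
        d = mul_split(a, b) - mul_exact(a, b)
        worst = max(worst, abs(d))
        mismatches += d != 0

print("largest difference:", worst)
print("mismatching pairs:", mismatches)
```

Each partial product is off by at most half a unit, so the split result can differ from the single-rounded one by one unit. In a multi-stage IDCT such one-unit differences can accumulate, which would be consistent with the 32-bit variant matching simple_idct.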
ehhh, that's the first time I see the mlib accuracy .... uhm
that's miles outside the requirements mpeg & h26x have
I wonder if mpeg4 is even watchable with that ...
also please don't enable either the mlib or the simple_vis idct by default
(at least not for encoding) as the files will show artifacts if decoded
with an accurate IDCT (= they won't be playable on non-sparc)
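For a rough sense of how far outside the requirements these results are, the sketch below compares the dct-test numbers quoted above against the accuracy limits usually cited from IEEE 1180-1990. Both the limit values and the mapping of dct-test's err2 and syserr onto those limits are assumptions, and dct-test does not follow the IEEE 1180 test procedure exactly, so treat the outcome as indicative only.

```python
# Assumed limits, as commonly quoted from IEEE 1180-1990 (not from this thread):
# peak error <= 1, overall mean square error <= 0.02,
# per-coefficient mean error <= 0.015.
LIMITS = {"err_inf": 1, "err2": 0.02, "syserr": 0.015}

# Values copied from the dct-test output quoted in this mail.
results = {
    "INT":        {"err_inf": 1, "err2": 0.01606250, "syserr": 0.00235000},
    "SIMPLE-C":   {"err_inf": 1, "err2": 0.00840078, "syserr": 0.00155000},
    "SIMPLE-VIS": {"err_inf": 1, "err2": 0.06097266, "syserr": 0.07080000},
    "MLIBidct":   {"err_inf": 3, "err2": 0.38151094, "syserr": 0.19415000},
}


def violations(result):
    """Return the names of the metrics that exceed their assumed limit."""
    return [key for key, limit in LIMITS.items() if result[key] > limit]


for name, result in results.items():
    bad = violations(result)
    print(name, "ok" if not bad else "exceeds: " + ", ".join(bad))
```

Under these assumed limits the two C versions pass, while both SPARC-specific versions exceed at least the err2 and syserr limits.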
[...]
> > also you permute the input explicitly instead of setting
> > idct_permutation_type properly
> >
> > please don't submit trash
>
> I don't think it is trash. Again, there is no left shift. It could be
> substituted with 4 additions and no fpackfix mess, but then it would only be
> marginally faster, and there would be more code in this file (as this
> permutation is still needed for the other half of the idct); in other words,
> it would be more complex for little gain. So I prefer this version.
yes, sorry ...
I'll review the code later ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The educated differ from the uneducated as much as the living from the
dead. -- Aristotle