[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try#6
Michael Niedermayer
michaelni
Mon Aug 27 23:25:16 CEST 2007
Hi
On Mon, Aug 27, 2007 at 10:21:40PM +0200, Balatoni Denes wrote:
> Hi!
>
> I hope I am not filtered to your trash folder yet, Michael ;)
>
> Here is a new patch, try#6. It is more accurate now (albeit a tiny bit
> slower), almost - but unfortunatelly only almost - passing ieee-1180.
> Here is dct-test output:
>
> 144 101 394 286 421 176 11 126
> 88 109 152 79 99 93 166 94
> 233 140 94 134 83 131 57 74
> 185 100 121 74 85 117 84 70
> 207 134 74 129 129 89 66 88
> 112 74 74 94 75 80 124 103
> 126 144 130 111 120 82 87 104
> 96 119 72 83 66 133 146 97
> IDCT SIMPLE-VIS: err_inf=1 err2=0.02248828 syserr=0.02105000 maxout=260
> blockSumErr=8
i suspect that this can be improved by slightly changing
some coefficients or bias ...
>
> Btw, I tested Walken's idct, here is the dct-test output:
>
> 92 156 -255 132 -189 129 -9 110
> -242 90 -97 100 -105 84 -104 80
> -89 157 -141 66 -165 107 -108 89
> -142 116 -92 81 -66 107 -102 67
> -163 128 -157 87 -92 59 -104 50
> -93 142 -78 123 -60 111 -107 109
> -149 148 -49 98 -43 64 -87 82
> -165 115 -115 66 -115 86 -112 96
> IDCT WALKEN-VIS: err_inf=1 err2=0.02248906 syserr=0.01275000 maxout=260
> blockSumErr=8
i assume thats just the c code? if its asm id like to see a benchmark ...
[...]
> 1.)
> > > ok, but then you should move the for up so its not immedeatly before
> > > a fcmpd using its result
> >
> > Ok, done.
>
> Well, I moved them back, because it broke sparse matrices.
well you do have to change the used register of course
>
> 2.)
> > > there are 32 64bit registers these should be enough to do the idct
> > > without an intermediate store-load
> > > the whole 8x8 block needs 16registers, 7 for the constant coefficients
> > > that leaves 9 available
> >
> > It would be slower. In it's current form of the idct, there are 8
> > independent VIS instructions after each other, so the instruction latency
> > is not a problem. If you only use 9 registers, than good luck with latency.
>
> Indeed. Calculating the first column part would take at least 30 clocks more
> because of latency, because there would be only one register for intermediate
> results. Calculating the second column would take at lest 10 clocks more, and
> by this time we are slower than before, as the gain from all this wourk would
> have been about 32 clocks.
well your current code mixes the even and odd calculations thus it would
require twice as many intermediates, a proper implementation would not
and thus would only need 4 registers to accumulate values until the butterfly
also 1 register would become available after each column thus
0. column 9 registers available
2. column 6 registers available
4. column 7 registers available
6. column 8 registers available
1. column 9 registers available
3. column 6 registers available
5. column 7 registers available
7. column 8 registers available
>
> 3.)
> > > the idct should not store the output in memory but leave it in registers
> > > the ff_simple_idct_put/add then should call the idct (or have it inlined)
> > > and the clamping code should just work with the registers
> > > this avoids another 32 instructions
>
> Although it could be done, it is quite some work (and as always, relatively
> little benefit), and more complexity in the code. It's really not worth it,
> though we might not agree on this point.
>
>
> So this was my last attempt at trying to get this code into SVN. If doesn't
> get in now, than let's be realistic Michael, it never will - because there
> really are very few people interested in developping SPARC VIS assembly -
> like there was no original VIS code in ffmpeg before I came, only parts
> copypasted from libmpeg2, and most of the things are just not optimized for
> SPARC.
>
> I do believe this contribution would be beneficial to ffmpeg, because the C
> idct is much slower, and the mlib idct sometimes makes the picture turn pink
> (or causes other artifacts).
>
> Anyhow, do as you wish, I am off to have dinner
this decission is easy, patch rejected
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070827/60707d04/attachment.pgp>
More information about the ffmpeg-devel
mailing list