[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try#6
Balatoni Denes
dbalatoni
Mon Aug 27 22:21:40 CEST 2007
Hi!
I hope I am not filtered to your trash folder yet, Michael ;)
Here is a new patch, try#6. It is more accurate now (albeit a tiny bit
slower), almost - but unfortunately only almost - passing ieee-1180.
Here is the dct-test output:
144 101 394 286 421 176 11 126
88 109 152 79 99 93 166 94
233 140 94 134 83 131 57 74
185 100 121 74 85 117 84 70
207 134 74 129 129 89 66 88
112 74 74 94 75 80 124 103
126 144 130 111 120 82 87 104
96 119 72 83 66 133 146 97
IDCT SIMPLE-VIS: err_inf=1 err2=0.02248828 syserr=0.02105000 maxout=260
blockSumErr=8
Btw, I tested Walken's idct, here is the dct-test output:
92 156 -255 132 -189 129 -9 110
-242 90 -97 100 -105 84 -104 80
-89 157 -141 66 -165 107 -108 89
-142 116 -92 81 -66 107 -102 67
-163 128 -157 87 -92 59 -104 50
-93 142 -78 123 -60 111 -107 109
-149 148 -49 98 -43 64 -87 82
-165 115 -115 66 -115 86 -112 96
IDCT WALKEN-VIS: err_inf=1 err2=0.02248906 syserr=0.01275000 maxout=260
blockSumErr=8
So it is a bit more accurate, and indeed it kind of passes ieee-1180.
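
A quick sanity check on the two syserr figures: they appear to be the largest
absolute entry of the printed 8x8 matrix divided by the number of test
iterations. The sketch below assumes 20000 iterations (an assumption; it is the
value that reproduces both printed syserr numbers) and compares the result with
the IEEE 1180-1990 per-pixel mean error limit of 0.015. The names ITERATIONS
and worst_pixel_mean_error are illustrative only, not taken from dct-test.

```python
# Accumulated per-position error matrices, copied from the dct-test output above.
SIMPLE_VIS = [
    [144, 101, 394, 286, 421, 176, 11, 126],
    [88, 109, 152, 79, 99, 93, 166, 94],
    [233, 140, 94, 134, 83, 131, 57, 74],
    [185, 100, 121, 74, 85, 117, 84, 70],
    [207, 134, 74, 129, 129, 89, 66, 88],
    [112, 74, 74, 94, 75, 80, 124, 103],
    [126, 144, 130, 111, 120, 82, 87, 104],
    [96, 119, 72, 83, 66, 133, 146, 97],
]
WALKEN_VIS = [
    [92, 156, -255, 132, -189, 129, -9, 110],
    [-242, 90, -97, 100, -105, 84, -104, 80],
    [-89, 157, -141, 66, -165, 107, -108, 89],
    [-142, 116, -92, 81, -66, 107, -102, 67],
    [-163, 128, -157, 87, -92, 59, -104, 50],
    [-93, 142, -78, 123, -60, 111, -107, 109],
    [-149, 148, -49, 98, -43, 64, -87, 82],
    [-165, 115, -115, 66, -115, 86, -112, 96],
]

ITERATIONS = 20000              # assumption: inferred from the printed syserr values
PIXEL_MEAN_ERROR_LIMIT = 0.015  # IEEE 1180-1990 per-pixel mean error limit


def worst_pixel_mean_error(matrix, iterations=ITERATIONS):
    """Largest absolute accumulated error over all 64 positions, per iteration."""
    return max(abs(v) for row in matrix for v in row) / iterations


for name, matrix in (("SIMPLE-VIS", SIMPLE_VIS), ("WALKEN-VIS", WALKEN_VIS)):
    err = worst_pixel_mean_error(matrix)
    verdict = "within" if err <= PIXEL_MEAN_ERROR_LIMIT else "over"
    print(f"{name}: worst per-pixel mean error {err:.5f} ({verdict} the limit)")
```

Under that assumption the 421 entry accounts for syserr=0.02105 in the first
run and the -255 entry for syserr=0.01275 in the second, which is why only the
second run stays under the 0.015 limit on this particular criterion.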
There were three outstanding issues:
1.)
> > ok, but then you should move the for up so it's not immediately before
> > a fcmpd using its result
>
> Ok, done.
Well, I moved them back, because it broke sparse matrices.
2.)
> > there are 32 64bit registers these should be enough to do the idct
> > without an intermediate store-load
> > the whole 8x8 block needs 16registers, 7 for the constant coefficients
> > that leaves 9 available
>
> It would be slower. In the current form of the idct, there are 8
> independent VIS instructions after each other, so the instruction latency
> is not a problem. If you only use 9 registers, then good luck with latency.
Indeed. Calculating the first column part would take at least 30 clocks more
because of latency, since there would be only one register for intermediate
results. Calculating the second column would take at least 10 clocks more, and
by this time we are slower than before, as the gain from all this work would
have been about 32 clocks.
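
The register budget and the clock tally from this exchange can be written out
as plain arithmetic. This is only a sketch of the numbers quoted above: the
16-bit coefficient width is an assumption (it is what makes the 8x8 block fit
in 16 registers), the clock figures are the rough bounds given in the text, and
all variable names are illustrative.

```python
# Register budget from the quoted review comment.
TOTAL_REGS = 32    # 64-bit registers available
COEFF_BITS = 16    # assumption: 16-bit IDCT coefficients
BLOCK_REGS = 8 * 8 * COEFF_BITS // 64   # registers needed to hold the whole 8x8 block
CONST_REGS = 7     # registers holding the constant coefficients
free_regs = TOTAL_REGS - BLOCK_REGS - CONST_REGS

# Clock estimates from the reply (lower bounds and an approximate gain).
EXTRA_FIRST_COLUMN_PART = 30   # "at least 30 clocks more"
EXTRA_SECOND_COLUMN = 10       # "at least 10 clocks more"
GAIN_FROM_NO_STORE_LOAD = 32   # "about 32 clocks"
net_loss = EXTRA_FIRST_COLUMN_PART + EXTRA_SECOND_COLUMN - GAIN_FROM_NO_STORE_LOAD

print(f"block registers: {BLOCK_REGS}, free registers: {free_regs}")
print(f"net loss: at least about {net_loss} clocks")
```

With these figures the all-in-registers variant ends up roughly 8 clocks behind
after the second column alone, before counting any remaining columns.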
3.)
> > the idct should not store the output in memory but leave it in registers
> > the ff_simple_idct_put/add then should call the idct (or have it inlined)
> > and the clamping code should just work with the registers
> > this avoids another 32 instructions
Although it could be done, it is quite some work (and as always, relatively
little benefit), and more complexity in the code. It's really not worth it,
though we might not agree on this point.
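
For context, the two wrappers named in the quoted comment differ only in how
the IDCT result reaches the picture: "put" stores the clamped result, "add"
adds it to the pixels already there and then clamps. The sketch below shows
just those semantics on a flat list of values. It is not the VIS code, and the
Python names clip_uint8, idct_put and idct_add are hypothetical stand-ins.

```python
def clip_uint8(value):
    """Clamp an integer to the 0..255 range of an 8-bit pixel."""
    return max(0, min(255, value))


def idct_put(dest, block):
    """Overwrite dest with the clamped IDCT output."""
    dest[:] = [clip_uint8(v) for v in block]


def idct_add(dest, block):
    """Add the IDCT output to the existing pixels, then clamp."""
    dest[:] = [clip_uint8(d + v) for d, v in zip(dest, block)]


idct_output = [-5, 100, 260, 0]   # made-up values standing in for IDCT results

pixels_put = [10, 200, 250, 0]
idct_put(pixels_put, idct_output)
print(pixels_put)   # [0, 100, 255, 0]

pixels_add = [10, 200, 250, 0]
idct_add(pixels_add, idct_output)
print(pixels_add)   # [5, 255, 255, 0]
```

The review suggestion amounts to running this clamping step directly on the
registers holding the IDCT output instead of storing and reloading the block
first.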
So this was my last attempt at trying to get this code into SVN. If it doesn't
get in now, then let's be realistic Michael, it never will - because there
really are very few people interested in developing SPARC VIS assembly -
like there was no original VIS code in ffmpeg before I came, only parts
copy-pasted from libmpeg2, and most of the things are just not optimized for
SPARC.
I do believe this contribution would be beneficial to ffmpeg, because the C
idct is much slower, and the mlib idct sometimes makes the picture turn pink
(or causes other artifacts).
Anyhow, do as you wish, I am off to have dinner
bye
Denes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simple_idct_vis_try6.diff
Type: text/x-diff
Size: 22586 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070827/2a0c15b1/attachment.diff>