[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try #4
Balatoni Denes
dbalatoni
Thu Aug 23 21:01:01 CEST 2007
Hi!
So here is a new patch; I implemented your suggestions. One change is that I
used 4 fpadd16 to do the left shift by four, because what gcc made of the
C code didn't look all that good - or maybe I misunderstood it, I don't know.
Anyhow, it shouldn't be much slower, as the C version also needed 1 load, 1
store, 1 shift and 1 logical and, plus incrementing the loop variable and
checking it (although block_last_index could have made it slightly faster). I
hope it's ok.
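
For readers without the attachment, here is a minimal C sketch of one reading
of the "4 fpadd16" trick: adding a value to itself doubles it, so four
doublings multiply each 16-bit lane by 16, which is a left shift by four. The
names vis16x4, fpadd16_model and shl4_by_adds are hypothetical and only model
the arithmetic; this is not the code from the patch.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit VIS register as four 16-bit lanes. */
typedef struct { uint16_t lane[4]; } vis16x4;

/* Models a lane-wise 16-bit add; each lane wraps modulo 2^16. */
static vis16x4 fpadd16_model(vis16x4 a, vis16x4 b)
{
    vis16x4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (uint16_t)(a.lane[i] + b.lane[i]);
    return r;
}

/* Shift every lane left by 4 via four doublings: x -> 2x -> 4x -> 8x -> 16x. */
static vis16x4 shl4_by_adds(vis16x4 v)
{
    for (int step = 0; step < 4; step++)
        v = fpadd16_model(v, v);
    return v;
}
```

The lanes are unsigned here only so that the wrap-around is well defined in C;
the bit patterns are the same as for signed coefficients.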
On Thursday 23 August 2007 at 14:00, Michael Niedermayer wrote:
> > HDTV). Also as the idct is rather inaccurate,
>
> ive not yet looked at how to make it more accurate :)
I am quite positive that the 2-instruction fmul is the problem. Both halves
of the multiply do rounding, so this explains everything. And as I mentioned,
the version that used 16x16->32 bit muls had the same good accuracy as
simple_idct.
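
The rounding argument can be illustrated with a small C model. It assumes the
2-instruction multiply splits one operand into a signed upper byte and an
unsigned lower byte, rounds each partial product to nearest on its own, and
then adds the two results (as the fmul8sux16/fmul8ulx16 pair is commonly
described); the exact VIS rounding rules should be checked against the
manual. The function names mul_split and mul_full are hypothetical.

```c
#include <stdint.h>

/* Round-to-nearest right shift. Assumes >> on a negative int is an
 * arithmetic shift, which is implementation-defined in C but holds on
 * common compilers. */
static int32_t round_shift(int32_t v, int bits)
{
    return (v + (1 << (bits - 1))) >> bits;
}

/* Model of the split multiply: two partial products, each rounded
 * separately, approximating (a * b) >> 16. */
static int32_t mul_split(int16_t a, int16_t b)
{
    int32_t hi = a >> 8;      /* signed upper 8 bits of a   */
    int32_t lo = a & 0xFF;    /* unsigned lower 8 bits of a */
    int32_t p_hi = round_shift(hi * b, 8);   /* first rounding  */
    int32_t p_lo = round_shift(lo * b, 16);  /* second rounding */
    return p_hi + p_lo;
}

/* Reference: full 16x16->32 product, rounded once. */
static int32_t mul_full(int16_t a, int16_t b)
{
    return round_shift((int32_t)a * b, 16);
}
```

With a = 384 and b = 100 the exact value is 38400 / 65536, about 0.59.
mul_full rounds once and gives 1, while mul_split rounds both partial
products down and gives 0. Errors of this kind accumulate over the many
multiplies of an idct.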
> its like leaving 100euro laying at the street saying its not enough to buy
> a car ...
> [...]
> 2% overall speedup is huge ive rejected patches which would have introduced
> new features because they slowed the code down by 0.1%
Yes, ok, I did it after all (and it didn't hurt :) ). Unfortunately I can't
benchmark properly because of many background processes, but dct-test says -
though it seems a bit too optimistic - there is a 20% speed-up of the idct. I
think 5-10% is more realistic and probable, but anyway there should be a
measurable improvement. BTW I do think rejecting features because they
slowed the code down by 0.1% is a bit harsh, but that's none of my
business :)
> also mlib does the idct at half the speed, so i think theres more than 5% of
> gain possible
IMO the idct is not too slow right now. But also imho mlib's speed comes from
a faster, mpeg (derived) algorithm, which uses half as many multiplies. So
with the simple_idct algorithm, I don't expect major speedups.
bye
Denes
ps: it would be great if this could be committed as is, because I already
spent far too much time on this code (definitely more than a week, in fact)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simple_idct_vis_try4.diff
Type: text/x-diff
Size: 21978 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070823/7e08557a/attachment.diff>