[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try #2
Balatoni Denes
dbalatoni
Thu Aug 23 02:48:41 CEST 2007
Hi!
On Thursday 23 August 2007 at 01:29, Michael Niedermayer wrote:
> > In the row iteration it is not only permuted, but also shifted right four
> > bits. But there is no shift instruction. So if you know a significantly
> > faster way to shift the input right four bits, then do tell me.
>
> there is a shift instruction, sllx, where's the problem with using that?
Well, you can't move directly between floating point and integer registers. So
there would be some additional storing to memory and reading back from memory,
some masking is still needed, and then the shift. All in all it's the same
speed or slower than 4 adds, which, as I already said, I don't really like
because of the marginal speedup and the added complexity.
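
To illustrate the masking point: once four 16-bit values sit side by side in
one 64-bit word, a plain integer shift lets bits leak from each value into its
neighbour, so a mask is needed afterwards. This is only a minimal sketch in
portable C, not code from the patch. The helper names (pack4x16, lane16,
shr_naive, shr4x16) are made up for this example, and it treats the lanes as
unsigned, so signed coefficients would need extra sign handling on top.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four unsigned 16-bit lanes into one 64-bit word, lane 0 lowest.
 * (Hypothetical helper, for illustration only.) */
static uint64_t pack4x16(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
    return (uint64_t)a | ((uint64_t)b << 16) |
           ((uint64_t)c << 32) | ((uint64_t)d << 48);
}

/* Extract lane i (0..3) from a packed word. */
static uint16_t lane16(uint64_t v, unsigned i)
{
    assert(i < 4);
    return (uint16_t)(v >> (16 * i));
}

/* Plain 64-bit shift: the low n bits of each lane end up in the
 * top n bits of the lane below it. */
static uint64_t shr_naive(uint64_t v, unsigned n)
{
    assert(n < 16);
    return v >> n;
}

/* Lane-wise logical shift right: shift the whole word, then mask
 * away the bits that leaked in from the neighbouring lane. */
static uint64_t shr4x16(uint64_t v, unsigned n)
{
    assert(n < 16);
    uint64_t lane_mask = 0xFFFFu >> n;                  /* e.g. 0x0FFF for n = 4 */
    uint64_t mask = lane_mask * 0x0001000100010001ULL;  /* replicate into 4 lanes */
    return (v >> n) & mask;
}
```

With lanes 0x0010, 0x0021, 0x0100, 0x8000 and a shift of 4, shr_naive leaves
0x1001 in the lowest lane (the 0x1 comes from the lane above), while shr4x16
gives the expected 0x0001.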
> also I am realizing now that you read and work with just 32 bits at a time
> while the registers really are 64-bit
> so unless SPARC needs 2x as much time for 64-bit instructions this is very
> inefficient
Now I am kind of puzzled. I am using 64-bit registers. Like f0+f1 is one 64-bit
register. f32, f34, ...f62 are 64-bit registers (these can't even be accessed
in 32-bit parts). So I really don't understand what you are saying. The big
macro computes 4 rows in parallel; how could it do that without using 64-bit
registers?
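
As a rough picture of what "four values in one 64-bit register" means, here is
a portable C emulation of a partitioned 16-bit add, similar in spirit to what
the VIS fpadd16 instruction does on a 64-bit floating point register. It is a
sketch under assumptions, not the macro from the patch: the helper names
(pack4x16, lane16, add4x16) are invented for this example, and each lane is
assumed to wrap around on overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 16-bit lanes into one 64-bit word, lane 0 lowest.
 * (Hypothetical helper, for illustration only.) */
static uint64_t pack4x16(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
    return (uint64_t)a | ((uint64_t)b << 16) |
           ((uint64_t)c << 32) | ((uint64_t)d << 48);
}

/* Extract lane i (0..3) from a packed word. */
static uint16_t lane16(uint64_t v, unsigned i)
{
    assert(i < 4);
    return (uint16_t)(v >> (16 * i));
}

/* Add four 16-bit lanes at once with a single 64-bit addition.
 * The top bit of every lane is cleared before the add so that no carry
 * can cross into the next lane, then restored with an XOR. */
static uint64_t add4x16(uint64_t a, uint64_t b)
{
    const uint64_t top = 0x8000800080008000ULL;  /* top bit of each lane */
    uint64_t low = (a & ~top) + (b & ~top);      /* add the low 15 bits per lane */
    return low ^ ((a ^ b) & top);                /* fold the top bits back in */
}
```

One 64-bit operation here handles four independent 16-bit sums, which is the
sense in which a single instruction can work on 4 rows at a time.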
> [...]
bye
Denes