[FFmpeg-devel] [PATCH] SPARC VIS simple_idct try#7
    Balatoni Denes 
    dbalatoni
       
    Thu Aug 30 02:13:20 CEST 2007
    
    
  
Hi Michael!
Just a question (and a half question):
On Thursday 30 August 2007 at 01:25, Michael Niedermayer wrote:
> > +    /* 3. column */\
> > +        "3:                             \n\t"\
> > +        "for %%f8, %%f10, %%f60         \n\t"\
> > +        "fcmpd %%fcc0, %%f62, %%f60     \n\t"\
>
> the for and fcmp can similarly be moved up, you have to switch to fcc1
> though to avoid a conflict with the above ones
> this applies to the other for/fcmpd as well
Why do I have to switch to fcc1? There is plenty of space to place the fcmpds 
without conflict. Also, checking for equality is %fcc0.
>
> [...]
>
> > +        TRANSPOSE
> > +        IDCT4ROWS
> > +        SCALEROWS
> > +        PUTPIXELSCLAMPED("0")
> > +        LOAD("%2+64")
> > +        TRANSPOSE
> > +        IDCT4ROWS
> > +        SCALEROWS
> > +        PUTPIXELSCLAMPED("4")
>
> the SCALEROWS is unneeded, the fpack16 can do the downshift and a single
> addition to the 0,0 coefficient before the idct or first column after the
> transpose can compensate for the rounding difference
Indeed, I missed this. However, that one add has to come after the 
multiplication, because while in the C simple idct all coefficients are 
multiplied by 1/sqrt(2), here they are not (correct me if I am wrong, but 
this is slightly more accurate imho).
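
To make the point about the single add concrete, here is a minimal scalar
sketch in C. It is not the VIS code from the patch: SHIFT, ROUND and W_DC are
hypothetical values chosen for illustration, and it assumes the final pack
step truncates rather than rounds, and that inputs are non-negative so the C
right shift is well defined. It shows that a bias added to the DC product
reproduces the per-output rounding, while the same bias added to the
coefficient before the multiply gets scaled by the DC weight.

```c
#include <assert.h>

#define SHIFT 4                    /* hypothetical final downshift */
#define ROUND (1 << (SHIFT - 1))   /* bias that turns truncation into rounding */
#define W_DC  23                   /* hypothetical DC weight, not a real IDCT constant */

/* Reference: explicit scaling pass that rounds every output separately. */
static int scale_each(int dc, int ac)
{
    return (W_DC * dc + ac + ROUND) >> SHIFT;
}

/* Bias folded into the DC product once; a truncating shift then suffices. */
static int bias_after_mul(int dc, int ac)
{
    int dc_term = W_DC * dc + ROUND;   /* one add, shared by all outputs */
    return (dc_term + ac) >> SHIFT;
}

/* Bias added to the coefficient itself: the multiply scales it by W_DC. */
static int bias_before_mul(int dc, int ac)
{
    return (W_DC * (dc + ROUND) + ac) >> SHIFT;
}

/* Checks that folding the bias in after the multiply matches the reference. */
static void idct_bias_selfcheck(void)
{
    for (int dc = 0; dc < 64; dc++)
        for (int ac = 0; ac < 64; ac++)
            assert(scale_each(dc, ac) == bias_after_mul(dc, ac));
}
```

With dc = 3 and ac = 5 the first two functions agree, while bias_before_mul
gives a different result because the bias was multiplied by W_DC. Adding
before the multiply would only work if the bias were pre-divided by the
weight, which in general is not an integer.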
bye
Denes
    
    