[Ffmpeg-devel] VC1 compliance (was: a little optim for a SSE version of H263_LOOP_FILTER)

Mon Nov 13 13:47:50 CET 2006

   Hi everybody,

> Message of 12/11/06 09:32
> From: "Stefan Gehrer" <stefan.gehrer at gmx.de>
> >   
> >> I am still surprised about the input to overlap being uint8_t
> >> as my understanding of VC1 was that the overlap has to be
> >> done with the pixels before clipping, which can be both
> >> negative and beyond 255. I remember someone brought this
> >> up before on the list but I think there was no response?
> >>     
> >
> > The logic is simple: while VC1 standard demands processing of
> > at least 10-bit samples lavc implies 8-bit samples. Any workaround
> > will be too messy and slow (and I don't think quality will
> > significantly degrade).
> >
> > I think supporting 16 bit per sample formats (grayscale is already
> > supported) is nice but here it would be an overkill.

   I quite agree here.

> Instead of having the complete picture in 16 bit, I was more thinking
> about leaving the pictures in 8bit samples but having two extra
> horizontal lines of storage in 16bit. So at the end of processing a block,
> you store the bottom two lines of it in that storage, clip the whole block
> and store it in the framebuffer. When it's the turn of the block below,
> you do the overlap with the values in the extra storage.
> 
> IMHO it would be nice to have it like in H.264: A mode with strict
> compliance which can be bit-wise compared to reference decodes (AFAIK
> this should be possible in VC1 too as there is no IDCT with variation),
> and some "fast" flags which would then trade compliance/quality for speed.
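
   For what it's worth, the side-storage idea quoted above could be
   sketched roughly as below. This is only an illustration under loud
   assumptions: finish_block(), overlap4() and the carry[] layout are
   hypothetical names, not lavc API, and the smoothing filter with its
   rounding is a stand-in that should be checked against the spec. It
   also relies on >> of a negative int being an arithmetic shift.

```c
#include <stdint.h>

#define BLK 8

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Stand-in smoothing across a horizontal block edge (ASSUMPTION, not
 * verified against the spec): a and b are the two samples above the
 * edge, c and d the two below. Everything stays unclipped here. */
static void overlap4(int16_t *a, int16_t *b, int16_t *c, int16_t *d, int rnd)
{
    int d1 = (*a - *d + 3 + rnd) >> 3;
    int d2 = (*a - *d + *b - *c + 4 - rnd) >> 3;

    *a -= d1;
    *b -= d2;
    *c += d2;
    *d += d1;
}

/*
 * blk:   8x8 unclipped samples of the current block (modified in place)
 * carry: 2x8 unclipped samples, the bottom two rows of the block above
 * frame: top-left pixel of the current block inside the 8-bit picture
 */
static void finish_block(int16_t blk[BLK * BLK], int16_t carry[2 * BLK],
                         uint8_t *frame, int stride, int has_above)
{
    int x, y;

    if (has_above) {
        for (x = 0; x < BLK; x++) {
            /* overlap on unclipped values from the 16-bit side storage */
            overlap4(&carry[x], &carry[BLK + x], &blk[x], &blk[BLK + x], x & 1);
            /* the two rows above change too, so rewrite them clipped */
            frame[-2 * stride + x] = clip_u8(carry[x]);
            frame[-1 * stride + x] = clip_u8(carry[BLK + x]);
        }
    }

    /* keep the unclipped bottom two rows for the block below */
    for (x = 0; x < BLK; x++) {
        carry[x]       = blk[6 * BLK + x];
        carry[BLK + x] = blk[7 * BLK + x];
    }

    /* only now clip and store the block into the 8-bit picture */
    for (y = 0; y < BLK; y++)
        for (x = 0; x < BLK; x++)
            frame[y * stride + x] = clip_u8(blk[y * BLK + x]);
}
```

   The extra cost is one 2-line int16_t buffer per block column, and the
   picture itself stays 8-bit. Only the vertical edge handling is shown;
   the left/right edges would need the same treatment.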

   Btw: it seems to me there's a much more important source
   of non-compliance in the bicubic MC function (vc1_mspel_mc(), vc1dsp.c).
   Last time I checked, the spec required intermediate storage
   with 16-bit values between the vertical and horizontal passes. This may
   have changed since, MS being quite used to changing the spec after
   they implement it wrongly (bwehe).

   So, in vc1_mspel_mc(), I see 3 non-conformance problems:

   a) the intermediate storage tmp[] is uint8_t instead of 16 bits.
   b) the horizontal pass is done first, whilst the spec says "first
   vertical, then horizontal".
   c) the spec also says maximum accuracy should be retained when descaling
   to fit the 16-bit storage. This means each hpel/qpel position needs its
   own shift / rounding constant, so that as few bits as possible are
   discarded. In the ref impl, these are
   vc1INTERP_Bicubic_Vert_Filter_Shift_Table[] and
   vc1INTERP_Bicubic_Horiz_Filter_Shift_Table[]. In particular, special
   cases are needed for pure full-pel horizontal or vertical interpolation.
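
   To make a) and b) concrete, here is a minimal sketch of the pass
   order and storage I have in mind, for the half-pel/half-pel case
   only. Assumptions, loudly: mspel_hv_sketch() is a hypothetical name,
   the taps (-1, 9, 9, -1) are the half-pel bicubic filter as I
   understand it, the shifts are passed in by the caller rather than
   taken from the ref impl tables, and rounding control is left out
   (plain round-to-nearest instead). It also relies on >> of a negative
   int being an arithmetic shift.

```c
#include <stdint.h>

#define MC_BLK 8
#define MC_TMP_W (MC_BLK + 3)   /* columns -1 .. 9 */

/* Half-pel bicubic taps (-1, 9, 9, -1), gain 16. ASSUMPTION for this sketch. */
static int hpel_taps(int a, int b, int c, int d)
{
    return -a + 9 * b + 9 * c - d;
}

static uint8_t clamp_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * src points at the top-left pixel of the 8x8 block; rows and columns
 * -1 .. 9 around it must be readable. vshift + hshift should be 8 for
 * unity gain (two passes with gain 16 each).
 */
static void mspel_hv_sketch(uint8_t *dst, int dst_stride,
                            const uint8_t *src, int src_stride,
                            int vshift, int hshift)
{
    int16_t tmp[MC_BLK * MC_TMP_W];   /* 16-bit, so nothing is clipped yet */
    int vround = vshift > 0 ? 1 << (vshift - 1) : 0;
    int hround = hshift > 0 ? 1 << (hshift - 1) : 0;
    int x, y;

    /* Pass 1: vertical, partially descaled into the 16-bit buffer. */
    for (y = 0; y < MC_BLK; y++) {
        for (x = -1; x < MC_BLK + 2; x++) {
            const uint8_t *p = src + y * src_stride + x;
            int v = hpel_taps(p[-src_stride], p[0],
                              p[src_stride], p[2 * src_stride]);
            tmp[y * MC_TMP_W + x + 1] = (int16_t)((v + vround) >> vshift);
        }
    }

    /* Pass 2: horizontal on the unclipped values, clip only at the end. */
    for (y = 0; y < MC_BLK; y++) {
        const int16_t *t = tmp + y * MC_TMP_W;
        for (x = 0; x < MC_BLK; x++) {
            int v = hpel_taps(t[x], t[x + 1], t[x + 2], t[x + 3]);
            dst[y * dst_stride + x] = clamp_u8((v + hround) >> hshift);
        }
    }
}
```

   With a uint8_t tmp[], overshoot from the first pass (negative or
   above 255) would be lost before the second pass sees it, which is
   the point of a). For c), vshift and hshift would have to come from
   per-position tables instead of being fixed by the caller.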

   As usual, I may be wrong here...

   bye!
Skal