[FFmpeg-devel] [PATCH] Unfinished GSoC qual task (16bit VQA Video Decoder)

Sun Apr 26 02:38:55 CEST 2009

On Sun, Apr 26, 2009 at 01:03:23AM +0200, Adam Iglewski wrote:
> Hi,
>
> Thanks for reviews. Updated patches attached.
>
> Vitor Sessak pisze:
> > Duplicating 4XM code is not ok (you could archive the same result of
> > your patch just putting "case CODEC_ID_ADPCM_IMA_WS:" inside the 4XM
> > case block). Also, did you test your code with our samples archive? It
> > would surprise me that CODEC_ID_ADPCM_IMA_WS never worked for any file...
> >
>
> I did. Files with one channel plays fine but just try playing any sample 
> from Tiberian Sun.
>
> http://samples.mplayerhq.hu/game-formats/vqa/Tiberian%20Sun%20VQAs/
>
> As far as I can tell they are the only ones with stereo sound. I'm 
> attaching new patch as suggested.
>
> >> +static inline void vqa_copy_hc_block(unsigned short *pixels,int
> >> pixel_ptr,int stride,unsigned short *codebook,
> >> +                                      int vector_index,int block_h)
> >
> > I prefer using uint16_t instead of "unsigned short", since what we
> > actually want is a 16-bit uint...
>
> Fixed.
>
> >> +{
> >> +    int pixel_y;
> >> +    for (pixel_y = 0; pixel_y < block_h; pixel_y++) {
> >> +        pixels[pixel_ptr] = codebook[vector_index++];
> >> +        pixels[pixel_ptr+1] = codebook[vector_index++];
> >
> > If you add one more whitespace it is more readable:
> >
> >         pixels[pixel_ptr  ] = codebook[vector_index++];
> >         pixels[pixel_ptr+1] = codebook[vector_index++];
> >
> > but I think here one memcpy would be even better...
> >
>
> Fixed. Now using memcpy.
>
> >> +
> >> +        pixel_ptr = ((block_y * s->vector_height) *
> >> stride)+block_x*block_inc;
> >> +        vptr_action_code = AV_RL16(&src[src_ptr]);
> >> +        src_ptr+=2;
> >
> > Instead of using src_ptr and constructs like src[src_ptr++], you could
> > just increment the src pointer. More precisely, instead of
> >
> >   a= src[src_ptr++];
> >
> > just
> >
> >   a = *src++;
> >
> > and instead of
> >
> >   a= AV_RL16(&src[src_ptr]);
> >   src_ptr+=2;
> >
> > just
> >
> >   a = bytestream_get_le16(&src);
> >
>
> Fixed. But should I use bytestream_get_byte or *ptr++ is OK?

*ptr++ is fine

>
> > Also please read your code again to check if a malicious malformed VQA
> > file would not make your code write to unallocated memory. This would be
> > a serious security breach...
>
> I added macro that checks if number of blocks to write will exceed total
> number of blocks in frames.
>
> Michael Niedermayer pisze:
> [...]
>>> +    if(!AV_RL16(&vqa_header[14]))
>>> +      avctx->pix_fmt = PIX_FMT_RGB555;
>>> +    else
>>> +      avctx->pix_fmt = PIX_FMT_PAL8;
>>> +
>> inconsistent indention
>
> Fixed.
>
>>>      /* the vector dimensions have to meet very stringent requirements */
>>>      if ((s->vector_width != 4) ||
>>>          ((s->vector_height != 2) && (s->vector_height != 4))) {
>>> @@ -205,11 +211,17 @@ static void decode_format80(const unsign
>>>       int src_index = 0;
>>>      int dest_index = 0;
>>> +    int new_format = 0;
>>>      int count;
>>>      int src_pos;
>>>      unsigned char color;
>>>      int i;
>>>  +    if ((!src[src_index]) || (src_size > 0xffff)) {
>> superflous ()
>
> Fixed.
>
>>> +        new_format = 1;
>>> +        src_index++;
>>> +    }
>>> +
>>>      while (src_index < src_size) {
>>>           vqa_debug("      opcode %02X: ", src[src_index]);
>>> @@ -230,6 +242,8 @@ static void decode_format80(const unsign
>>>              count = AV_RL16(&src[src_index]);
>>>              src_index += 2;
>>>              src_pos = AV_RL16(&src[src_index]);
>>> +            if(new_format)
>>> +              src_pos = dest_index-src_pos;
>>>              src_index += 2;
>>>              vqa_debug("(1) copy %X bytes from absolute pos %X\n", count, 
>>> src_pos);
>>>              CHECK_COUNT();
>>> @@ -252,6 +266,8 @@ static void decode_format80(const unsign
>>>               count = (src[src_index++] & 0x3F) + 3
>>>              src_pos = AV_RL16(&src[src_index]);
>>> +            if(new_format)
>>> +              src_pos = dest_index-src_pos;
>>>              src_index += 2;
>>>              vqa_debug("(3) copy %X bytes from absolute pos %X\n", count, 
>>> src_pos);
>>>              CHECK_COUNT();
>> looks like code duplication (this should be fixed in a seperate patch)
>>
>
> Will do.
>

>>> @@ -291,6 +307,124 @@ static void decode_format80(const unsign
>>>                  dest_index, dest_size);
>>>  }
>>>  +static inline void vqa_copy_hc_block(unsigned short *pixels,int 
>>> pixel_ptr,int stride,unsigned short *codebook,
>>> +                                      int vector_index,int block_h)
>>> +{
>>> +    int pixel_y;
>>> +    for (pixel_y = 0; pixel_y < block_h; pixel_y++) {
>>> +        pixels[pixel_ptr] = codebook[vector_index++];
>>> +        pixels[pixel_ptr+1] = codebook[vector_index++];
>>> +        pixels[pixel_ptr+2] = codebook[vector_index++];
>>> +        pixels[pixel_ptr+3] = codebook[vector_index++];
>>> +        pixel_ptr += stride;
>>> +    }
>>> +}
>> vector_index is redundant, so is pixel_ptr
>
> Fixed. And as Vitor suggested now using memcpy.

memcpy might be a bad choice, did you benchmark this one?
personally id cast to uint64_t and do a single assignment ...

>
> [...]
>>> +    block_x = block_y = 0;
>>> +    while(total_blocks > 0) {
>>> +
>>> +        pixel_ptr = ((block_y * s->vector_height) * 
>>> stride)+block_x*block_inc;
>> superflous ()
>
> Fixed.
>
>>> +        vptr_action_code = AV_RL16(&src[src_ptr]);
>>> +        src_ptr+=2;
>> bytestream_get_le16()
>
> Fixed.
>
> [...]
>>> +
>>> +            /* Write block number (Val & 0xff) and then write Count 
>>> blocks
>>> +            * getting their indexes by reading next Count bytes from
>>> +            * the VPTR chunk. Count is (((Val/256) & 0x1f)+1)*2. */
>>> +            case 0x4000:
>>> +                vector_index = (vptr_action_code & 0xff) << index_shift;
>> please indent the comments differerntly so the code is easier to
>> see / is more seperated from the comments
>
> Fixed.
>
> [...]
>>> +            case 0xa000:
>>> +                vector_index = (vptr_action_code & 0x1fff) << 
>>> index_shift;
>>> +                blocks_done = src[src_ptr++];
>>> +
>>> +                for(i=0;i<blocks_done;i++) {
>>> +                    vqa_copy_hc_block(pixels,pixel_ptr,stride,codebook,
>>> +                                        vector_index,s->vector_height);
>>> +                    pixel_ptr += block_inc;
>>> +                }
>> looks exploitable
>
> As I wrote above. I added a macro to avoid writing outside frame
> memory.
>
> [..]
>>> @@ -353,6 +489,14 @@ static void vqa_decode_chunk(VqaContext          
>>> case VPTZ_TAG:
>>>              vptz_chunk = index;
>>>              break;
>>> + 
>> trailing whitespace
>
> Fixed.
>

>>> +        case VPTR_TAG:
>>> +            vptr_chunk = index;
>>> +            break;
>>> +
>>> +        case VPRZ_TAG:
>>> +            vprz_chunk = index;
>>> +            break;
>>>  
>> why are the chunks not decoded immedeatly?
>
> This was the original code flow and I just added chunks for
> decoding 16 bit frames. But I believe that this is because chunk order
> in file does not guarantee correct order for decoding frame eg. VPTR chunk 
> which contain indices to codebook can appear before CBFZ chunk which 
> contain codebook. But I might be wrong. Maybe Mike Melanson
> will know better ?

mike?

anyway, if true its still possible to have a simple function to traverse the
chunks and search for a specific one, that then could just be called with the
types in the needed order, i suspect this might be slightly simpler

>
> And regarding comments about using bytestream_get_le16 instead of AV_RL16 
> and incrementing ptr. Should I replace it in every place in the file or 
> just in new code added by me?

a seperate patch simplifying all is welcome

>
> Again thank for the reviews.
>
> Adam
>

> --- ffmpeg/libavcodec/vqavideo.c	2009-04-20 20:37:46.000000000 +0200
> +++ ffmpeg_work/libavcodec/vqavideo.c	2009-04-26 00:01:43.000000000 +0200
> @@ -70,10 +70,19 @@
>  
>  #include "libavutil/intreadwrite.h"
>  #include "avcodec.h"
> +#include "bytestream.h"
>  
>  #define PALETTE_COUNT 256
>  #define VQA_HEADER_SIZE 0x2A
>  #define CHUNK_PREAMBLE_SIZE 8

> +#define CHECK_BLOCKS_COUNT(n) \
> +  if ((total_blocks - n) < 0) { \
> +    av_log(s->avctx, AV_LOG_ERROR, " VQA video warning: writing %d blocks will exceed frame block count %d\n", \
> +      n, total_blocks); \
> +    av_log(s->avctx, AV_LOG_ERROR, " VQA video warning: action code %X, action count %d\n", \
> +      vptr_action_code & 0xe000, n); \
> +    return; \
> +  }

macros are not a solution to code duplication, the code is still duplicated
in the binary

[...]
> @@ -230,6 +251,8 @@ static void decode_format80(const unsign
>              count = AV_RL16(&src[src_index]);
>              src_index += 2;
>              src_pos = AV_RL16(&src[src_index]);
> +            if(new_format)
> +              src_pos = dest_index-src_pos;

indent ...

[...]
> +    block_x = block_y = 0;
> +    while(total_blocks > 0) {
> +
> +        pixels = (uint16_t *)s->frame.data[0] + (block_y * s->vector_height * stride)+block_x*block_inc;
> +        vptr_action_code = bytestream_get_le16(&src);
> +
> +        switch(vptr_action_code & 0xe000) {
> +
> +            /* Skip Count blocks. Count is (Val & 0x1fff). */
> +            case 0x0000:
> +                blocks_done = vptr_action_code & 0x1fff;
> +                break;
> +

> +            /* Write block number (Val & 0xff) Count times.
> +             * Count is (((Val/256) & 0x1f)+1)*2
> +             */
> +            case 0x2000:
> +                vector_index = (vptr_action_code & 0xff) << index_shift;
> +                blocks_done = (((vptr_action_code>>8) & 0x1f)+1) << 1;
> +                CHECK_BLOCKS_COUNT(blocks_done);
> +
> +                for(i=0;i<blocks_done;i++) {
> +                    vqa_copy_hc_block(pixels,stride,&codebook[vector_index],s->vector_height);
> +                    pixels += block_inc;
> +                }
> +                break;
> +
> +            /* Write block number (Val & 0xff) and then write Count blocks
> +             * getting their indexes by reading next Count bytes from
> +             * the VPTR chunk. Count is (((Val/256) & 0x1f)+1)*2
> +             */
> +            case 0x4000:
> +                vector_index = (vptr_action_code & 0xff) << index_shift;
> +                blocks_done = (((vptr_action_code>>8) & 0x1f)+1) << 1;
> +                CHECK_BLOCKS_COUNT(blocks_done+1);
> +
> +                vqa_copy_hc_block(pixels,stride,&codebook[vector_index],s->vector_height);
> +                pixels += block_inc;
> +
> +                for(i=0;i<blocks_done;i++) {
> +                    vector_index = *src++;
> +                    vector_index <<= index_shift;
> +                    vqa_copy_hc_block(pixels,stride,&codebook[vector_index],s->vector_height);
> +                    pixels += block_inc;
> +                }
> +                blocks_done++;
> +                break;
> +
> +            /* Write block (Val & 0x1fff).*/
> +            case 0x6000:
> +                vector_index = (vptr_action_code & 0x1fff) << index_shift;
> +                CHECK_BLOCKS_COUNT(1);
> +                vqa_copy_hc_block(pixels,stride,&codebook[vector_index],s->vector_height);
> +                blocks_done=1;
> +                break;
> +
> +            /* Write block (Val & 0x1fff) Count times.
> +             * Count is the next byte from the VPTR chunk
> +             */
> +            case 0xa000:
> +                vector_index = (vptr_action_code & 0x1fff) << index_shift;
> +                blocks_done = *src++;
> +                CHECK_BLOCKS_COUNT(blocks_done);
> +
> +                for(i=0;i<blocks_done;i++) {
> +                    vqa_copy_hc_block(pixels,stride,&codebook[vector_index],s->vector_height);
> +                    pixels += block_inc;
> +                }

theres alot of code duplication in there, its basically the same thing
done over and over again

> +                break;
> +        }
> +        block_y += (block_x + blocks_done) / blocks_wide;
> +        block_x  = (block_x + blocks_done) % blocks_wide;
> +        total_blocks -= blocks_done;

can count cross the right border and continue to the next row?
if yes, your code is buggy
if no, it can be simplified into a for(all rows) for(all columns)

[...]
> @@ -581,12 +750,22 @@ static int vqa_decode_frame(AVCodecConte
>          av_log(s->avctx, AV_LOG_ERROR, "  VQA Video: get_buffer() failed\n");
>          return -1;
>      }

> +    } else {
> +    s->frame.reference = 1;
> +    s->frame.buffer_hints = FF_BUFFER_HINTS_VALID | FF_BUFFER_HINTS_PRESERVE | FF_BUFFER_HINTS_REUSABLE;
> +    if (avctx->reget_buffer(avctx, &s->frame)) {
> +        av_log(s->avctx, AV_LOG_ERROR, "reget_buffer() failed\n");
> +        return -1;
> +    }
> +    }

indent

[...]
> --- ffmpeg/libavcodec/adpcm.c	2009-04-20 18:29:22.000000000 +0200
> +++ ffmpeg_work/libavcodec/adpcm.c	2009-04-26 00:27:30.000000000 +0200
> @@ -1005,6 +1005,7 @@ static int adpcm_decode_frame(AVCodecCon
>          if (cs->step_index < 0) cs->step_index = 0;
>          if (cs->step_index > 88) cs->step_index = 88;
>  
> +    case CODEC_ID_ADPCM_IMA_WS:
>          m= (buf_size - (src - buf))>>st;
>          for(i=0; i<m; i++) {
>              *samples++ = adpcm_ima_expand_nibble(&c->status[0], src[i] & 0x0F, 4);
> @@ -1156,25 +1157,6 @@ static int adpcm_decode_frame(AVCodecCon
>              src++;
>          }
>          break;
> -    case CODEC_ID_ADPCM_IMA_WS:
> -        /* no per-block initialization; just start decoding the data */
> -        while (src < buf + buf_size) {
> -
> -            if (st) {
> -                *samples++ = adpcm_ima_expand_nibble(&c->status[0],
> -                    src[0] >> 4  , 3);
> -                *samples++ = adpcm_ima_expand_nibble(&c->status[1],
> -                    src[0] & 0x0F, 3);
> -            } else {
> -                *samples++ = adpcm_ima_expand_nibble(&c->status[0],
> -                    src[0] >> 4  , 3);
> -                *samples++ = adpcm_ima_expand_nibble(&c->status[0],
> -                    src[0] & 0x0F, 3);
> -            }
> -
> -            src++;
> -        }
> -        break;
>      case CODEC_ID_ADPCM_XA:
>          while (buf_size >= 128) {
>              xa_decode(samples, src, &c->status[0], &c->status[1],

That will break the existing cases that use CODEC_ID_ADPCM_IMA_WS i assume

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Awnsering whenever a program halts or runs forever is
On a turing machine, in general impossible (turings halting problem).
On any real computer, always possible as a real computer has a finite number
of states N, and will either halt in less than N cycles or never halt.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090426/f473fb91/attachment.pgp>