[FFmpeg-devel] [PATCH V2] [RFC] GSoC: FLIF16 Image format parser

Mon Mar 2 12:51:00 EET 2020

Hello,

> Is there an upper bound for the size of these integers? In my
> experience, varints in multimedia code are used to save a few bits in
> the file than to allow very large integers. You could make the code much
> simpler if you had an upper bound.
> 
> Also, there is probably already similar code, because it is a common
> feature.

According to the specification of the file format, there is no mention of an
upper bound for the integer: https://flif.info/spec.html#_part_1_main_header

>> + * magnitude of the varint, similar to UTF-8 encoding.
> 
> This is not how UTF-8 works.

I was talking about it in the context of the 'prefix coding' it uses. UTF-8
uses the MSB bits to indicate how many successive bits aside from the starting
bit are included in a character. The varint does a similar thing in the manner
that it also uses an MSB bit to indicate that a successive bit is included in
the number. Hovever aside from this superficial similarity there is not much
in common.

>> +varint_t *flif16_varint_alloc(unsigned int pad, uint8_t lsbval)
>> +{
>> +    varint_t *v = av_mallocz(sizeof(varint_t));
>
>When possible, prefer sizeof(*var) to sizeof(type).

Okay

>> + if(!v)
> 
> Our coding style mandates spaces after if, for, etc.

Yes. There are several other non-uniformities in the code.

>> + return v;
> 
> return NULL is more idiomatic.

True. I was looking through the AVpacket functions and in the av_packet_alloc
function: https://ffmpeg.org/doxygen/trunk/avpacket_8c_source.html#l00051 (line 51)
there's a similar pattern followed:

    AVPacket *pkt = av_mallocz(sizeof(AVPacket));
    if (!pkt)
        return pkt;

Is there a special reason for this?

>> +    v->buf = av_mallocz(sizeof(uint8_t) * (pad + 1));
>
>sizeof(uint8_t) is guaranteed to be 1 on supported platforms.
>
>Missing malloc check. Better make it:
>
>	v = av_mallocz(sizeof(*v) + pad);
>
>to allocate both at the same time: more efficient.

I see. the stucture should therefore be defined with buf after buf_size
(hence putting pad + 1 bytes in contiguous storage but there may be a problem
with the byte padding of the struct), or should buf point to v + sizeof(*v)?
(* See follow-up content below)

>> +    v->buf[pad] = lsbval & 127;
>> +    v->buf_size = pad + 1;
>> +    return v;
>> +}
>> +
>> +void flif16_varint_free(varint_t *v)
>> +{
>> +    av_free(v->buf);
>> +    av_free(v);
>> +}
>> +
>> +// May be made faster by providing MSByte as a parameter.
>> +unsigned int flif16_varint_inc(varint_t *v)
>> +{
>> +    unsigned int msb;
>
>> +    av_assert0(v->buf_size);
>
>If speed is a concern, use av_assert1().

Okay

>> +    msb = v->buf_size - 1;
>> +    
>> +    for(; msb > 0; --msb) {
>
>> +        if(v->buf[msb] == 127) {
>> +            v->buf[msb] = 0;
>> +        } else {
>> +            ++v->buf[msb];
>> +            return msb;
>> +        }
>
>Did you test this code for varint { 127, 127, 127 }?

I have tested it separately. In the given code, the function will set the value 
to { 128, 0, 0 }, which is an incorrect representation. But for the puropses of 
the parser and the decoder where we are only concerned with bound checking with 
an existing number, such a situation will never occur. However I can see that 
there will be issues when the encoder will be written since then we will have 
to make the varint and write the varint to a file.

>> +    }
>> +    
>> +    ++v->buf[msb];
>> +    return msb;
>> +}
>> +
>> 
>> +void flif16_varint_append(varint_t *v, uint8_t vs)
>> +{
>> +    av_fast_realloc(v->buf, &v->buf_size, v->buf_size + 1);
>
>> +    av_assert0(v->buf);
>
>An assert cannot be used to check for a memory allocation failure.

I think I have to use av_log then.

>> + v->buf[v->buf_size - 1] = vs & 127;
>> +}
> 
> Increasing a buffer one by one is inefficient. Please consider doubling
> the size.

Should I double the size everytime v->buf_size becomes
equal to the actual number of byte-segments of the varint?

>> +typedef struct varint_t {
>> + uint8_t *buf;
> 
> To avoid the indirection and extra malloc, you can make it
> uint8_t buf[1] at the end of the structure.

If I add it to the end of the struct, the compiler will likely add padding
bits to it, but I don't think its a problem since the padding bits will be set
to zero anyway and therefore can be used in the buffer.

>> + unsigned int buf_size; // Assuming that no one will try to make an image of
>> + // dimensions greater than 2^32x2^32
> 
> unsigned is enough. Same everywhere.
> 
> I suppose size_t would be way overkill for this case.

Ok

>> +} varint_t;
> 
> Names ending in _t are reserved by the C standard. FFmpeg uses types
> with capitals and FF prefix for private APIs.

FLIF16Varint should be fine then?

>> +
>> +/**
>> + * Allocates an empty varint.
>> + * Note the 'pad' parameter here. This can be used to pad the number with 
>> + * preceeding bytes if we already know the number of bytes in the number.
>> + * @param pad The number of chars to pad the varint with.
>> + * @param lsbval The value to put in the least significant BYTE
>> + * @return The pointer to the allocated varint.
>> + */
>
>> +varint_t *flif16_varint_alloc(unsigned int pad, uint8_t lsbval);
>
> All global functions are required to have the ff_ prefix (unless they
> are public, then avsomething_).

Okay

>> +/**
>> + * Increments a varint.
>> + * @param v The varint to increment.
>> + * @return The position of the most significant BYTE starting from 0.This
>> + *         can be used to avoid using the comparison function.
>> + */
>> +unsigned int flif16_varint_inc(varint_t *v);
>
> Please document the API constraint that the buffer must need to be large
> enough.

Okay

>> +typedef enum FLIF16ParseStates {
>> +    FLIF16_HEADER = 1,
>> +    FLIF16_METADATA,
>> +    FLIF16_BITSTREAM,
>> +    // FLIF16_TRANSFORM,
>> +    FLIF16_PIXELDATA,
>> +    FLIF16_CHECKSUM,
>> +    FLIF16_VARINT
>> +} flif16_states;
>
> We usually name the enum and the corresponding types the same way.

I had modelled most of the code after the GIF parser, and it names the enum
values in the same manner:
https://ffmpeg.org/doxygen/trunk/gif__parser_8c_source.html
I will probably prefix them with 'FLIF16_STATE_' then.

>> +            } else if(f->index == 4) {
>> +                // Handle the size varint
>> +                vcur = flif16_varint_alloc(0, buf[index]);
>
>> +                av_assert0(vcur);
>
> Same as above, and same below: av_assert0() can only be used for
> conditions that can be proven statically.

Ok

>> +static int flif16_parse(AVCodecParserContext *s, AVCodecContext *avctx,
>> +                     const uint8_t **poutbuf, int *poutbuf_size,
>> +                     const uint8_t *buf, int buf_size)
>
> Alignment is off.

I'll fix it (If you were talking about the textual alignment).

> 
> Regards,
> 
> --
> Nicolas George

Thanks.