[FFmpeg-devel] [PATCH] encoder for adobe's flash ScreenVideo2 codec
Vitor Sessak
Thu Jul 23 00:05:31 CEST 2009
Joshua Warner wrote:
> Hi,
>
> I fixed the issues you guys have commented on (tell me if I
> accidentally missed one), and the revised patch is attached.
I'll give a second batch of comments...
> +/**
> + * @file libavcodec/flashsv2enc.c
> + * Flash Screen Video Version 2 encoder
> + * @author Joshua Warner
> + */
> +
> +/* Differences from version 1 stream:
> + * NOTE: Currently, the only player that supports version 2 streams is Adobe Flash Player itself.
> + * * Supports sending only a range of scanlines in a block,
> + * indicating a difference from the corresponding block in the last keyframe.
> + * * Supports initializing the zlib dictionary with data from the corresponding
> + * block in the last keyframe, to improve compression.
> + * * Supports a hybrid 15-bit rgb / 7-bit palette color space.
> + */
> +
> +/* TODO:
> + * Don't keep Block structures for both current frame and keyframe.
> + * Make better heuristics for deciding stream parameters (optimum_* functions). Currently these return constants.
> + * Figure out how to encode palette information in the stream, choose an optimum palette at each keyframe.
> + * Figure out how the zlibPrimeCompressCurrent flag works, implement support.
> + * Find other sample files (that weren't generated here), develop a decoder.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
Are both includes needed?
> +#include "avcodec.h"
> +#include "put_bits.h"
> +#include "bytestream.h"
Is bytestream.h used?
> +static av_cold void cleanup(FlashSV2Context * s)
> +{
> + if (s->encbuffer)
> + av_free(s->encbuffer);
No need to check if s->encbuffer is null, av_free() already does that.
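I.e. those two lines can collapse to just:

    av_free(s->encbuffer);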
> +static av_cold int flashsv2_encode_init(AVCodecContext * avctx)
> +{
> + FlashSV2Context *s = avctx->priv_data;
> +
> + s->avctx = avctx;
> +
> + s->comp = avctx->compression_level;
> + if (s->comp == -1)
> + s->comp = 9;
> + if (s->comp < 0 || s->comp > 9) {
> + av_log(avctx, AV_LOG_ERROR,
> + "Compression level should be 0-9, not %d\n", s->comp);
> + return -1;
> + }
> +
> +
> + if ((avctx->width > 4095) || (avctx->height > 4095)) {
> + av_log(avctx, AV_LOG_ERROR,
> + "Input dimensions too large, input must be max 4096x4096 !\n");
> + return -1;
> + }
> +
> + if (avcodec_check_dimensions(avctx, avctx->width, avctx->height) < 0)
> + return -1;
> +
> +
> + s->last_key_frame = 0;
This is unneeded, the context is already alloc'ed with av_mallocz().
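For reference, avcodec_open() allocates the private context roughly like this
(paraphrasing libavcodec/utils.c):

    avctx->priv_data = av_mallocz(codec->priv_data_size);

so the whole context starts out zero-filled and assignments like this one can
simply be dropped.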
> +static inline unsigned int chroma_diff(unsigned int c1, unsigned int c2)
> +{
> + unsigned int t1 = (c1 & 0x000000ff) + ((c1 & 0x0000ff00) >> 8) + ((c1 & 0x00ff0000) >> 16);
> + unsigned int t2 = (c2 & 0x000000ff) + ((c2 & 0x0000ff00) >> 8) + ((c2 & 0x00ff0000) >> 16);
> +
> + return abs(t1 - t2) + abs((c1 & 0x000000ff) - (c2 & 0x000000ff)) +
> + abs(((c1 & 0x0000ff00) >> 8) - ((c2 & 0x0000ff00) >> 8)) +
> + abs(((c1 & 0x00ff0000) >> 16) - ((c2 & 0x00ff0000) >> 16));
> +}
Would using the square instead of abs() be faster and/or look better?
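For example, a squared-error variant could look like this (an untested sketch;
it drops the summed-component term of the current version just to keep the
example short, and whether it is actually faster or looks better would have to
be measured):

    static inline unsigned int chroma_diff(unsigned int c1, unsigned int c2)
    {
        int db = ( c1        & 0xff) - ( c2        & 0xff);
        int dg = ((c1 >>  8) & 0xff) - ((c2 >>  8) & 0xff);
        int dr = ((c1 >> 16) & 0xff) - ((c2 >> 16) & 0xff);

        /* squaring penalizes large per-channel errors more strongly than abs() */
        return db * db + dg * dg + dr * dr;
    }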
> +static int optimum_use15_7(FlashSV2Context * s)
> +{
> +#ifndef FLASHSV2_DUMB
> + double ideal = ((double)(s->avctx->bit_rate * s->avctx->time_base.den * s->avctx->ticks_per_frame)) /
> + ((double) s->avctx->time_base.num) * s->avctx->frame_number;
> + if (ideal + use15_7_threshold < s->total_bits) {
> + return 1;
> + } else {
> + return 0;
> + }
> +#else
> + return s->avctx->global_quality == 0;
> +#endif
> +}
I think, if you were trying to encode optimally (and if it's worth the price
of being 2x slower), I'd suggest, for each (key?)frame:
1- Encode with 15_7 and see how many bits are consumed (after zlib) and
how much distortion (measured, for example, with chroma_diff()) you get.
2- Encode with bgr and again measure both the number of bits consumed after
zlib and the distortion.
Then choose the one with the smallest cost (distortion + lambda*rate); a
rough sketch follows below. The reasoning behind this is explained in
doc/rate_distortion.txt. The parameter lambda is found in frame->quality
and is passed from the command line via "-qscale" ("-qscale 2.3" =>
frame->quality == (int)(2.3*FF_LAMBDA_SCALE)). It is also a good starting
point for implementing rate control in the future (VBR with a given
average bitrate gives better quality than CBR).
Note that what is explained in rate_distortion.txt is already what you
are doing with the s->dist parameter (s->dist == 8*lambda), so this
"solves" the problem of finding the optimum dist.
If the speed loss of trying both methods is not worth it, I think that
s->use15_7 should instead be set based on frame->quality (by testing a few
samples to find from which quality value on bgr starts being optimal on
average).
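I.e. something as simple as (USE15_7_QUALITY_THRESHOLD being a hypothetical
constant found by that testing):

    s->use15_7 = p->quality > USE15_7_QUALITY_THRESHOLD;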
Unfortunately, the rate-distortion method does not solve the problem of
finding the optimal block size. How much do quality/bitrate depend on it?
-Vitor