[FFmpeg-devel] [PATCH] encoder for adobe's flash ScreenVideo2 codec
Vitor Sessak
Thu Jul 23 00:05:31 CEST 2009
Joshua Warner wrote:
> Hi,
>
> I fixed the issues you guys have commented on (tell me if I
> accidentally missed one), and the revised patch is attached.
I'll give a second batch of comments...
> +/**
> + * @file libavcodec/flashsv2enc.c
> + * Flash Screen Video Version 2 encoder
> + * @author Joshua Warner
> + */
> +
> +/* Differences from version 1 stream:
> + * NOTE: Currently, the only player that supports version 2 streams is Adobe Flash Player itself.
> + * * Supports sending only a range of scanlines in a block,
> + * indicating a difference from the corresponding block in the last keyframe.
> + * * Supports initializing the zlib dictionary with data from the corresponding
> + * block in the last keyframe, to improve compression.
> + * * Supports a hybrid 15-bit rgb / 7-bit palette color space.
> + */
> +
> +/* TODO:
> + * Don't keep Block structures for both current frame and keyframe.
> + * Make better heuristics for deciding stream parameters (optimum_* functions). Currently these return constants.
> + * Figure out how to encode palette information in the stream, choose an optimum palette at each keyframe.
> + * Figure out how the zlibPrimeCompressCurrent flag works, implement support.
> + * Find other sample files (that weren't generated here), develop a decoder.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
Are both includes needed?
> +#include "avcodec.h"
> +#include "put_bits.h"
> +#include "bytestream.h"
Is bytestream.h used?
> +static av_cold void cleanup(FlashSV2Context * s)
> +{
> + if (s->encbuffer)
> + av_free(s->encbuffer);
No need to check if s->encbuffer is null, av_free() already does that.
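I.e. those two lines can collapse to just:

    av_free(s->encbuffer);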
> +static av_cold int flashsv2_encode_init(AVCodecContext * avctx)
> +{
> + FlashSV2Context *s = avctx->priv_data;
> +
> + s->avctx = avctx;
> +
> + s->comp = avctx->compression_level;
> + if (s->comp == -1)
> + s->comp = 9;
> + if (s->comp < 0 || s->comp > 9) {
> + av_log(avctx, AV_LOG_ERROR,
> + "Compression level should be 0-9, not %d\n", s->comp);
> + return -1;
> + }
> +
> +
> + if ((avctx->width > 4095) || (avctx->height > 4095)) {
> + av_log(avctx, AV_LOG_ERROR,
> + "Input dimensions too large, input must be max 4096x4096 !\n");
> + return -1;
> + }
> +
> + if (avcodec_check_dimensions(avctx, avctx->width, avctx->height) < 0)
> + return -1;
> +
> +
> + s->last_key_frame = 0;
This is unneeded, the context is already alloc'ed with av_mallocz().
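For reference, avcodec_open() allocates the private context roughly like this
(paraphrasing libavcodec/utils.c):

    avctx->priv_data = av_mallocz(codec->priv_data_size);

so the whole context starts out zero-filled and assignments like this one can
simply be dropped.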
> +static inline unsigned int chroma_diff(unsigned int c1, unsigned int c2)
> +{
> + unsigned int t1 = (c1 & 0x000000ff) + ((c1 & 0x0000ff00) >> 8) + ((c1 & 0x00ff0000) >> 16);
> + unsigned int t2 = (c2 & 0x000000ff) + ((c2 & 0x0000ff00) >> 8) + ((c2 & 0x00ff0000) >> 16);
> +
> + return abs(t1 - t2) + abs((c1 & 0x000000ff) - (c2 & 0x000000ff)) +
> + abs(((c1 & 0x0000ff00) >> 8) - ((c2 & 0x0000ff00) >> 8)) +
> + abs(((c1 & 0x00ff0000) >> 16) - ((c2 & 0x00ff0000) >> 16));
> +}
Would using the square instead of abs() be faster and/or look better?
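For example, a squared-error variant could look like this (an untested sketch;
it drops the summed-component term of the current version just to keep the
example short, and whether it is actually faster or looks better would have to
be measured):

    static inline unsigned int chroma_diff(unsigned int c1, unsigned int c2)
    {
        int db = ( c1        & 0xff) - ( c2        & 0xff);
        int dg = ((c1 >>  8) & 0xff) - ((c2 >>  8) & 0xff);
        int dr = ((c1 >> 16) & 0xff) - ((c2 >> 16) & 0xff);

        /* squaring penalizes large per-channel errors more strongly than abs() */
        return db * db + dg * dg + dr * dr;
    }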
> +static int optimum_use15_7(FlashSV2Context * s)
> +{
> +#ifndef FLASHSV2_DUMB
> + double ideal = ((double)(s->avctx->bit_rate * s->avctx->time_base.den * s->avctx->ticks_per_frame)) /
> + ((double) s->avctx->time_base.num) * s->avctx->frame_number;
> + if (ideal + use15_7_threshold < s->total_bits) {
> + return 1;
> + } else {
> + return 0;
> + }
> +#else
> + return s->avctx->global_quality == 0;
> +#endif
> +}
I think, if you were trying to encode optimally (and if it's worth the price
of being 2x slower), I'd suggest, for each (key?)frame:
1- Encode with 15_7 and see how many bits are consumed (after zlib) and
how much distortion (measured, for example, with chroma_diff()) you get.
2- Encode with bgr and again measure both the number of bits consumed after
zlib and the distortion.
Then choose the one with the smallest cost (distortion + lambda*rate); a
rough sketch follows below. The reasoning behind this is explained in
doc/rate_distortion.txt. The parameter lambda is found in frame->quality
and is passed from the command line via "-qscale" ("-qscale 2.3" =>
frame->quality == (int)(2.3*FF_LAMBDA_SCALE)). It is also a good starting
point for implementing rate control in the future (VBR with a given
average bitrate gives better quality than CBR).
Note that what is explained in rate_distortion.txt is already what you
are doing with the s->dist parameter (s->dist == 8*lambda), so this
"solves" the problem of finding the optimum dist.
If the speed loss of trying both methods is not worth it, I think that
s->use15_7 should instead be set based on frame->quality (by testing a few
samples to find from which quality value on bgr starts being optimal on
average).
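I.e. something as simple as (USE15_7_QUALITY_THRESHOLD being a hypothetical
constant found by that testing):

    s->use15_7 = p->quality > USE15_7_QUALITY_THRESHOLD;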
Unfortunately, the rate-distortion method does not solve the problem of
finding the optimal block size. How much do quality/bitrate depend on it?
-Vitor