[FFmpeg-devel] [RFC] Direct Stream Transfer (DST) decoder

Tue Sep 30 21:00:12 CEST 2014

On Tue, Sep 30, 2014 at 11:42:20PM +1000, Peter Ross wrote:
> +/**
> + * Calculate FS44 ratio
> + */
> +#define DSD_FS44(sample_rate) (sample_rate / 44100)
> +
> +/**
> + * Calculate DST frame size
> + * @return samples per frame (1-bit samples)
> + */
> +#define DST_SAMPLES_PER_FRAME(sample_rate) (588 * DSD_FS44(sample_rate))

A header for just these seems a bit overkill.
Also I'd prefer static inline functions over macros when there is no
reason to use macros.

> +static int read_map(GetBitContext *gb, Table *t, unsigned int map[DST_MAX_CHANNELS], int channels)
> +{
> +    int ch;
> +    t->elements = 1;
> +    if (!get_bits1(gb)){
> +        map[0] = 0;
> +        for (ch = 1; ch < channels; ch++) {
> +            int bits = av_log2(t->elements) + 1;
> +            map[ch] = get_bits(gb, bits);
> +            if (map[ch] == t->elements) {
> +                t->elements++;
> +                if (t->elements >= DST_MAX_ELEMENTS)
> +                    return AVERROR_INVALIDDATA;
> +            } else if (map[ch] > t->elements) {
> +                return AVERROR_INVALIDDATA;
> +            }
> +        }
> +    } else {
> +        memset(map, 0, sizeof(*map) * DST_MAX_CHANNELS);
> +    }

Maybe I just don't understand how this should be used (in that case a
comment would be nice), but I find it suspicious that the if part
initializes channels elements while the else part does initialize
DST_MAX_CHANNELS elements.
Personally I also prefer to have one-liners always in the "if" branch.

> +static av_always_inline int get_sr_golomb_dst(GetBitContext *gb, unsigned int k)
> +{
> +#if 0
> +    /* 'run_length' upper bound is not specified; we can never be sure it will fit into get_bits cache */
> +    int v = get_ur_golomb(gb, k, INT_MAX, 0);

The code is too hard to read for the time of day (a comment on those
different golomb variants sure would help), but sure that
get_ur_golomb_shorten doesn't happen to do what you need?

> +                if (x >= 0)
> +                    c -= (x + 4) / 8;
> +                else
> +                    c += (-x + 3) / 8;

Uh, isn't this actually
c -= (x + 4) >> 3;
?

> +static uint8_t prob_dst_x_bit(int c)
> +{
> +    return (ff_reverse[c & 127] >> 1) + 1;
> +}

Since this seems to be in the inner loop, making
your own table might be worth it...

> +static void build_filter(int16_t table[DST_MAX_ELEMENTS][16][256], const Table *fsets)
> +{
> +    int i, j, k, l;
> +    for (i = 0; i < fsets->elements; i++) {
> +        int length = fsets->length[i];
> +        for (j = 0; j < 16; j++) {
> +            int total = FFMAX(0, FFMIN(length - j * 8, 8));

av_clip?

> +            for (k = 0; k < 256; k++) {
> +                int v = 0;
> +                for (l = 0; l < total; l++)
> +                    v += (((k >> l) & 1) * 2 - 1) * fsets->coeff[i][j * 8 + l];

Is this faster in a relevant way than something more readable like
coeff = fsets->coeff[i][j * 8 + l];
v += k & (1 << l) ? coeff : -coeff;
?

> +    unsigned int half_prob[DST_MAX_CHANNELS];
> +    unsigned int map_ch_to_felem[DST_MAX_CHANNELS];
> +    unsigned int map_ch_to_pelem[DST_MAX_CHANNELS];
> +    DECLARE_ALIGNED(16, uint8_t, status)[DST_MAX_CHANNELS][16];
> +    DECLARE_ALIGNED(16, int16_t, filter)[DST_MAX_ELEMENTS][16][256];

At least the last one is quite large I think.
Please consider putting it in the context or so instead.
That's also less problematic with alignment.

> +    if (!(avpkt->data[0] & 1)) {
> +        if (frame->nb_samples > avpkt->size - 1)
> +            av_log(avctx, AV_LOG_WARNING, "short frame");
> +        memcpy(frame->data[0], avpkt->data + 1, FFMIN(frame->nb_samples * avctx->channels, avpkt->size - 1));

Hm, shouldn't there be a check for avpkt->size > 0 somewhere first?

> +    if ((ret = init_get_bits8(&gb, avpkt->data, avpkt->size)) < 0)
> +    if ((ret = read_map(&gb, &fsets, map_ch_to_felem, avctx->channels)) < 0)
> +        if ((ret = read_map(&gb, &probs, map_ch_to_pelem, avctx->channels)) < 0)

I still think putting assignments into an if is really bad idea all
around.

> +    if (same) {
> +        probs.elements = fsets.elements;
> +        memcpy(map_ch_to_pelem, map_ch_to_felem, sizeof(map_ch_to_felem));
> +    } else {
> +        avpriv_request_sample(avctx, "Same_Mapping=0");
> +            return ret;
> +    }

Simpler as
> if (!same) {
>         avpriv_request_sample(avctx, "Same_Mapping=0");
>             return ret;
> }
and no else.
Also, paranoia: for memcpy preferably use sizeof(dst) over sizeof(src),
the effects are less bad if it goes wrong.

> +                uint64_t * s = (uint64_t*)status[ch];

That looks like a strict aliasing violation.

> +                v = ((predict >> 15) ^ residual) & 1;
> +                frame->data[0][ (i >> 3) * avctx->channels + ch] |= v << (7 - (i & 0x7 ));
> +
> +#if HAVE_BIGENDIAN
> +                /* FIXME: not tested */
> +                s[0] = (s[0] << 1) | ((s[1] >> 63) & 1);
> +                s[1] = (s[1] << 1) | v;
> +#else
> +                s[1] = (s[1] << 1) | ((s[0] >> 63) & 1);
> +                s[0] = (s[0] << 1) | v;
> +#endif

Haven't fully understood this, but wouldn't it be more efficient
especially on 32 bit systems to first collect e.g. 8 bits and only
then move them into s?
Especially if unrolling would be viable, in which case e.g.
7 - (i & 0x7 ) would become a constant as well.