[FFmpeg-devel] [PATCH] Add support for Audible AA files

Mon Jul 27 15:14:45 CEST 2015

Le nonidi 9 thermidor, an CCXXIII, Vesselin Bontchev a écrit :
> Hi!
> 
> This patch adds support for Audible AA files.
> 
> Audible samples can be obtained from,
> 
> https://gitlab.com/vesselin.bontchev/audible-samples/tree/master
> https://samples.ffmpeg.org/audible/
> 

> Currently, this code generates corrupt audio output (in some places, and deterministically).

Does this happen with the Go and Python implementations too?

> 
> By posting this patch, I am hoping to get some early feedback, and help in making this work.
> 
> Thanks,
> Vesselin

> From ba73543efc3fdc4b5c61e9cb56f998d748716e00 Mon Sep 17 00:00:00 2001
> From: Vesselin Bontchev <vesselin.bontchev at yandex.com>
> Date: Sun, 19 Jul 2015 23:16:36 +0200
> Subject: [PATCH] Add support for Audible AA files
> 
> https://en.wikipedia.org/wiki/Audible.com#Quality
> ---
>  doc/demuxers.texi        |  10 ++
>  doc/general.texi         |   2 +
>  libavformat/Makefile     |   1 +
>  libavformat/aadec.c      | 381 +++++++++++++++++++++++++++++++++++++++++++++++
>  libavformat/allformats.c |   1 +
>  5 files changed, 395 insertions(+)
>  create mode 100644 libavformat/aadec.c
> 
> diff --git a/doc/demuxers.texi b/doc/demuxers.texi
> index e45e1af..df95233 100644
> --- a/doc/demuxers.texi
> +++ b/doc/demuxers.texi
> @@ -18,6 +18,16 @@ enabled demuxers.
>  
>  The description of some of the currently available demuxers follows.
>  
> + at section aa
> +
> +Audible Format 2, 3, and 4 demuxer.
> +
> +This demuxer is used to demux Audible Format 2, 3, and 4 (.aa) files.
> +
> + at example
> +ffmpeg -v debug -i input.aa -c:a copy output.wav
> + at end example
> +
>  @section applehttp
>  
>  Apple HTTP Live Streaming demuxer.
> diff --git a/doc/general.texi b/doc/general.texi
> index a260e79..2b782e0 100644
> --- a/doc/general.texi
> +++ b/doc/general.texi
> @@ -228,6 +228,8 @@ library:
>  @item 8088flex TMV              @tab   @tab X
>  @item AAX                       @tab   @tab X
>      @tab Audible Enhanced Audio format, used in audiobooks.
> + at item AA                        @tab   @tab X
> +    @tab Audible Format 2, 3, and 4, used in audiobooks.
>  @item ACT Voice                 @tab   @tab X
>      @tab contains G.729 audio
>  @item Adobe Filmstrip           @tab X @tab X
> diff --git a/libavformat/Makefile b/libavformat/Makefile
> index cc73fd8..8405f8d 100644
> --- a/libavformat/Makefile
> +++ b/libavformat/Makefile
> @@ -395,6 +395,7 @@ OBJS-$(CONFIG_RTSP_DEMUXER)              += rtsp.o rtspdec.o httpauth.o \
>  OBJS-$(CONFIG_RTSP_MUXER)                += rtsp.o rtspenc.o httpauth.o \
>                                              urldecode.o
>  OBJS-$(CONFIG_SAMI_DEMUXER)              += samidec.o subtitles.o
> +OBJS-$(CONFIG_AA_DEMUXER)                += aadec.o
>  OBJS-$(CONFIG_SAP_DEMUXER)               += sapdec.o
>  OBJS-$(CONFIG_SAP_MUXER)                 += sapenc.o
>  OBJS-$(CONFIG_SBG_DEMUXER)               += sbgdec.o
> diff --git a/libavformat/aadec.c b/libavformat/aadec.c
> new file mode 100644
> index 0000000..bd994e8
> --- /dev/null
> +++ b/libavformat/aadec.c
> @@ -0,0 +1,381 @@
> +/*
> + * Audible AA demuxer
> + * Copyright (c) 2015 Vesselin Bontchev
> + *

> + * Inspired by https://github.com/jteeuwen/audible, and
> + * https://code.google.com/p/pyaudibletags/source projects.

Inspired, or translated from Go to C? The readString() wrapper seems to
indicate a step-by-step translation, and that warrants copyright. The
licence is GPL-compatible, though, but attribution must be preserved.

> + *
> + * See https://en.wikipedia.org/wiki/Audible.com#Quality for
> + * format details.
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "avformat.h"
> +#include "internal.h"
> +#include "libavutil/intreadwrite.h"
> +#include "libavutil/tea.h"
> +#include "libavutil/fifo.h"
> +#include "libavutil/opt.h"
> +
> +#define AA_MAGIC 1469084982 /* this identifies an audible .aa file */
> +#define MAX_CODEC_SECOND_SIZE 3982
> +#define MAX_TOC_ENTRIES 64
> +#define TEA_BLOCK_SIZE 8
> +
> +typedef struct AADemuxContext {

> +    int64_t data_end;

Looks unused.

> +    uint32_t HeaderSeed;
> +    uint32_t Filesize;
> +    uint32_t Magic;
> +    uint32_t tocSize;
> +    uint32_t npairs;
> +    union {
> +        unsigned char key[16];
> +        uint32_t part[4];
> +    } HeaderKey;
> +    uint32_t TOC[MAX_TOC_ENTRIES][2];

These fields are used only in the read_header() function, they could
(should?) be local variables.

And some of them are stored but never accessed.

And TOC could be a structure.

> +
> +    void *aa_fixed_key;
> +    int aa_fixed_key_size;

> +    uint32_t start;
> +    uint32_t end;

Same as above: local.

> +    int32_t codec_second_size;

> +    char codec_name[64];

Local.

> +    struct AVTEA *tea_ctx;
> +    uint8_t final_key[16];
> +    int64_t current_chapter_size;
> +    int64_t current_codec_second_size;

> +    uint32_t trailing_bytes;

Looks unused.

> +    int64_t total_parsed;

Looks unused.

> +    struct AVFifoBuffer *cbuf;
> +    int chapter_idx;
> +} AADemuxContext;
> +

> +static int32_t GetSecondSizeByCodecID(char *codec_name)

Any reason to use UglyCamelCase for this?

> +{
> +    int32_t result = 0;
> +
> +    if (!strcmp(codec_name, "mp332")) {

> +        // 0xC00D (marker in file)

I do not understand the comment.

> +        result = 3982;
> +    } else if (!strcmp(codec_name, "acelp85")) {
> +        // 0xC00C
> +        result = 2000;
> +    } else if (!strcmp(codec_name, "acelp85")) {
> +        // 0xC010
> +        result = 1045;
> +    }
> +
> +    return result;
> +}
> +

> +static void readString(AVIOContext *pb, int size, char *output)
> +{
> +    avio_read(pb, output, size);
> +}

Error code ignored. And the usefulness of this wrapper is hard to see.

> +
> +static int aa_read_header(AVFormatContext *s)
> +{
> +    AVStream *st;
> +    AADemuxContext *c;
> +    AVIOContext *pb;
> +
> +    int i, j, idx;
> +    uint32_t nkey;
> +    uint32_t nval;
> +    char key[512] = {0};
> +    char val[512] = {0};
> +    unsigned char buffer[512] = {0};
> +    uint32_t v0, v1;
> +    unsigned char dst[8];
> +    unsigned char src[8];

> +    unsigned char *output = buffer;

Looks useless.

> +    int largest_idx = -1;
> +    int64_t largest_size = -1;
> +    int64_t current_size = -1;
> +

> +    c = s->priv_data;
> +    pb = s->pb;

You could init that directly while declaring the variables, that is how it
is done.

> +    c->tea_ctx = av_tea_alloc();
> +    if (!c->tea_ctx)
> +        return AVERROR(ENOMEM);
> +    c->cbuf = av_fifo_alloc(MAX_CODEC_SECOND_SIZE * 2);
> +    if (!c->cbuf)
> +        return AVERROR(ENOMEM);

Leaks.

> +
> +    c->Filesize = avio_rb32(pb); // file size
> +    c->Magic = avio_rb32(pb); // magic string
> +    c->tocSize = avio_rb32(pb); // TOC size
> +    avio_rb32(pb); // unidentified integer

> +    for (i = 0; i < c->tocSize; i++) { // read TOC
> +        avio_rb32(pb); // TOC entry index
> +        c->TOC[i][0] = avio_rb32(pb); // block Offset
> +        c->TOC[i][1] = avio_rb32(pb); // block size

Unsafe, tocSize needs to be validated.

> +    }
> +    avio_read(pb, buffer, 24); // header termination block (ignored)
> +    c->npairs = avio_rb32(pb); // read dictionary entries
> +    for (i = 0; i < c->npairs; i++) {
> +        memset(val, 0, sizeof(val));
> +        memset(key, 0, sizeof(key));
> +        avio_r8(pb); // unidentified byte
> +        nkey = avio_rb32(pb); // key string length
> +        nval = avio_rb32(pb); // value string length

> +        if (nkey > 512) {

sizeof(key)

> +            avio_seek(pb, nkey, SEEK_CUR);

avio_skip()

> +        } else {
> +            readString(pb, nkey, key); // key string
> +        }

> +        if (nval > 512) {
> +            avio_seek(pb, nval, SEEK_CUR);

Ditto.

> +        } else {
> +            readString(pb, nval, val); // value string.
> +        }

> +        if (!strncmp(key, "codec", 5)) {

This will match "codec_params" too, you need to take the length into
account. I suggest to add a 0-terminator to key and use a simple strcmp().

> +            strncpy(c->codec_name, val, sizeof(c->codec_name) - 1);
> +        }
> +        if (!strncmp(key, "HeaderSeed", 10)) {
> +            c->HeaderSeed = atoi(val);
> +        }
> +        if (!strncmp(key, "HeaderKey", 9)) {
> +            j = 0;
> +            for (idx = 0; idx < 4; idx++) {

> +                c->HeaderKey.part[idx] = (uint32_t)atoi(val +  j);
> +                c->HeaderKey.part[idx] = AV_RB32(c->HeaderKey.part + idx); // convert to BE!

Urgh!

	uint32_t key_part = atoi(...);
	AV_WB32(c->header_key + idx + 4, key_part);

> +                if (idx == 3)
> +                    break;
> +                while (val[j] != ' ') // find the next space
> +                    j++;
> +                j = j + 1; // skip over the space

You could do much simpler using sscanf():

    sscanf(val, "%d%d%d%d",
	   &key_part[0], &key_part[1], &key_part[2], &key_part[3]);

> +            }
> +        }
> +    }
> +
> +    /* decryption key derivation */
> +    av_tea_init(c->tea_ctx, c->aa_fixed_key, 16);

> +    c->codec_second_size = GetSecondSizeByCodecID(c->codec_name);
> +    if (c->codec_second_size == -1) {

The function never returns -1.

> +        return AVERROR_DECODER_NOT_FOUND;
> +    }
> +    memcpy(output + 0, "\x00\x00",    2); // purely for padding purposes
> +    memcpy(output + 2, &c->HeaderKey, 16);
> +    idx = 0;
> +    for (i = 0; i < 3 * TEA_BLOCK_SIZE; i += TEA_BLOCK_SIZE) {
> +        v0 = c->HeaderSeed;
> +        v1 = c->HeaderSeed + 1;
> +        AV_WB32(src, v0);
> +        AV_WB32(src + 4, v1);
> +        c->HeaderSeed = v1 + 1;

> +        av_tea_crypt(c->tea_ctx, dst, src, TEA_BLOCK_SIZE, NULL, 0); // TEA ECB encrypt
> +        for (j = 0; j < 8 && idx < 18; j+=1, idx+=1) {
> +            output[idx] = output[idx] ^ dst[j];
> +        }

Not entirely sure: is it not a re-implementation of CBC mode?

> +    }
> +    memcpy(c->final_key, output + 2, 16); // skip first 2 bytes of output
> +
> +    /* decoder setup */
> +    st = avformat_new_stream(s, NULL);
> +    if (!st)
> +        return AVERROR(ENOMEM);
> +    st->codec->codec_type = AVMEDIA_TYPE_AUDIO;
> +    if (!strcmp(c->codec_name, "mp332")) {
> +        st->codec->codec_id = AV_CODEC_ID_MP3;
> +    } else if (!strcmp(c->codec_name, "acelp85")) {
> +        st->codec->codec_id = AV_CODEC_ID_SIPR;
> +        st->codec->block_align = 19;
> +        st->codec->channels = 1;
> +        st->codec->sample_rate = 8500;
> +    } else if (!strcmp(c->codec_name, "acelp16")) {
> +        st->codec->codec_id = AV_CODEC_ID_SIPR;
> +        st->codec->block_align = 20;
> +        st->codec->channels = 1;
> +        st->codec->sample_rate = 16000;
> +    } else {
> +        return AVERROR_DECODER_NOT_FOUND;
> +    }
> +
> +    /* determine, and jump to audio start offset */
> +    for (i = 1; i < c->tocSize; i++) { // skip the first entry!
> +        current_size = c->TOC[i][1];
> +        if (current_size > largest_size) {
> +            largest_idx = i;
> +            largest_size = current_size;
> +        }
> +    }
> +    c->start = c->TOC[largest_idx][0];
> +    c->end = c->TOC[largest_idx][1];
> +    avio_seek(pb, c->start, SEEK_SET);
> +    c->current_chapter_size = 0;
> +
> +    return 0;
> +}
> +
> +static int aa_read_packet(AVFormatContext *s, AVPacket *pkt)
> +{
> +    AADemuxContext *c;
> +    unsigned char dst[8];
> +    unsigned char src[8];
> +    int i;
> +    int trailing_bytes;
> +    int blocks;
> +    uint8_t buf[MAX_CODEC_SECOND_SIZE * 2];
> +    int written = 0;
> +    int queued_size;
> +    int ret = 0;
> +    int shortfall;
> +    int extra;
> +
> +    c = s->priv_data;
> +
> +    // do we already have some audio ouput from the last invocation?
> +    queued_size = av_fifo_size(c->cbuf);
> +    if (queued_size) {
> +        // av_log(s, AV_LOG_DEBUG, "[aa] draining queued data (%d bytes)\n", queued_size);
> +        av_fifo_generic_read(c->cbuf, buf, queued_size, NULL);
> +        written = written + queued_size;
> +    }
> +
> +    // are we at the start of a chapter?
> +    if (c->current_chapter_size == 0) {
> +        c->current_chapter_size = avio_rb32(s->pb);
> +        if (c->current_chapter_size == 0) {
> +            ret = AVERROR_EOF;
> +            goto end;
> +        }
> +        av_log(s, AV_LOG_DEBUG, "[aa] chapter %d (%ld bytes)\n", c->chapter_idx, c->current_chapter_size);
> +        c->chapter_idx = c->chapter_idx + 1;
> +        avio_rb32(s->pb); // data start offset

> +        c-> current_codec_second_size = c->codec_second_size;

Nit: strange space after ->.

> +    }
> +
> +    // is this the last block in this chapter?
> +    if (c->current_chapter_size / c->current_codec_second_size == 0) {
> +        c->current_codec_second_size = c->current_chapter_size % c->current_codec_second_size;
> +    }
> +
> +    // decrypt c->current_codec_second_size bytes
> +    blocks = c->current_codec_second_size / TEA_BLOCK_SIZE;
> +    for (i = 0; i < blocks; i++) {
> +        avio_read(s->pb, src, TEA_BLOCK_SIZE);
> +        av_tea_init(c->tea_ctx, c->final_key, 16);
> +        av_tea_crypt(c->tea_ctx, dst, src, 1, NULL, 1);
> +        memcpy(buf + written, dst, TEA_BLOCK_SIZE);
> +        written = written + TEA_BLOCK_SIZE;
> +    }
> +    trailing_bytes = c->current_codec_second_size % TEA_BLOCK_SIZE;
> +    if (trailing_bytes != 0) { // trailing bytes are left unencrypted!

> +        avio_read(s->pb, src, trailing_bytes);
> +        memcpy(buf + written, dst, trailing_bytes);

Looks strange: you read to src but copy dst. May this be the cause of the
glitches?

> +        written = written + trailing_bytes;
> +    }
> +
> +    // update state
> +    c->current_chapter_size = c->current_chapter_size - c->current_codec_second_size;
> +    if (c->current_chapter_size <= 0)
> +        c->current_chapter_size = 0;
> +
> +    if (written < c->codec_second_size) { // don't have enough output for the decoder, pull in data from the next chapter
> +        shortfall = c->codec_second_size - written;
> +        c->current_chapter_size = avio_rb32(s->pb);
> +        if (c->current_chapter_size == 0) {
> +            ret = AVERROR_EOF;
> +            goto end;
> +        }
> +        av_log(s, AV_LOG_DEBUG, "[aa] chapter %d (%ld bytes)\n", c->chapter_idx, c->current_chapter_size);
> +        c->chapter_idx = c->chapter_idx + 1;

> +        avio_rb32(s->pb); // data start offset (ignored)
> +        c->current_codec_second_size = c->codec_second_size;
> +        blocks = c->current_codec_second_size / TEA_BLOCK_SIZE;
> +        for (i = 0; i < blocks; i++) {
> +            avio_read(s->pb, src, 8);
> +            av_tea_init(c->tea_ctx, c->final_key, 16);
> +            av_tea_crypt(c->tea_ctx, dst, src, 1, NULL, 1);
> +            av_fifo_generic_write(c->cbuf, dst, TEA_BLOCK_SIZE, NULL);
> +        }
> +        trailing_bytes = c->current_codec_second_size % TEA_BLOCK_SIZE;
> +        if (trailing_bytes != 0) {
> +            avio_read(s->pb, src, trailing_bytes);
> +            av_fifo_generic_write(c->cbuf, src, trailing_bytes, NULL);
> +        }

This code looks duplicated.

> +        c->current_chapter_size = c->current_chapter_size - c->current_codec_second_size;
> +        if (c->current_chapter_size <= 0)
> +            c->current_chapter_size = 0;
> +
> +        // borrow "shortfall" bytes
> +        av_fifo_generic_read(c->cbuf, buf + written, shortfall, NULL);
> +        written = written + shortfall;
> +    }
> +
> +    // give only codec_second_size bytes to the decoder
> +    if (written > c->codec_second_size) { // c->cbuf drain triggers this
> +        extra = written - c->codec_second_size;
> +        av_fifo_generic_write(c->cbuf, buf + c->codec_second_size, extra, NULL);
> +        written = c->codec_second_size;
> +    }
> +
> +    av_init_packet(pkt);
> +    pkt->data = NULL;
> +    pkt->size = 0;
> +    pkt->pos = avio_tell(s->pb);
> +
> +    // av_log(s, AV_LOG_DEBUG, "[aa] giving %d bytes to decoder\n", written);
> +    av_grow_packet(pkt, written);

IIRC, av_new_packet() can do that in a single step.

And do not forget to check memory allocations.

> +    memcpy(pkt->data, buf, written);
> +
> +end:
> +    return ret;
> +}
> +
> +static int aa_probe(AVProbeData *p)
> +{
> +    unsigned char *buf = p->buf;
> +

> +    // first 4 bytes are file size, next 4 bytes are the magic
> +    if (AV_RB32(buf+4) != AA_MAGIC)
> +        return AVPROBE_SCORE_EXTENSION;
> +
> +    return AVPROBE_SCORE_MAX;

Looks very wrong: 32-bits magic number -> MAX, anything else -> EXTENSION.
Should be: 32->bits magic number -> MAX/4, anything -> 0.

> +}
> +
> +#define OFFSET(x) offsetof(AADemuxContext, x)
> +#define FLAGS AV_OPT_FLAG_DECODING_PARAM
> +static const AVOption aa_options[] = {
> +    { "aa_fixed_key", // extracted from libAAX_SDK.so and AAXSDKWin.dll files!
> +        "Fixed key used for handling Audible AA files", OFFSET(aa_fixed_key),
> +        AV_OPT_TYPE_BINARY, {.str="77214d4b196a87cd520045fd2a51d673"},
> +        .flags = AV_OPT_FLAG_DECODING_PARAM },
> +    { NULL },
> +};
> +
> +static const AVClass aa_class = {
> +    .class_name = "aa",
> +    .item_name  = av_default_item_name,
> +    .option     = aa_options,
> +    .version    = LIBAVUTIL_VERSION_INT,
> +};
> +
> +AVInputFormat ff_aa_demuxer = {
> +    .name           = "aa",
> +    .long_name      = NULL_IF_CONFIG_SMALL("Audible AA format files"),
> +    .priv_class     = &aa_class,
> +    .priv_data_size = sizeof(AADemuxContext),
> +    .extensions     = "aa",
> +    .read_probe     = aa_probe,
> +    .read_header    = aa_read_header,
> +    .read_packet    = aa_read_packet,
> +    .flags          = AVFMT_GENERIC_INDEX,
> +};
> diff --git a/libavformat/allformats.c b/libavformat/allformats.c
> index 181cb9e..60ec0ca 100644
> --- a/libavformat/allformats.c
> +++ b/libavformat/allformats.c
> @@ -262,6 +262,7 @@ void av_register_all(void)
>      REGISTER_MUXER   (RTP_MPEGTS,       rtp_mpegts);
>      REGISTER_MUXDEMUX(RTSP,             rtsp);
>      REGISTER_DEMUXER (SAMI,             sami);

> +    REGISTER_DEMUXER (AA,               aa);

Alphabetical order.

>      REGISTER_MUXDEMUX(SAP,              sap);
>      REGISTER_DEMUXER (SBG,              sbg);
>      REGISTER_DEMUXER (SDP,              sdp);

Regards,

-- 
  Nicolas George
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20150727/4629c6e4/attachment.sig>