[FFmpeg-devel] [PATCH] E-AC-3 spectral extension

Sun May 31 05:43:12 CEST 2009

Michael Niedermayer wrote:
> On Sun, May 17, 2009 at 02:23:34PM -0400, Justin Ruggles wrote:
>> Hi,
>>
>> I was recently made aware that some French TV station(s) will soon (if
>> not already) start using E-AC-3 streams in their broadcasts which
>> utilize spectral extension.  I was also given some samples (thanks j-b
>> and Anthony), which I uploaded to mphq:
>> http://samples.mplayerhq.hu/A-codecs/AC3/eac3/csi_miami_*
>>
>> So I decided to revisit my SPX patch.  The previous version was done
>> with all integer arithmetic, but it turns out that it's really not
>> accurate enough for spectal extension processing.  The resulting decoded
>> output had a max bandwidth of about 2kHz less when using 24-bit fixed
>> point vs. floating point, and was only slightly higher than without any
>> SPX processing at all.  Making just the square roots floating point
>> raised the bandwidth about 1kHz, and making the rest (noise/signal
>> scaling, spx coords, and notch filters) floating point added about
>> another 1kHz.
>>
>> I was able to compare the output to Nero's E-AC-3 decoder (thanks
>> madshi), and the results are very close considering that AC-3 uses
>> random noise for zero-bit mantissas:
> 
>> stddev:  131.16 PSNR: 53.96
> 
> i wouldnt call 131.16 close

Well, considering I don't know how the Nero decoder differs, it's not
bad.  I don't know how the Nero decoder ends up with higher bandwidth
than it should, it very likely uses a different random noise generator,
and it could do dithering in the float-to-int16 conversion.

>> PEAQ ODG: -0.44
> 
> what is PEAQ ODG ?

PEAQ is an ITU standard for perceptual evaluation of audio quality.  ODG
is the objective difference grade.  It tries to objectively estimate
what results might be from a subjective listening test by using a
psychoacoustic model.  The method has its flaws, but it's a heck of a
lot simpler than setting up a multi-user double-blind listening test for
each change.

 0 = Imperceptible
-1 = Perceptible, but not annoying
-2 = Slightly annoying
-3 = Annoying
-4 = Very annoying

> btw, have you tested your code with our trasher / some fuzzer to make sure
> it doesnt segfault?

Yes, and it does not segfault even with -er 0.  The values read from the
stream which affect reading/writing from memory are bounds checked.  The
(E)AC-3 decoder already does fairly well with damaged streams, and that
is no different after this change.

> 
>> One thing I'm unsure about is whether I should add optional runtime
>> generation of the attenuation table rather than always hardcoding it.
> 
> i think due to the relatively small size there is little point

ok.

> 
> [...]
> 
>> diff --git a/libavcodec/ac3dec.c b/libavcodec/ac3dec.c
>> index c176cb3..e6d7a9d 100644
>> --- a/libavcodec/ac3dec.c
>> +++ b/libavcodec/ac3dec.c
>> @@ -825,14 +825,94 @@ static int decode_audio_block(AC3DecodeContext *s, int blk)
>>  
>>      /* spectral extension strategy */
>>      if (s->eac3 && (!blk || get_bits1(gbc))) {
>> -        if (get_bits1(gbc)) {
>> -            ff_log_missing_feature(s->avctx, "Spectral extension", 1);
>> -            return -1;
>> +        s->spx_in_use = get_bits1(gbc);
>> +        if (s->spx_in_use) {
>> +            int begf, endf;
>> +            int spx_end_subband;
>> +
>> +            /* determine which channels use spx */
>> +            if (s->channel_mode == AC3_CHMODE_MONO) {
>> +                s->channel_in_spx[1] = 1;
>> +            } else {
>> +                for (ch = 1; ch <= fbw_channels; ch++)
>> +                    s->channel_in_spx[ch] = get_bits1(gbc);
>> +            }
>> +
>> +            s->spx_copy_start_freq = get_bits(gbc, 2) * 12 + 25;
>> +            begf = get_bits(gbc, 3);
>> +            endf = get_bits(gbc, 3);
>> +            s->spx_start_subband = begf < 6 ? begf+2 : 2*begf-3;
>> +            spx_end_subband      = endf < 4 ? endf+5 : 2*endf+3;
>> +            if (s->spx_start_subband >= spx_end_subband) {
>> +                av_log(s->avctx, AV_LOG_ERROR, "invalid spectral extension range (%d >= %d)\n",
>> +                       s->spx_start_subband, spx_end_subband);
>> +                return -1;
>> +            }
>> +            s->spx_start_freq    = s->spx_start_subband * 12 + 25;
>> +            s->spx_end_freq      = spx_end_subband      * 12 + 25;
>> +            if (s->spx_copy_start_freq >= s->spx_start_freq) {
>> +                av_log(s->avctx, AV_LOG_ERROR, "invalid spectral extension copy start bin (%d >= %d)\n",
>> +                       s->spx_copy_start_freq, s->spx_start_freq);
>> +                return -1;
>> +            }
> 
> you know, i always have a bad feeling when various variables are updated
> first and checked afterwards but left in an invalid state anyway
> 
> are you sure this is all free of buffer overflows? (ive not checked so
> it may very well be ok ...)

No further blocks are read in the frame after a block decode fails.
Each frame is independent, so the next frame is not affected by an
invalid state.  Also, it was previously discussed and agreed upon that
trying to read subsequent blocks after a failed block is pointless since
there are no known streams which use the block start info.

>> +                        nblend = sqrt(       nratio);
>> +                        sblend = sqrt(1.0f - nratio);
>> +                        nblend *= 1.73205077648f; // scale noise to give unity variance
> 
> nblend = sqrt( 3*nratio);

oh yeah, I forgot that was sqrt(3).

> 
> [...]
>> +        /* Copy coeffs from normal bands to extension bands */
>> +        bin = s->spx_start_freq;
>> +        for (i = 0; i < num_copy_sections; i++) {
>> +            memcpy(&s->transform_coeffs[ch][bin],
>> +                   &s->transform_coeffs[ch][s->spx_copy_start_freq],
>> +                   copy_sizes[i]*sizeof(float));
>> +            bin += copy_sizes[i];
>> +        }
> 
> cant that memcpy be merged with some of the other processing?

The only possibility I can see would be to store the copy start
frequency for each band instead of having copy sections, then merge it
with the energy calculation.  I'm not sure if there would be any benefit
in this, but I'll try it.

I will change the other things you mentioned.

Thanks,
Justin