[FFmpeg-devel] [PATCH] avcodec/alac: also use a temp buffer for 24bit samples

Tue Oct 6 22:33:42 CEST 2015

On 10/6/15, James Almer <jamrial at gmail.com> wrote:
> On 10/6/2015 4:40 PM, Paul B Mahol wrote:
>> On 10/6/15, James Almer <jamrial at gmail.com> wrote:
>>> Since AVFrame.extended_data is apparently not padded, simd functions
>>> could in some cases overread, so make the decoder use a temp buffer
>>> unconditionally.
>>>
>>> Signed-off-by: James Almer <jamrial at gmail.com>
>>> ---
>>>  libavcodec/alac.c | 18 +++++-------------
>>>  1 file changed, 5 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/libavcodec/alac.c b/libavcodec/alac.c
>>> index 146668e..394bd19 100644
>>> --- a/libavcodec/alac.c
>>> +++ b/libavcodec/alac.c
>>> @@ -80,7 +80,6 @@ typedef struct ALACContext {
>>>      int extra_bits;     /**< number of extra bits beyond 16-bit */
>>>      int nb_samples;     /**< number of samples in the current frame */
>>>
>>> -    int direct_output;
>>>      int extra_bit_bug;
>>>
>>>      ALACDSPContext dsp;
>>> @@ -278,10 +277,6 @@ static int decode_element(AVCodecContext *avctx,
>>> AVFrame *frame, int ch_index,
>>>          return AVERROR_INVALIDDATA;
>>>      }
>>>      alac->nb_samples = output_samples;
>>> -    if (alac->direct_output) {
>>> -        for (ch = 0; ch < channels; ch++)
>>> -            alac->output_samples_buffer[ch] = (int32_t
>>> *)frame->extended_data[ch_index + ch];
>>> -    }
>>>
>>>      if (is_compressed) {
>>>          int16_t lpc_coefs[2][32];
>>> @@ -393,8 +388,9 @@ static int decode_element(AVCodecContext *avctx,
>>> AVFrame
>>> *frame, int ch_index,
>>>          break;
>>>      case 24: {
>>>          for (ch = 0; ch < channels; ch++) {
>>> +            int32_t *outbuffer = (int32_t
>>> *)frame->extended_data[ch_index +
>>> ch];
>>>              for (i = 0; i < alac->nb_samples; i++)
>>> -                alac->output_samples_buffer[ch][i] <<= 8;
>>> +                *outbuffer++ = alac->output_samples_buffer[ch][i] << 8;
>>>          }}
>>>          break;
>>>      }
>>> @@ -468,8 +464,7 @@ static av_cold int alac_decode_close(AVCodecContext
>>> *avctx)
>>>      int ch;
>>>      for (ch = 0; ch < FFMIN(alac->channels, 2); ch++) {
>>>          av_freep(&alac->predict_error_buffer[ch]);
>>> -        if (!alac->direct_output)
>>> -            av_freep(&alac->output_samples_buffer[ch]);
>>> +        av_freep(&alac->output_samples_buffer[ch]);
>>>          av_freep(&alac->extra_bits_buffer[ch]);
>>>      }
>>>
>>> @@ -491,11 +486,8 @@ static int allocate_buffers(ALACContext *alac)
>>>          FF_ALLOC_OR_GOTO(alac->avctx, alac->predict_error_buffer[ch],
>>>                           buf_size, buf_alloc_fail);
>>>
>>> -        alac->direct_output = alac->sample_size > 16;
>>> -        if (!alac->direct_output) {
>>> -            FF_ALLOC_OR_GOTO(alac->avctx,
>>> alac->output_samples_buffer[ch],
>>> -                             buf_size, buf_alloc_fail);
>>> -        }
>>> +        FF_ALLOC_OR_GOTO(alac->avctx, alac->output_samples_buffer[ch],
>>> +                         buf_size, buf_alloc_fail);
>>>
>>>          FF_ALLOC_OR_GOTO(alac->avctx, alac->extra_bits_buffer[ch],
>>>                           buf_size, buf_alloc_fail);
>>> --
>>> 2.5.2
>>>
>>> _______________________________________________
>>> ffmpeg-devel mailing list
>>> ffmpeg-devel at ffmpeg.org
>>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>>
>>
>> it should be padded and not introduce slowdown
>
> If you mean the temp buffers, they will be padded alongside the simd
> functions
> once i commit them.
> But If you mean the avframe.extended_data buffer, could you take care of
> that?
> I'm not familiar enough with avframe to change the relevant alloc functions.
>
> running "time ffmpeg -v 0 -threads 1 -i INPUT -threads 1 -f null -"
> (implicit
> pcm_s16le output)
>
> Before
> real    0m0.596s
> user    0m0.000s
> sys     0m0.000s
>
> After
> real    0m0.575s
> user    0m0.000s
> sys     0m0.000s
>
>
> running "time ffmpeg -v 0 -threads 1 -i INPUT -threads 1 -c:a pcm_s24le -f
> null -"
>
> Before
> real    0m0.618s
> user    0m0.000s
> sys     0m0.000s
>
> After
> real    0m0.618s
> user    0m0.000s
> sys     0m0.000s
>
> With a ~1 minute 24 bit 44.1kh stereo sample. Curious that it's faster when
> the
> output is s16.
> You'll probably have to do the same for the tak decoder before you commit
> your
> decorrelate simd patch, btw. It also uses avframe.extended_data buffer
> directly
> for 24bit samples.
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

you set aligned number of samples before calling get_buffer and after
that changes
frame->nb_samples to actual number of samples.

Alternatively IIRC default 16 byte alignment could be increased.