[FFmpeg-devel] [PATCH] OSQ lossless audio format support

Fri Aug 25 19:42:21 EEST 2023

On 8/25/2023 1:28 PM, Paul B Mahol wrote:
> On Fri, Aug 25, 2023 at 5:57 PM James Almer <jamrial at gmail.com> wrote:
> 
>> On 8/24/2023 7:06 PM, Paul B Mahol wrote:
>>> On Thu, Aug 24, 2023 at 11:51 PM James Almer <jamrial at gmail.com> wrote:
>>>
>>>> On 8/24/2023 6:11 PM, Paul B Mahol wrote:
>>>>> On Thu, Aug 24, 2023 at 11:00 PM James Almer <jamrial at gmail.com>
>> wrote:
>>>>>
>>>>>> On 8/24/2023 6:52 AM, Paul B Mahol wrote:
>>>>>>> +static int osq_receive_frame(AVCodecContext *avctx, AVFrame *frame)
>>>>>>> +{
>>>>>>> +    OSQContext *s = avctx->priv_data;
>>>>>>> +    GetBitContext *gb = &s->gb;
>>>>>>> +    int ret, n;
>>>>>>> +
>>>>>>> +    while (s->bitstream_size < s->max_framesize) {
>>>>>>> +        int size;
>>>>>>> +
>>>>>>> +        if (!s->pkt->data) {
>>>>>>> +            ret = ff_decode_get_packet(avctx, s->pkt);
>>>>>>> +            if (ret == AVERROR_EOF && s->bitstream_size > 0)
>>>>>>> +                break;
>>>>>>> +            if (ret < 0)
>>>>>>> +                return ret;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        size = FFMIN(s->pkt->size - s->pkt_offset, s->max_framesize
>> -
>>>>>> s->bitstream_size);
>>>>>>> +        memcpy(s->bitstream + s->bitstream_size, s->pkt->data +
>>>>>> s->pkt_offset, size);
>>>>>>> +        s->bitstream_size += size;
>>>>>>> +        s->pkt_offset += size;
>>>>>>> +
>>>>>>> +        if (s->pkt_offset == s->pkt->size) {
>>>>>>> +            av_packet_unref(s->pkt);
>>>>>>> +            s->pkt_offset = 0;
>>>>>>> +        }
>>>>>>
>>>>>> This looks like you're assembling a packet of max_framesize bytes. You
>>>>>> should instead do that in a parser, and ensure here that the packets
>> fed
>>>>>> to this decoder are <= max_framesize.
>>>>>>
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    frame->nb_samples = FFMIN(s->frame_samples, s->nb_samples);
>>>>>>> +    if (frame->nb_samples <= 0)
>>>>>>> +        return AVERROR_EOF;
>>>>>>> +
>>>>>>> +    if ((ret = ff_get_buffer(avctx, frame, 0)) < 0)
>>>>>>> +        goto fail;
>>>>>>> +
>>>>>>> +    if ((ret = init_get_bits8(gb, s->bitstream, s->bitstream_size))
>> <
>>>> 0)
>>>>>>> +        goto fail;
>>>>>>> +
>>>>>>> +    if ((ret = osq_decode_block(avctx, frame)) < 0)
>>>>>>> +        goto fail;
>>>>>>> +
>>>>>>> +    s->nb_samples -= frame->nb_samples;
>>>>>>> +
>>>>>>> +    n = get_bits_count(gb) / 8;
>>>>>>> +    if (n > s->bitstream_size) {
>>>>>>> +        ret = AVERROR_INVALIDDATA;
>>>>>>> +        goto fail;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    memmove(s->bitstream, &s->bitstream[n], s->bitstream_size - n);
>>>>>>> +    s->bitstream_size -= n;
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +
>>>>>>> +fail:
>>>>>>> +    s->bitstream_size = 0;
>>>>>>> +    s->pkt_offset = 0;
>>>>>>> +    av_packet_unref(s->pkt);
>>>>>>> +
>>>>>>> +    return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>> +const AVInputFormat ff_osq_demuxer = {
>>>>>>> +    .name           = "osq",
>>>>>>> +    .long_name      = NULL_IF_CONFIG_SMALL("raw OSQ"),
>>>>>>> +    .read_probe     = osq_probe,
>>>>>>> +    .read_header    = osq_read_header,
>>>>>>> +    .read_packet    = ff_raw_read_partial_packet,
>>>>>>
>>>>>> Instead of sending an arbitrarily sized packet (1024 bytes as of now),
>>>>>> you should set codecpar->frame_size and propagate packets with that
>>>>>> amount of bytes instead.
>>>>>> A parser is still needed, though, for non seekable input (a pipe). And
>>>>>> in case the decoder is fed with non lavf input.
>>>>>>
>>>>>
>>>>> Format is not seekable, packet sizes are nowhere stored in .osq files.
>>>>
>>>> Same case as raw formats like H.264. No packet sizes, no seekability
>>>> without having parsed the entire sequence to build a list of seek
>>>> points, etc. A parser has to assemble access units.
>>>>
>>>
>>> But there is no any info how to assemble anything here, unless
>>> you propose decoding inside parser?
>>>
>>>
>>>
>>>>
>>>>>
>>>>> Think of this format like APAC/BONK/WAVARC but just no need to keep
>>>> unused
>>>>> bits from previous decoded data/packet.
>>>>> With this fact, parser makes no sense to do.
>>>>
>>>> You know that frame_size is (frame_samples_per_channel * 16 + 1024) *
>>>> channels. A parser can work with that (Look at gsm parser for example.
>>>> There may be others too).
>>>>
>>>
>>> There is reason why variable is called max_framesize.
>>>
>>> You should not propagate truncated data in 1024 byte chunks, and the
>>>> decoder should not expect that either.
>>>>
>>>
>>> Nothing gets truncated. That is just worst case scenario size for packet.
>>> In another words, size of packet can be anything between 33 and
>>> max_framesize.
>>
>> By truncated i mean the demuxer propagates arbitrary 1024 byte sized
>> packets. And in the decoder you're assembling a buffer of max_framesize
>> in osq_receive_frame() by requesting packets until you reach that size
>> or EOF, whatever comes first, before calling osq_decode_block(). So
>> unless you hit EOF, it will always be max_framesize. That's what i say
>> should be done in a simple, small parser.
>>
> 
> I fully understand you, but your reasoning is critically flawed.
> And I have no time to educated an uneducated.

Not insulting people will help your arguments.

> 
> For The Last Time:
> 
> Even if parser does that what you propose, decoder would need to keep own
> internal
> buffer again and than memmove() bytes around.
> 
> And even if parser is implemented that would effectively break other
> possible containers different
> than .osq - such containers would properly split data into packets and my
> decoder would still work in such scenario but
> your proposed way would only force such containers to blacklist your
> proposed parser.

You know a parser is doable and the proper approach. With frame_samples 
and the first part of osq_decode_block() you can get it going. But do 
whatever you want.