[FFmpeg-devel] [PATCH] speex in ogg muxer

Sun Sep 6 02:20:03 CEST 2009

Justin Ruggles wrote:

> Justin Ruggles wrote:
> 
>> Justin Ruggles wrote:
>>
>>> Justin Ruggles wrote:
>>>
>>>> Justin Ruggles wrote:
>>>>
>>>>> Baptiste Coudurier wrote:
>>>>>> Justin Ruggles wrote:
>>>>>>> Baptiste Coudurier wrote:
>>>>>>>> Hi Justin,
>>>>>>>>
>>>>>>>> Justin Ruggles wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This patch adds speex support to the ogg muxer.  It basically does the
>>>>>>>>> same thing as Ogg/FLAC, in that the 1st packet is a global header from
>>>>>>>>> extradata and the 2nd packet is vorbiscomment metadata.
>>>>>>>>>
>>>>>>>>> This seems to work just fine for speex-to-speex stream copy, but
>>>>>>>>> probably would not work for flv-to-speex because flv doesn't seem to have any
>>>>>>>>> speex extradata from what I can tell.  I guess a header could be
>>>>>>>>> constructed, but that would be a separate patch to the flv demuxer.
>>>>>>>>>
>>>>>>>>> This patch is a precursor to libspeex encoding support, which I'll be
>>>>>>>>> sending shortly.
>>>>>>>>>
>>>>>>>>> -Justin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Index: libavformat/oggenc.c
>>>>>>>>> ===================================================================
>>>>>>>>> --- libavformat/oggenc.c	(revision 19244)
>>>>>>>>> +++ libavformat/oggenc.c	(working copy)
>>>>>>>>> @@ -104,17 +125,39 @@
>>>>>>>>>      bytestream_put_byte(&p, 0x00); // streaminfo
>>>>>>>>>      bytestream_put_be24(&p, 34);
>>>>>>>>>      bytestream_put_buffer(&p, streaminfo, FLAC_STREAMINFO_SIZE);
>>>>>>>>> -    oggstream->header_len[1] = 1+3+4+strlen(vendor)+4;
>>>>>>>>> -    oggstream->header[1] = av_mallocz(oggstream->header_len[1]);
>>>>>>>>> -    p = oggstream->header[1];
>>>>>>>>> +    p = ogg_write_vorbiscomment(4, bitexact, &oggstream->header_len[1]);
>>>>>>>>> +    if (!p)
>>>>>>>>> +        return -1;
>>>>>>>> AVERROR(ENOMEM)
>>>>>>> fixed.
>>>>>>>
>>>>>>>>> @@ -144,6 +188,12 @@
>>>>>>>>>                  av_log(s, AV_LOG_ERROR, "Extradata corrupted\n");
>>>>>>>>>                  av_freep(&st->priv_data);
>>>>>>>>>              }
>>>>>>>>> +        } else if (st->codec->codec_id == CODEC_ID_SPEEX) {
>>>>>>>>> +            if (ogg_build_speex_headers(st->codec, oggstream,
>>>>>>>>> +                                        st->codec->flags & CODEC_FLAG_BITEXACT) < 0) {
>>>>>>>>> +                av_log(s, AV_LOG_ERROR, "error writing Speex headers\n");
>>>>>>>>> +                av_freep(&st->priv_data);
>>>>>>>>> +            }
>>>>>>>> return error here with the return code of the func :>
>>>>>>>> Yes, it seems flac misses it too; this needs a fix.
>>>>>>>>
>>>>>>>> patch fine otherwise, maybe a micro bump for avformat would be nice.
>>>>>>> fixed. new patch attached. the new patch also differs in that it
>>>>>>> overrides the extra_headers field in the Speex header to be 0 since only
>>>>>>> the 2 required headers are written.
>>>>>>>
>>>>>> patch ok if it works :>
>>>> Ok, back to square one.
>>>>
>>>>> Hmm... I've done several more tests and it does not quite work as-is for
>>>>> all samples.  Here is what I have run into.  The tests so far are for
>>>>> ogg-to-ogg stream copy.
>>>>>
>>>>> - When the source has more than 1 frame per packet, the resulting copy
>>>>> plays fine with ffmpeg/ffplay but is quick and choppy with speexdec.  I
>>>>> was able to fix this by modifying the ogg/speex demuxer to set
>>>>> avctx->frame_size to the number of samples in a packet instead of in a
>>>>> frame.  I also had to update the libspeex decoder accordingly.  Maybe
>>>>> this is the wrong way to go about it though.  I'm guessing it is a
>>>>> timestamp/granulepos issue, but I don't know enough about Ogg to tell
>>>>> more than that.
>>>> This is now corrected after much discussion. :)
>>>>
>>>>> - Even with the fix and even with 1 frame per packet, 2 short samples
>>>>> I've tested so far have a single soft pop when the stream-copied file is
>>>>> decoded with speexdec, but it's fine with ffmpeg/ffplay.
>>>>>
>>>>> Maybe someone else might have an idea of what could be going wrong?
>>>> Now I think I know what is going wrong, and there is nothing we can do
>>>> about it I think.  speexenc does some weird things with granule
>>>> positions.  It starts out for a long time with granulepos=0 even though
>>>> it is encoding audio, then when it starts writing granule positions it
>>>> is not always in sync with the start of the stream.  Below is a little
>>>> snippet from a comparison of an original spx file to a copied spx file.
>>>> Each packet should be 320 samples.
>>>>
>>>> [...]
>>>>
>>>> -00:00:00.000: serialno 1626088319, calc. gpos 0, packetno 57
>>>> +00:00:01.120: serialno 0000000000, granulepos 17920, packetno 57
>>>>
>>>> -00:00:00.000: serialno 1626088319, calc. gpos 0, packetno 58
>>>> +00:00:01.140: serialno 0000000000, granulepos 18240, packetno 58
>>>>
>>>> -00:00:00.000: serialno 1626088319, calc. gpos 0, packetno 59
>>>> +00:00:01.160: serialno 0000000000, granulepos 18560, packetno 59
>>>>
>>>> -00:00:01.171: serialno 1626088319, granulepos 18737, packetno 60
>>>> +00:00:01.180: serialno 0000000000, granulepos 18880, packetno 60
>>>>
>>>> -00:00:01.191: serialno 1626088319, calc. gpos 19057, packetno 61
>>>> +00:00:01.191: serialno 0000000000, granulepos 19057, packetno 61
>>>>
>>>> -00:00:01.211: serialno 1626088319, calc. gpos 19377, packetno 62
>>>> +00:00:01.211: serialno 0000000000, granulepos 19377, packetno 62
>>> So... I figured it out, but you may not want to know the answer. ;)
>>>
>>> The first packet is supposed to be interpreted as shorter than the
>>> full frame size.  The delay is found by calculating what the
>>> granulepos of the first page would normally be, then subtracting the
>>> actual granulepos of that page from it.
>>>
>>> From above, this is the last packet in the first page. There are 59
>>> packets per page in this stream (the first 2 packets are headers, hence
>>> the packetno of 60).
>>>> -00:00:01.171: serialno 1626088319, granulepos 18737, packetno 60
>>>> +00:00:01.180: serialno 0000000000, granulepos 18880, packetno 60
>>> speexdec interprets the first packet as having a delay of
>>> 18880-18737=143 samples.  So the first packet should be 320-143=177
>>> samples long, and the decoder discards the first 143 samples of the
>>> first frame.
>>>
>>> None of this is documented except for in the speexenc and speexdec
>>> source code.  From analyzing a Speex-in-FLV sample, it appears that the
>>> way Adobe handles this in Flash Media Server is to do like our ogg
>>> demuxer does and interpret the first page as if each frame is 320
>>> samples, then resync timestamps with the source after the first page,
>>> causing a skip in timestamps after the first page instead of at the
>>> beginning of the stream.
>>>
>>> I'm still not sure what to do about this though...
>> This patch makes it so that all the pts and durations are correct for
>> Ogg/Speex.  It basically just changes the durations of the first and
>> last packets.
> 
> nevermind. this doesn't quite work. i'm still working on it. damn ogg
> and its craziness!

Ok, now this patch should work correctly.
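
For reference, the first-page delay arithmetic quoted above can be sketched
roughly as follows. This is only an illustration of the calculation, not the
code in the attached patch: the struct and function names are hypothetical,
and clamping a negative delay to 0 is my own assumption rather than anything
taken from speexdec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; not part of FFmpeg or libspeex. */
typedef struct {
    int delay;          /* samples to discard at the start of the stream */
    int first_duration; /* samples actually kept from the first packet   */
} SpeexFirstPage;

/*
 * Hypothetical helper: derive the initial delay from the granulepos of the
 * first audio page.
 *   page_granulepos    - granulepos stored on the first audio page
 *   packets_in_page    - number of audio packets in that page
 *   samples_per_packet - nominal samples per packet
 */
static SpeexFirstPage speex_first_page_delay(int64_t page_granulepos,
                                             int packets_in_page,
                                             int samples_per_packet)
{
    SpeexFirstPage r;
    int64_t nominal, delay;

    assert(packets_in_page > 0 && samples_per_packet > 0);

    /* What the granulepos would be if every packet were full length. */
    nominal = (int64_t)packets_in_page * samples_per_packet;

    /* Nominal minus actual is the number of samples to skip. */
    delay = nominal - page_granulepos;
    if (delay < 0)
        delay = 0; /* assumption: never report a negative delay */

    r.delay          = (int)delay;
    r.first_duration = samples_per_packet - r.delay;
    return r;
}
```

With the numbers from the sample discussed above (59 packets in the first
page, 320 samples per packet, page granulepos 18737), this gives a delay of
143 samples and a first-packet duration of 177 samples.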

-Justin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: speex_granulepos_delay_2.patch
Type: text/x-diff
Size: 3767 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090905/cceec834/attachment.patch>