[FFmpeg-devel] Allow interrupt callback for AVCodecContext

Mon Jan 6 14:44:01 CET 2014

Hi,

On Mon, Dec 16, 2013 at 1:21 AM, Don Moir <donmoir at comcast.net> wrote:

> ----- Original Message ----- From: "Don Moir" <donmoir at comcast.net>
>> To: "FFmpeg development discussions and patches" <ffmpeg-devel at ffmpeg.org
>> >
>> Sent: Monday, December 16, 2013 2:03 AM
>> Subject: Re: [FFmpeg-devel] Allow interrupt callback for AVCodecContext
>>
>>
>>
>>> ----- Original Message ----- From: "Ronald S. Bultje" <
>>> rsbultje at gmail.com>
>>> To: "FFmpeg development discussions and patches" <
>>> ffmpeg-devel at ffmpeg.org>
>>> Sent: Wednesday, January 01, 2014 11:14 AM
>>> Subject: Re: [FFmpeg-devel] Allow interrupt callback for AVCodecContext
>>>
>>>
>>>  Hi,
>>>>
>>>> On Mon, Dec 16, 2013 at 1:30 AM, Don Moir <donmoir at comcast.net> wrote:
>>>>
>>>>
>>>>> ----- Original Message ----- From: "Ronald S. Bultje" <
>>>>> rsbultje at gmail.com>
>>>>> To: "FFmpeg development discussions and patches" <
>>>>> ffmpeg-devel at ffmpeg.org>
>>>>> Sent: Wednesday, January 01, 2014 10:52 AM
>>>>> Subject: Re: [FFmpeg-devel] Allow interrupt callback for AVCodecContext
>>>>>
>>>>>
>>>>>
>>>>>  Hi,
>>>>>
>>>>>>
>>>>>> On Mon, Dec 16, 2013 at 2:07 AM, Don Moir <donmoir at comcast.net>
>>>>>> wrote:
>>>>>>
>>>>>>  For now just seeing what you think about this. This is about thread
>>>>>> based
>>>>>>
>>>>>>> apps where this makes sense.
>>>>>>>
>>>>>>> When attempting to do a new seek or waiting to close a video, I find
>>>>>>> that
>>>>>>> I am waiting on avcodec_decode_video2 to return before I can
>>>>>>> continue.
>>>>>>> Depending on machine and video, this wait time can be up to about
>>>>>>> 50ms
>>>>>>> but
>>>>>>> normally wait time about 5 to 20 ms or so.
>>>>>>>
>>>>>>> You might say 'so what' and I would agree for a simple player app it
>>>>>>> does
>>>>>>> not make that much difference.
>>>>>>>
>>>>>>> If you are trying to stay on a timeline, or in case of scrubbing, or
>>>>>>> for
>>>>>>> editing apps, this wait time does make a difference. That is, you
>>>>>>> can not
>>>>>>> move on until avcodec_decode_video2 has returned.
>>>>>>>
>>>>>>> I can pretty much seek instantly to zero for any file except when I
>>>>>>> have
>>>>>>> to wait on avcodec_decode_video2 if that be the case.
>>>>>>>
>>>>>>> For me, it's normal for any intense process like decoding to be
>>>>>>> interruptible but this is not the case for AVCodecContext in ffmpeg.
>>>>>>> This
>>>>>>> is strange, don't you think?
>>>>>>>
>>>>>>> For AVFormatContext you have:
>>>>>>>
>>>>>>> typedef struct AVIOInterruptCB {
>>>>>>>     int (*callback)(void*);
>>>>>>>     void *opaque;
>>>>>>> } AVIOInterruptCB;
>>>>>>>
>>>>>>> I would use this model for AVCodecContext but change naming to:
>>>>>>>
>>>>>>> typedef struct AVInterruptCB {
>>>>>>>     int (*callback)(void*);
>>>>>>>     void *opaque;
>>>>>>> } AVInterruptCB;
>>>>>>>
>>>>>>> Then make name changes wherever needed and add it to AVCodecContext.
>>>>>>>
>>>>>>> This callback could be implemented piecemeal wherever needed over
>>>>>>> time,
>>>>>>> hitting the more intense processes first.
>>>>>>>
>>>>>>>
>>>>>>>  Just open a (potentially pre-cached) new AVCodecContext, it'll be
>>>>>> even
>>>>>> faster than your solution.
>>>>>>
>>>>>> Ronald
>>>>>>
>>>>>>
>>>>> hmmm. That's a thought for seeking I suppose but does not apply to
>>>>> waiting
>>>>> to close. Why do I care about close time ? Because another video has
>>>>> come
>>>>> in to replace it or variations of it. This can happen rapidly.
>>>>>
>>>>
>>>>
>>>> This is where more evolved languages have the concept of garbage
>>>> collection. For your purpose, you simply have a queue where you push
>>>> "AVCodecContexts I don't need anymore" into, and while the application
>>>> is
>>>> in the idle loop, you pop it and destroy items left in it.
>>>>
>>>> Really, I understand your use case, but you don't want to add all kind
>>>> of
>>>> clever hacks in AVCodecContext to get this kind of stuff done. You're
>>>> not
>>>> using a shared I/O resource that may be protected by a cookie or worse
>>>> for
>>>> pay-per-view video, and you're not in any sort of kernel wait, so
>>>> there's
>>>> no reason to add these kind of hacks. It's a logical thought, but this
>>>> problem has been solved already and there's better, easier and faster
>>>> solutions out there that do not involve adding hacks in every single
>>>> FFmpeg
>>>> decoder to actually support this.
>>>>
>>>> Ronald
>>>>
>>>
>>> Yeah really did not like the notion of changing decoders and I don't
>>> like adding anything that might not be needed, but I will see what I can do
>>> with your suggestions. I never know what ffmpeg can tolerate. I had asked
>>> in libav-user but get the same old BS there when asking about things like
>>> this.
>>>
>>
>> Ok tried some test code allocating new context and that worked pretty
>> well.
>>
>> I had to do this to get consistent results:
>>
>> AVCodec *codec ... already have it
>>
>> AVCodecContext *new_context = avcodec_alloc_context3 (NULL);
>> avcodec_copy_context (new_context,old_context);
>> avcodec_open2 (new_context,codec,NULL);
>>
>> The following worked for at least one file but failed for others like
>> Theora etc.
>>
>> AVCodecContext *new_context = avcodec_alloc_context3 (codec);
>> avcodec_open2 (new_context,codec,NULL);
>>
>> For Theora it failed in avcodec_open2 saying 'missing side data' or
>> similar.
>>
>> Using a cached context the wait time is zero. Executing the 3 statements
>> above on a slower machine takes about 1 to 4 ms. It's also not like it is
>> always stuck in avcodec_decode_video2 either. In this case I don't need a new
>> context and wait time is zero.
>>
>> Thanks Ronald.
>>
>
> Ronald says:
>
>  Just open a (potentially pre-cached) new AVCodecContext, it'll be even
>>>>>> faster than your solution.
>>>>>>
>>>>>
> and
>
>
>  Really, I understand your use case, but you don't want to add all kind of
>>>> clever hacks in AVCodecContext to get this kind of stuff done. You're
>>>> not
>>>> using a shared I/O resource that may be protected by a cookie or worse
>>>> for
>>>> pay-per-view video, and you're not in any sort of kernel wait, so
>>>> there's
>>>> no reason to add these kind of hacks. It's a logical thought, but this
>>>> problem has been solved already and there's better, easier and faster
>>>> solutions out there that do not involve adding hacks in every single
>>>> FFmpeg
>>>> decoder to actually support this.
>>>>
>>>
> Using a cached and open AVCodecContext does work but it's like trying to
> kill an ant with a sledgehammer. Using a cached and not opened context
> helps some but you still lose time when opening it. So best results are
> when using a pre-cached open context.
>
> This means you will have allocated resources and most likely opened
> threads in your cached context that are not doing anything.
>
> So you have to ask what the real hack is. Keeping an opened cached context
> or having an interrupt callback. An interrupt callback does not use any
> additional resources but then it has to be implemented for every decoder.
> An opened context works now, but uses additional resources and mostly opened
> threads that are more or less dormant. Having an interruptible intense
> process is normal and not a hack and you should not have to beat it to
> death to get it to work.
>
> I restrict the context to have at most 2 threads. Yes I could limit it to
> no new threads, but I get better results with 2. So you have to be careful.
> If the thread_count is zero, which is the default, then it will choose the
> number of threads based on the cpu count. You will have this number of
> opened threads in a cached opened context. On one of my machines this would
> be 8 opened and unused threads for cached open context, but I set
> thread_count to 2, getting diminishing returns on greater number of threads.

I'll bite. Please do define expensive, as Reimar also said. Do you mean
"cpu usage"? Or "memory usage"? Or something else?

Let's start with CPU usage: the threads are dormant, so their CPU usage is
zero. I'm not sure if you've ever decoded 1080p video, but your CPU is
pretty much hogged by the context doing actual work.

Let's talk about memory usage also. A typical context is a few kB or
something along those lines. There are no reference frames in a
not-yet-decoding context. Each 1080p reference frame (4:2:0, 8-bit) is
about 3 MB, and there can be up to 16 of them for H.264, so roughly 50 MB.
So the idle context is negligible by comparison, and the memory is hogged
by the reference frames allocated for the context that is doing actual work.
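
The numbers above can be checked with a few lines of C. This is only a
back-of-the-envelope sketch: it assumes 1920x1080 frames (my assumption, taken
from the 1080p mentioned earlier), 8 bits per sample and 4:2:0 subsampling, and
it ignores the padding and alignment a real decoder adds, so actual figures
will be somewhat higher. The function names are made up for illustration and
are not FFmpeg API.

```c
#include <stddef.h>

/* Bytes for one 8-bit 4:2:0 frame: a full-size luma plane plus two
 * chroma planes at half width and half height (rounded up).
 * Hypothetical helper; ignores decoder padding and alignment. */
static size_t yuv420p8_frame_bytes(size_t width, size_t height)
{
    size_t luma   = width * height;
    size_t chroma = ((width + 1) / 2) * ((height + 1) / 2);
    return luma + 2 * chroma;
}

/* Bytes held by `count` reference frames of the given size.
 * Hypothetical helper; 16 is the upper bound used in the text for H.264. */
static size_t ref_frames_bytes(size_t width, size_t height, size_t count)
{
    return count * yuv420p8_frame_bytes(width, height);
}
```

With these assumptions, yuv420p8_frame_bytes(1920, 1080) is 3110400 bytes
(about 3 MB) and ref_frames_bytes(1920, 1080, 16) is 49766400 bytes (about
50 MB), which is several orders of magnitude more than a few kB of idle
context.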

Oh, and let's also talk about implementation status of this concept: all
done. And zero maintenance cost.

In other words, I don't see the issue. Am I missing something?

Ronald