[FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add derive-device function which searches for existing devices in both directions
Soft Works
softworkz at hotmail.com
Fri May 27 19:12:35 EEST 2022
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Mark Thompson
> Sent: Tuesday, May 3, 2022 10:23 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> derive-device function which searches for existing devices in both
> directions
>
> On 03/05/2022 01:11, Soft Works wrote:
> >> -----Original Message-----
> >> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of
> Mark
> >> Thompson
> >> Sent: Tuesday, May 3, 2022 1:57 AM
> >> To: ffmpeg-devel at ffmpeg.org
> >> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> derive-
> >> device function which searches for existing devices in both
> directions
> >>
> >> On 02/05/2022 23:59, Soft Works wrote:
> >>>> -----Original Message-----
> >>>> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf
> Of
> >> Mark
> >>>> Thompson
> >>>> Sent: Tuesday, May 3, 2022 12:12 AM
> >>>> To: ffmpeg-devel at ffmpeg.org
> >>>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> >> derive-
> >>>> device function which searches for existing devices in both
> >> directions
> >>>>
> >>>> On 02/05/2022 09:14, Soft Works wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf
> Of
> >>>> Mark
> >>>>>> Thompson
> >>>>>> Sent: Monday, May 2, 2022 12:01 AM
> >>>>>> To: ffmpeg-devel at ffmpeg.org
> >>>>>> Subject: Re: [FFmpeg-devel] [PATCH 1/3] avutils/hwcontext: add
> >>>> derive-
> >>>>>> device function which searches for existing devices in both
> >>>> directions
> >>>>>
> >>>>> [..]
> >>>>>
> >>>>>>>> * The thread-safety properties of the hwcontext API have
> been
> >>>> lost
> >>>>>> -
> >>>>>>>> you can no longer operate on devices independently across
> >> threads
> >>>>>>>> (insofar as the underlying API allows that).
> >>>>>>>> Maybe there is an argument that derivation is
> something
> >>>> which
> >>>>>>>> should happen early on and therefore documenting it as
> thread-
> >>>>>> unsafe
> >>>>>>>> is ok, but when hwupload/hwmap can use it inside
> filtergraphs
> >>>> that
> >>>>>>>> just isn't going to happen (and will be violated in the
> FFmpeg
> >>>>>> utility
> >>>>>>>> if filters get threaded, as is being worked on).
> >>>>>>>
> >>>>>>> From my understanding there will be a single separate
> thread
> >>>> which
> >>>>>>> handles all filtergraph operations.
> >>>>>>> I don't think it would even be possible (without massive
> >> changes)
> >>>>>>> to arbitrate filter processing in parallel.
> >>>>>>> But even if this would be implemented: the filtergraph setup
> >>>> (init,
> >>>>>>> uninit, query_formats, etc.) would surely happen on a single
> >>>> thread.
> >>>>>>
> >>>>>> The ffmpeg utility creates filtergraphs dynamically when the
> >> first
> >>>>>> frame is available from their inputs, so I don't see why you
> >>>> wouldn't
> >>>>>> allow multiple of them to be created in parallel in that case.
> >>>>>>
> >>>>>> If you create all devices at the beginning and then give
> >> references
> >>>> to
> >>>>>> them to the various filters which need them (so never
> manipulate
> >>>>>> devices dynamically within the graph) then it would be ok, but
> I
> >>>> think
> >>>>>> you've already rejected that approach.
> >>>>>
> >>>>> Please let's not break out of the scope of this patchset.
> >>>>> This patchset is not about re-doing device derivation. The only
> >>>>> small change that it does is that it returns an existing device
> >>>>> instead of creating a new one when such device already exists
> >>>>> in the same(!) chain.
> >>>>>
> >>>>> The change it makes has really nothing to do with threading or
> >>>>> anything around it.
> >>>>
> >>>> The change makes existing thread-safe hwcontext APIs unsafe.
> That
> >> is
> >>>> definitely not "nothing to do with threading", and has to be
> >> resolved
> >>>> since they can currently be called from contexts which expect
> >> thread-
> >>>> safety (such as making filtergraphs).
> >>>
> >>> As I said, I'm not aware that filtergraph creation would be a
> >> context
> >>> which requires thread-safety, even when considering frames which
> get
> >>> pushed into a filtergraph (like from a decoder).
> >>
> >> User programs can do whatever they like within the API
> constraints.
> >>
> >>> Could you please show a command line which causes a situation
> where
> >>> device_derive is being called from multiple threads?
> >>
> >> Given that the ffmpeg utility is currently entirely single-
> threaded,
> >> this will only happen once the intended plans for adding threading
> to
> >> it are complete.
> >
> > As mentioned, I don't think that would be possible easily, and from
> > what I have understood, the plan is to have separate threads for
> decoding,
> > encoding and filtering but not multiple threads for filtering.
>
> As the summary in <https://lists.ffmpeg.org/pipermail/ffmpeg-
> devel/2022-April/294940.html> states, the intent is a separate thread
> for each instance of those things, including filtergraphs.
>
> >> With that, it will be true for something which makes two
> filtergraphs
> >> and uses derivation in both of them. For example:, this currently
> >> makes two independent VAAPI devices, but equally could reuse the
> same
> >> device:
> >>
> >> # ffmpeg -y -f kmsgrab -i /dev/dri/card0 -vf
> >> hwmap=derive_device=vaapi,scale_vaapi=format=nv12 -c:v h264_vaapi
> >> out1.mp4 -vf
> >> hwmap=derive_device=vaapi,scale_vaapi=w=1280:h=720:format=nv12 -
> c:v
> >> h264_vaapi out2.mp4
> >
> > Well, multi-threading is not an issue right now, and I don't expect
> it
> > to be then.
> >
> > But on a more practical take: what do you want me to do?
> >
> > Guard that function with a lock? I can do this, no problem.
Hi Mark,
I have submitted a new patchset based on your suggestions:
https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=6655
> I'd like it to be designed in a way that avoids this sort of problem,
> as all the existing functions are.
Understood and accepted.
> > none of any of the device control function does any
> synchronization,
> > so why would exactly this - and only this function need
> synchronization?
>
> The derived_devices field is modified post-creation, which gives you
> data races if there is no other synchronisation.
>
> (Consider simultaneous derivation calls: currently that's completely
> fine - it makes no changes to the parent device and the derived
> devices are independent. With your proposed patch there is undefined
> behaviour because of the simultaneous reading and writing of the
> derived_devices field.)
>
> The only other mutable field is the buffer pool in a frames context,
> which has its own internal synchronisation.
I had thought there must be other cases of race conditions because
none of the other functions are employing any synchronization patterns,
but - at least with regards to device ctx - I realized that you are right
in that there aren't any races (in "regular" use cases at least).
> Having thought about this a bit further, I suspect you are going to
> need an additional shared-devices object of some kind in order to get
> around the circular references and fix the memory leak problem.
>
> That additional object will have to be mutable and accessible from
> multiple threads, so will probably need an internal lock (or careful
> lock-free design).
I think I have found a safe and simple solution to both requirements:
weak-references and synchronization.
Weak References
===============
The use of AVBuffer and AVBufferRef is a well-established pattern, so it's
surely not an option to make any fundamental changes or start using a
different pattern for a specific use case. Unfortunately, its implementation
doesn't leave room for adapting it as a weak-reference pattern.
So came to the following pattern:
- When a device context is created it is registered in a global array
(with thread-safe access) and gets a registration id (1, 2, 3, ...)
=> see register_hw_device()
- The registration id is the "weak reference"
- When a device context is freed, it gets unregistered
(removed from the array)
=> see unregister_hw_device()
- Now, a device context can reference other device contexts just
through that number (of course not knowing whether the corresponding
device is still "alive")
- But this can be be determined by querying from the array which can return an
AVBufferRef to that device on success´
=> see get_registered_hw_device()
Synchronization
===============
- All register and unregister calls are locked by a mutex
- Querying for a device by registration id is locked in the same way
and returns an AVBufferRef to the device context, making sure
that the returned device context will remain valid and not be
freed
Specifics
=========
I had one problem to make it work as described above: How to return
an AVBufferRef to the device context from get_registered_hw_device()?
Currently, an AVBufferRef can only be created from another AVBufferRef.
But neither the device context nor the registration array can store
an AVBufferRef to the context, because that additional reference would
cause the device context to never be freed at all.
Neither could we use the AVBufferRef that is supplied for
av_hwdevice_ctx_init(), because it could be released at some time while
another AVBufferRef is still holding a reference.
That's why I chose to use and store AVBuffer itself. AVBuffer itself
is not suitable for use as a weak reference, because it's not possible
to determine from the pointer once and whether it is av_free'd or not.
Though, in the context of hwdevice we don't have that problem due
to the use of hwdevice_ctx_free() as a custom callback for freeing
the AVBuffer, so we can safely use AVBuffer to store as a reference
in the global device context registration array.
The reference count is stored in AVBuffer anyway, and that leaves us
with just one requirement:
Being able to create an AVBufferRef from an AVBuffer directly instead
of from another AVBufferRef.
This is done in PATCH 1/4: add av_ref_from_buffer() function
I hope this makes sense to you and fulfills the requirements you
are asking for.
Please let me know what you think.
Thanks,
sw
More information about the ffmpeg-devel
mailing list