[FFmpeg-devel] [PATCH 6/6] ffv1enc_vulkan: switch to receive_packet
Jerome Martinez
jerome at mediaarea.net
Sun Nov 24 17:51:55 EET 2024
On 24/11/2024 at 04:41, Lynne via ffmpeg-devel wrote:
> On 11/23/24 23:10, Jerome Martinez wrote:
>
>> On 23/11/2024 at 20:58, Lynne via ffmpeg-devel wrote:
>>> This allows the encoder to fully saturate all queues the GPU
>>> has, giving a good 10% in certain cases and resolutions.
>>
>>
>> Using an RTX 4070:
>> +50% (!!!) with 2K 10-bit content.
>> +17% with 4K 16-bit content.
>> Also, 2K content now encodes at 4x the speed of 4K content, which
>> matches the SW encoder (with a similar slice count) and is the
>> expected result; it seems a bottleneck with smaller resolutions has
>> been removed.
>>
>>
>> Unfortunately, it has a drawback: 6K5K content, which was handled
>> well without this patch, now fails immediately:
>> [vost#0:0/ffv1_vulkan @ 0x10467840] [enc:ffv1_vulkan @ 0x12c011c0]
>> Error submitting video frame to the encoder
>> [vost#0:0/ffv1_vulkan @ 0x10467840] [enc:ffv1_vulkan @ 0x12c011c0]
>> Error encoding a frame: Cannot allocate memory
>> [vost#0:0/ffv1_vulkan @ 0x10467840] Task finished with error code:
>> -12 (Cannot allocate memory)
>> [vost#0:0/ffv1_vulkan @ 0x10467840] Terminating thread with return
>> code -12 (Cannot allocate memory)
>>
>> This is a problem, as the handling of 6K5K was good on the RTX 4070
>> (3x faster than a CPU at the same price) before this patch.
>> Is it possible to keep the handling of bigger resolutions on such a
>> card while retaining the performance boost of this patch?
>
>
> To an extent. At high resolutions, -async_depth 0 (maximum) harms
> performance. I get the best results with it set to 2 or 3 for 6K
> content, on my odd setup.
> Increasing async_depth increases the amount of VRAM used, so that's
> the tradeoff.
> Detecting it automatically is difficult, as Vulkan doesn't give you
> metrics on how much free VRAM there is, so there's nothing we can do
I am torn between a default with as much performance as possible and a
default that is guaranteed to work (a default value of 1 is OK for the
6K5K content on the RTX 4070, not 2).
Surprisingly, the default async_depth works on 4K (51 MiB) while
async_depth 2 does not work on 6K5K (183 MiB), but I don't know what
the value of nb_queues is.
Maybe the real use case is a user handling 6K5K on the biggest GPU
available, so it does not hurt much to have a default that fails with
such big content.
The encoder catches the allocation error and prints a nice message;
wouldn't it be possible to automatically reduce async_depth and retry
instead of returning the error immediately, when async_depth is not
provided, and only error out if -async_depth 1 does not work?
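Something like this rough sketch, just to illustrate the idea (this is
not the actual ffv1enc_vulkan code; allocate_exec_contexts() and the
context fields are placeholder names):

/* Sketch only: if the user did not set -async_depth explicitly, fall
 * back to smaller values on ENOMEM instead of failing the encode. */
#include "libavcodec/avcodec.h"
#include "libavutil/error.h"
#include "libavutil/log.h"

static int setup_with_fallback(AVCodecContext *avctx,
                               HypotheticalVkEncContext *ctx)
{
    int ret;

    for (;;) {
        /* placeholder for the real per-queue/exec allocation */
        ret = allocate_exec_contexts(avctx, ctx->async_depth);
        if (ret != AVERROR(ENOMEM))
            return ret;
        /* respect an explicitly requested depth; only fall back when
         * async_depth was left at its default, and stop at 1 */
        if (ctx->user_set_async_depth || ctx->async_depth <= 1)
            return ret;
        ctx->async_depth--;
        av_log(avctx, AV_LOG_WARNING,
               "Allocation failed, retrying with async_depth=%d\n",
               ctx->async_depth);
    }
}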
> other than to document it and hope users follow the instructions in
> case they run out of memory.
If trying smaller values automatically is not possible, is it possible
to add "use -async_depth with a value smaller than (the current value)"
to the error message?
> The good news is that -async_depth 1 uses less VRAM than before this
> patch.
> Most of the VRAM used comes from somewhere within Nvidia's black-box
> driver, as RADV uses 1/3rd of the VRAM for the same content and
> async_depth settings. Nothing we can do about this either.
>
>
>>> This also improves error resilience if an allocation fails,
>>> and properly cleans up after itself if it does.
>>
>> It looks like this part does not work; there is still a freeze if an
>> allocation fails.
>
>
> This is due to Nvidia's drivers. If you switch to using their GSP
> firmware, recovery is pretty much instant.
That is beyond my knowledge, and it does not make things worse, so it
is not blocking.