[FFmpeg-devel] [PATCH] slicethread: Limit the automatic number of threads to 16

Lukas Fellechner Lukas.Fellechner at gmx.net
Sun Sep 11 22:00:14 EEST 2022


>> 2. Spawning too many threads when "auto" is used in multiple places
>>
>> This can indeed be an efficiency problem, although probably not major.
>> Since usually only one part of the pipeline is active at any time,
>> many of the threads will be sleeping, consuming very little resources.
>
> For 32 bit processes running out of address space, yes, the issue is with
> "auto" being used in many places at once.
>
> But in general, allowing arbitrarily high numbers of auto threads isn't
> beneficial - the optimal cap of threads depends a lot on the content at
> hand.
>
> The system I'm testing on has 160 cores - and it's quite certain that
> doing slice threading with 160 slices doesn't make sense. Maybe the cap of
> 16 is indeed too low - I don't mind raising it to 32 or something like
> that. Ideally, the auto mechanism would factor in the resolution of the
> content.
>
> Just for argument's sake - here's the output from 'time ffmpeg ...' for a
> fairly straightforward transcode (decode, transpose, scale, encode), 1080p
> input 10bit, 720p output 8bit, with explicitly setting the number of
> threads ("ffmpeg -threads N -i input -threads N -filter_threads N
> output").
>
> 12:
> real 0m25.079s
> user 5m22.318s
> sys 0m5.047s
>
> 16:
> real 0m19.967s
> user 6m3.607s
> sys 0m9.112s
>
> 20:
> real 0m20.853s
> user 6m21.841s
> sys 0m28.829s
>
> 24:
> real 0m20.642s
> user 6m28.022s
> sys 1m1.262s
>
> 32:
> real 0m29.785s
> user 6m8.442s
> sys 4m45.290s
>
> 64:
> real 1m0.808s
> user 6m31.065s
> sys 40m44.598s
>
> I'm not testing this with 160 threads for each stage, since 64 already was
> painfully slow - while you suggest that using threads==cores always should
> be preferred, regardless of the number of cores. The optimum here seems to
> be somewhere between 16 and 20.

These are interesting numbers. I would not have expected too many threads
to have such a dramatic effect. You are probably right that always using
the core count as the auto thread count is not such a good idea.
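
To make the trend in the quoted timings easier to see, here is a small
sketch that converts them to seconds and computes how much of the CPU time
was system time. The numbers are copied from the quoted runs; the helper
names (to_seconds, summarize) are my own and not part of any tool.

```python
import re

# Timings copied from the quoted 'time ffmpeg ...' runs:
# thread count -> (real, user, sys)
TIMINGS = {
    12: ("0m25.079s", "5m22.318s", "0m5.047s"),
    16: ("0m19.967s", "6m3.607s", "0m9.112s"),
    20: ("0m20.853s", "6m21.841s", "0m28.829s"),
    24: ("0m20.642s", "6m28.022s", "1m1.262s"),
    32: ("0m29.785s", "6m8.442s", "4m45.290s"),
    64: ("1m0.808s", "6m31.065s", "40m44.598s"),
}


def to_seconds(stamp):
    """Convert a 'time'-style value such as '5m22.318s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def summarize(timings):
    """Return wall time, total CPU time and system-time share per thread count."""
    rows = {}
    for threads, (real, user, system) in timings.items():
        real_s, user_s, sys_s = (to_seconds(v) for v in (real, user, system))
        rows[threads] = {
            "real": real_s,
            "cpu": user_s + sys_s,
            "sys_share": sys_s / (user_s + sys_s),
        }
    return rows


if __name__ == "__main__":
    rows = summarize(TIMINGS)
    for threads, row in rows.items():
        print(f"{threads:3d} threads: real {row['real']:7.3f}s, "
              f"cpu {row['cpu']:8.3f}s, sys {row['sys_share']:5.1%}")
    fastest = min(rows, key=lambda n: rows[n]["real"])
    print("fastest wall clock:", fastest, "threads")
```

With these figures, the system-time share grows from under 2% at 12 threads
to over 85% at 64 threads, while the shortest wall-clock time is at 16.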

But the encoding part works on 720p, so each of the 64 threads there only
has 11 lines and about 14,000 pixels to process, which is really not much.
I do not have a CPU with that many cores, but when doing a 4K -> 4K transcode,
I do see a benefit from using 32 vs. 16 cores.
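
The per-thread figures can be checked with a few lines of arithmetic. This
is only a sketch: it assumes 720p means 1280x720 and 4K means 3840x2160, and
that the frame is split evenly into one slice per thread, which real slice
threading does not have to do.

```python
def per_thread_share(width, height, threads):
    """Return (lines, pixels) per thread, assuming an even split of the frame."""
    lines = height // threads
    pixels = (width * height) // threads
    return lines, pixels


if __name__ == "__main__":
    # 720p output, 64 threads (the slowest quoted run)
    print("720p / 64:", per_thread_share(1280, 720, 64))
    # 4K, 16 vs. 32 threads
    print("4K / 16:  ", per_thread_share(3840, 2160, 16))
    print("4K / 32:  ", per_thread_share(3840, 2160, 32))
```

Under these assumptions a 720p frame leaves 11 lines and 14400 pixels per
thread at 64 threads, while a 4K frame at 32 threads still leaves 67 lines
and 259200 pixels per thread.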

Maybe the best approach would really be to base the auto thread count
on the number of pixels to process (I would not use the line count, because
when the line count doubles, the pixel count usually goes up by a factor of 4).
This would probably need some more test data. I will also try to do some
testing on my side.
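
As a rough illustration of what a pixel-based heuristic could look like
(this is not FFmpeg code, and not a proposal for concrete values): the
function name and the min_pixels_per_thread default below are hypothetical
placeholders, and a real threshold would have to come from measurements.

```python
def auto_thread_count(width, height, cores, min_pixels_per_thread=65536):
    """Pick a thread count from the frame size, capped by the core count.

    min_pixels_per_thread is a made-up placeholder, not a measured value.
    """
    by_pixels = (width * height) // min_pixels_per_thread
    # Never go below one thread, never above the number of cores.
    return max(1, min(cores, by_pixels))


if __name__ == "__main__":
    for name, (w, h) in {"720p": (1280, 720),
                         "1440p": (2560, 1440),
                         "4K": (3840, 2160)}.items():
        print(name, auto_thread_count(w, h, cores=160))
```

With this placeholder threshold, doubling the line count from 720p to 1440p
raises the result from 14 to 56, following the fourfold increase in pixels,
and a machine with fewer cores is simply capped at its core count.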

- Lukas

