[FFmpeg-user] Whisper in ffmpeg 8
MacFH - C E Macfarlane - News
news at macfh.co.uk
Fri Aug 15 00:30:19 EEST 2025
On 2025-08-14 22:23, Rob Hallam wrote:
>
> On Thu, 14 Aug 2025 at 22:15, Bernhard Döbler <programmer at bardware.de> wrote:
>>
>> Yesterday, news made the rounds that ffmpeg 8 is going to be released
>> soon and that it will contain whisper, AI software that can understand
>> speech and create subtitles.
>>
>> Their github page https://github.com/ggml-org/whisper.cpp says they
>> offer a handful of models.
>>
>> Model    Disk      Mem
>> tiny     75 MiB    ~273 MB
>> base     142 MiB   ~388 MB
>> small    466 MiB   ~852 MB
>> medium   1.5 GiB   ~2.1 GB
>> large    2.9 GiB   ~3.9 GB
>
> There is a commit [1] adding Whisper support [2]. As the docs note you
> will need to provide a model.
>
>> How does this work? Will all of this be compiled into the ffmpeg binary?
>
> An --enable-whisper configure option has been added (default: no) [3],
> so it is up to whoever compiles your binary whether the filter is
> included, and you provide the model.
>
> [1]: https://github.com/FFmpeg/FFmpeg/commit/13ce36fef98a3f4e6d8360c24d6b8434cbb8869b
> [2]: https://ffmpeg.org/ffmpeg-filters.html#whisper-1
> [3]: https://github.com/FFmpeg/FFmpeg/blob/47c6af7d299c96b2e65f5f10526e0f34e00b23c8/configure#L339
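
For anyone wanting to try the above, here is a rough sketch of how the pieces might fit together, assuming a build configured with --enable-whisper and a model file downloaded separately. The file names are placeholders, the helper name is made up for illustration, and the filter options used (model, language, destination, format) are the ones I understand the filter documentation at [2] to describe; I have not verified this against a release build.

```python
import shlex


def whisper_srt_command(media_path, model_path, srt_path, language="en"):
    """Build an ffmpeg argument list that writes an SRT transcript.

    Hypothetical helper: it only assembles the command, it does not run it.
    Paths containing ':' or other filtergraph special characters would
    need escaping, which this sketch does not attempt.
    """
    filter_spec = (
        f"whisper=model={model_path}"
        f":language={language}"
        f":destination={srt_path}"
        ":format=srt"
    )
    # -vn drops video; "-f null -" discards the audio output, so the
    # subtitle file written by the filter is the only result kept.
    return ["ffmpeg", "-i", media_path, "-vn", "-af", filter_spec,
            "-f", "null", "-"]


if __name__ == "__main__":
    cmd = whisper_srt_command("talk.mkv", "ggml-base.bin", "talk.srt")
    print(shlex.join(cmd))
```

Printing the command rather than running it makes it easy to check the filter string before handing it to a binary that may or may not have the filter compiled in.
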
Broadening the question somewhat: is there an existing AI tool that could
process recordings containing both speech and music, and mark or extract
the sections that contain music, say by generating cut points?
Does anyone here know whether this is possible?