[FFmpeg-user] Whisper in ffmpeg 8

Fri Aug 15 00:30:19 EEST 2025

On 2025-08-14 22:23, Rob Hallam wrote:
>
> On Thu, 14 Aug 2025 at 22:15, Bernhard Döbler <programmer at bardware.de> wrote:
>>
>> yesterday the news made the rounds that ffmpeg 8 is going to be released
>> soon, and that it will contain whisper, an AI software that can understand
>> speech and create subtitles.
>>
>> Their github page https://github.com/ggml-org/whisper.cpp says they
>> offer a handful of models.
>>
>> Model   Disk      Mem
>> tiny    75 MiB    ~273 MB
>> base    142 MiB   ~388 MB
>> small   466 MiB   ~852 MB
>> medium  1.5 GiB   ~2.1 GB
>> large   2.9 GiB   ~3.9 GB
> 
> There is a commit [1] adding Whisper support [2]. As the docs note, you
> will need to provide a model.
> 
>> How does this work? Will all of this be compiled into the ffmpeg binary?
> 
> An --enable-whisper configure option has been added (default: no) [3], so
> whether it is built in is up to whoever compiles your binary, and you
> provide the model yourself.
> 
> [1]: https://github.com/FFmpeg/FFmpeg/commit/13ce36fef98a3f4e6d8360c24d6b8434cbb8869b
> [2]: https://ffmpeg.org/ffmpeg-filters.html#whisper-1
> [3]: https://github.com/FFmpeg/FFmpeg/blob/47c6af7d299c96b2e65f5f10526e0f34e00b23c8/configure#L339
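
For anyone who wants to try the filter from [2] once they have a build with
--enable-whisper and a downloaded model, here is a minimal Python sketch that
only assembles the ffmpeg command line. The option names (model, language,
destination, format) are my reading of the filter documentation and should be
checked against `ffmpeg -h filter=whisper` on your build. The function name and
the file names are made up for illustration.

```python
import shlex


def build_whisper_command(input_path, model_path, srt_path, language="en"):
    """Return an ffmpeg argument list that runs the whisper audio filter.

    Assumption: the filter accepts model, language, destination and format
    options as described in the filter documentation. Paths containing ':'
    or other special characters would need filtergraph escaping, which this
    sketch does not attempt.
    """
    filter_spec = (
        f"whisper=model={model_path}"
        f":language={language}"
        f":destination={srt_path}"
        ":format=srt"
    )
    return [
        "ffmpeg",
        "-i", input_path,
        "-vn",              # ignore any video stream
        "-af", filter_spec,  # run the audio through the whisper filter
        "-f", "null", "-",   # discard the media output; the filter writes the subtitles
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    cmd = build_whisper_command("input.mp4", "ggml-base.en.bin", "output.srt")
    print(shlex.join(cmd))
```

Passing the returned list to subprocess.run() would execute it. Building a
list rather than a single string avoids shell quoting problems.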

Broadening the question somewhat: is there an existing AI tool that could
process recordings containing both speech and music and highlight or
extract the sections that contain music, say by creating cut points?

Does anyone here know if this is possible?