[FFmpeg-devel] [RFC] Introducing policies regarding "AI" contributions

Fri Jul 4 02:14:11 EEST 2025

On 2025-07-03 02:16 +0200, Gerion Entrup wrote:
> On Tuesday, 1 July 2025, 12:58:23 Central European Summer Time, Alexander Strasser via ffmpeg-devel wrote:
[...]
> > Thus I want this thread to start a discussion, that eventually leads
> > to a policy about submitting and integrating "AI" generated content.
> > 
> > Leaving all ethical issues aside for a moment I still see 2 very big
> > problems with AI generated code:
> > 
> > * looks generally plausible but is often subtly wrong
> >     * leading to more work, regressions and costs
> >         * which often lands on a different group of people (other
> >           projects, reviewers, bug finders, bug fixers, etc.)
> >         * which are sometimes delayed for quite some time increasing
> >           the costs of fixing them
> > * license/copyright violations
> >     * this might be sometimes a non-issue with small changes
> >     * but especially for complete components the risk seems high
> > 
> > There is a lot more to the topic and I probably forgot to bring up
> > many more important aspects and details. Please feel free to bring
> > more things up in the discussion!
> > 
> > The musl project has been preparing a policy[2]; as far as I
> > understand, it has not yet been finalized and put into effect.
> 
> Just to link it here: this reminds me of the Gentoo Linux discussion:
> https://archives.gentoo.org/gentoo-dev/9007c921a8a57655ecb2027eb4be4bff02673af4.camel@zougloub.eu/T/#t
> https://wiki.gentoo.org/wiki/Project:Council/AI_policy

Thanks for the links to the Gentoo discussion and policy!

IMHO the discussion and the resulting policy are interesting, and maybe
something similar would be appropriate for FFmpeg.

I also became aware of the LLVM policy:

  https://llvm.org/docs/DeveloperPolicy.html#ai-generated-contributions

But I must say I do not like it as much. To cite the most critical part:

    As such, the LLVM policy is that contributors are permitted to use
    artificial intelligence tools to produce contributions, provided that
    they have the right to license that code under the project license.
    Contributions found to violate this policy will be removed just like
    any other offending contribution.

For "AI" (in the LLM sense) I think it's usually not at all easy to
say whether one has the right to license the code, given that the models
are trained on a huge corpus of copyrighted code under particular licenses.

Anyway, they agree on the license/copyright concern I raised. As does Gentoo.

And the LLVM policy also comes to a similar conclusion, as does Gentoo,
regarding waste of project resources:

    We encourage contributors to review all generated code before sending
    it for review to verify its correctness and to understand it so that
    they can answer questions during code review. Reviewing and maintaining
    generated code that the original contributor does not understand is not
    a good use of limited project resources.

If anyone has more examples at hand, it would be interesting to hear
about them and take a look.

Best regards,
  Alexander

> > It also brings up the point, that it is not really related to
> > recent "AI" tech, but more to the origin of work and its handling.
> > Unfortunately "AI" made problems with this a lot more common.
> > 
> > 
> > Best regards,
> >   Alexander
> > 
> > 1. https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2025-April/342146.html
> > 2. https://www.openwall.com/lists/musl/2024/10/19/3