[FFmpeg-devel] [RFC] Introducing policies regarding "AI" contributions

Fri Jul 4 02:31:19 EEST 2025

On 2025-07-01 14:44 +0200, Kacper Michajlow wrote:
> On Tue, 1 Jul 2025 at 12:58, Alexander Strasser via ffmpeg-devel
[...]
> >
> > I do not like the branding of the LLMs as AI, thus I will for now
> > continue to call it "AI" in quotes. I'm open to better terms.
> >
> > It was just yesterday brought up on IRC in #ffmpeg-devel that there
> > was at least one marked attempt to include "AI" generated code[1].
> >
> > At least I would say that this particular patch series was rejected,
> > but there was no explicit discussion and clear statement about
> > "AI" generated content; especially code.
> >
> > Thus I want this thread to start a discussion, that eventually leads
> > to a policy about submitting and integrating "AI" generated content.
> 
> I don't think labeling code as "AI" matters that much. Let's ignore
> licensing/legal issues for now.

OK, but I really don't think we can ignore the legal consequences
for FFmpeg, as it is Open Source software, and we would put all
users of FFmpeg, individuals and companies, at risk.

> What's important is the code itself and its quality. It doesn't matter
> how it was created. Whether by a human, "AI" or something else. The
> key is the final product. "AI" is just a tool, and like any tool, it
> can be used well or poorly. How you use it may be completely different
> between "operators".
> 
> I think the "AI" label exists because the code that LLMs produce is
> often incomplete, low quality, and a pile of spaghetti that somehow
> works for a single use case, but is far from being a sane, production
> ready implementation. Anyone who has used these tools knows their
> limitations and what they can or cannot do.
> 
> That said, if "AI" code means low quality code, then by all means, it
> should be rejected. This applies to human, alien, or "AI" generated
> code. There shouldn't be a different metric for "AI" code. If "AI"
> (and its "operator") produces high quality code, there's no reason to
> reject it.
> 
> After all, how can you even detect "AI" code? If the code, regardless
> of who or what wrote it, follows project guidelines and is overall
> high quality, that's all that matters.

I kind of agree that good code is good code, but it's not enough.
It is also important to have people around who truly understand the
good code.

To find out if it is truly good code, someone needs to review it very
deeply, which is extra hard with "AI" generated code because it tends
to look very plausible. That could waste a lot of time for the people
looking at it and reviewing it. This also diminishes the actual value
of the use of "AI" in the first place.

Taking that for granted, there is the open question of submissions
by maintainers (with git push access), who could submit "AI" generated
code and push it themselves after a considerable push warning.

> P.S. I don't like those "This code was fully made by an LLM"
> statements and the like. Who cares? Maybe some investor who's pushing
> this. But from a technical point of view, there's no difference. After
> all, you don't start your patchset by saying, "This code was written
> in Vim with <list of plugins> on Arch Linux, on an ergonomic split
> keyboard, with an XYZ monitor.".

[...]

Thanks for your feedback!

Greetings,
  Alexander