[FFmpeg-devel] [PATCH] avformat/webvttdec: improve WebVTT parsing
Tomas Härdin
git at haerdin.se
Wed Jun 18 10:01:22 EEST 2025
Fri 2025-06-13 at 13:03 +0000, Marcos Del Sol wrote:
> Tomas Härdin:
> > Tue 2025-06-10 at 11:42 +0000, Marcos Del Sol wrote:
> > > WebVTT is supposed to be an extensible format.
> >
> > The syntax says otherwise. Why the W3C feels the need to specify a
> > particular imperative algorithm for parsing I cannot know, but this is
> > not how RFCs are authored. It also makes implementing WebVTT in
> > functional languages impossible. It is a shotgun parser to boot.
>
> What do you mean that's not how RFCs are authored? Go read RFC2083
> from 1997 where it has literal C code in it. You should consider writing
> an irate email to the IETF and tell them that has to go. This TLV-based
> standard, by the way, also asks you to ignore unknown tags.
The important difference is that KLV-based formats allow us to
*recognize* unknown tags before attempting to process them. RFC2083
does not specify two mutually incompatible languages as far as I can
tell. The C code in, for example, section 10.8 specifies how to *process*
pixel data already recognized (parsed), assuming the file is sRGB. It
also appears to be wrong, but let's ignore that.
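
To illustrate recognizing before processing, here is a minimal Python sketch of a walker for length-prefixed records. The record layout (4-byte big-endian length, 4-byte tag, payload, no checksum), the names `iter_records` and `split_known`, and the tag `b"AAAA"` are all assumptions made up for illustration; they are not taken from RFC2083 or from any KLV standard.

```python
import struct


def iter_records(data: bytes):
    """Yield (tag, payload) for each record in `data`.

    Assumed layout per record: 4-byte big-endian length, 4-byte tag,
    then `length` bytes of payload. The record boundaries are found
    from the length alone, without interpreting the tag or payload.
    """
    pos = 0
    while pos < len(data):
        if len(data) - pos < 8:
            raise ValueError("truncated record header")
        length, tag = struct.unpack_from(">I4s", data, pos)
        pos += 8
        if len(data) - pos < length:
            raise ValueError("truncated payload")
        yield tag, data[pos:pos + length]
        pos += length


def split_known(data: bytes, known_tags):
    """Separate records with known tags from ones that are only recognized."""
    known, skipped = [], []
    for tag, payload in iter_records(data):
        (known if tag in known_tags else skipped).append((tag, payload))
    return known, skipped
```

An unknown tag here is still a well-formed record: the walker knows exactly where it starts and ends, so skipping it never requires guessing at the grammar.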
This stuff is important because every CVE relates to parsing. Language
ambiguities can and have led to CVEs. The parsing of URIs is one
example, for which curl caught flak since it does not adhere to the
regex specified in the URI RFC. lavf has similar URI issues I'm sure,
which is why I'm adamant that the codebase needs to be de-postelized.
If, for example, a PNG file has more than one IHDR chunk, then it should
be rejected. We should not attempt to guess what should be done in this
case, but loudly abort.
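
As a rough sketch of what "loudly abort" could look like, the Python below walks the chunks of a PNG buffer and raises on the first structural violation instead of trying to recover. The names `InvalidPNG`, `make_chunk` and `check_png_structure` are hypothetical and this is not how lavf does it. It only checks the signature, chunk framing, CRCs and the IHDR/IEND placement; it does not validate chunk contents.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


class InvalidPNG(ValueError):
    """Raised on the first structural violation; no recovery is attempted."""


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(ctype + body)
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def check_png_structure(data: bytes) -> None:
    """Return None if the chunk structure is acceptable, else raise InvalidPNG."""
    if not data.startswith(PNG_SIGNATURE):
        raise InvalidPNG("bad signature")
    pos = len(PNG_SIGNATURE)
    seen_ihdr = False
    seen_iend = False
    while pos < len(data):
        if seen_iend:
            raise InvalidPNG("data after IEND")
        if len(data) - pos < 12:
            raise InvalidPNG("truncated chunk")
        length, ctype = struct.unpack_from(">I4s", data, pos)
        end = pos + 8 + length
        if end + 4 > len(data):
            raise InvalidPNG("truncated chunk")
        body = data[pos + 8:end]
        (crc,) = struct.unpack_from(">I", data, end)
        if crc != zlib.crc32(ctype + body):
            raise InvalidPNG("bad CRC")
        if ctype == b"IHDR":
            if seen_ihdr:
                raise InvalidPNG("more than one IHDR chunk")
            seen_ihdr = True
        elif not seen_ihdr:
            raise InvalidPNG("first chunk is not IHDR")
        if ctype == b"IEND":
            seen_iend = True
        pos = end + 4
    if not seen_iend:
        raise InvalidPNG("missing IEND")
```

Usage would be along the lines of calling `check_png_structure(buf)` before handing `buf` to a decoder, and treating any `InvalidPNG` as a hard failure rather than a warning.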
With WebVTT this may seem academic, until you realize that ambiguities
in the spec can be abused to make two different decoders display
different things. In places with strict legislation on certain kinds of
speech this can have legal consequences.
Anyway, I have said my piece and placated the langsec spirits. When the
time comes to hand out I-told-you-so's a few years down the line, I can
point to this and other posts in this vein.
/Tomas