[FFmpeg-devel] [PATCH] libavcodec: Do not return encoding errors when -sub_charenc_mode is do_nothing

Paul B Mahol onemda at gmail.com
Thu Aug 29 22:51:12 CEST 2013


On 8/29/13, Nicolas George <nicolas.george at normalesup.org> wrote:
> On duodi 12 Fructidor, year CCXXI, Eelco Lempsink wrote:
>> Thanks for your explanation. Now that I understand the underlying idea,
>> I would prefer that FFmpeg exit with an error status, though, since
>> now it is not obvious that data is missing when FFmpeg is used in a
>> larger workflow where warnings might get lost in the noise.
>
> I agree that ffmpeg (the command-line tool) should be stricter with this
> kind of error. You can use -xerror to tell it to be.
>
>> I'm also curious to hear how you plan to handle encoding detection
>> (e.g. for an SRT file), or whether you think that is the user's
>> responsibility.
>
> My plan is mostly to imitate Vim's behaviour: let the user specify a list
> of encodings, try each of them until one works, and recognize obvious
> signs such as byte order marks.
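A minimal sketch of the Vim-like strategy described above (similar in spirit to Vim's 'fileencodings' option): honour a byte order mark if one is present, otherwise try each encoding from a user-supplied list and keep the first one that decodes without error. It is written in Python for brevity and is not FFmpeg code; the name `guess_encoding`, the default candidate list, and the assumption that "works" means "decodes cleanly" are all hypothetical.

```python
import codecs
from typing import Sequence, Tuple

# Byte order marks, longest first: the UTF-32LE BOM (FF FE 00 00) begins
# with the UTF-16LE BOM (FF FE), so it has to be tested before it.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Hypothetical default list; latin-1 maps every byte, so it never fails
# and must stay last.
DEFAULT_CANDIDATES = ("utf-8", "cp1252", "latin-1")


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = DEFAULT_CANDIDATES
                   ) -> Tuple[str, str]:
    """Return (encoding, decoded_text) for a subtitle file's raw bytes."""
    # 1. An explicit byte order mark wins over any guessing.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, data[len(bom):].decode(name)
    # 2. Otherwise keep the first candidate that decodes without error.
    for name in candidates:
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the input")
```

For example, "café" saved as CP1252 ends in a lone 0xE9 byte, which is not valid UTF-8, so the sketch falls through to "cp1252". A single-byte fallback such as latin-1 always succeeds, which guarantees a result but not a correct one; that is the usual trade-off of this approach.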

And why is that an optimal solution?

>
>> Hmm, you might be correct.  We're using FFmpeg for two things: extracting
>> embedded text-based subtitles as SRT and for normalizing SRTs.
>>
>> For the normalizing (basically using the FFmpeg SRT parser to filter out
>> problems in the SRT), it would be possible to do the encoding detection
>> on the input rather than the output. That way we can ensure UTF-8 goes in
>> and comes out, so that should be no problem.
>
> Yes, I would advise that.
>
>> As far as the extracting goes, I suppose the encoding information is
>> either embedded in the format or defined in the format's specification.
>
> I do not know a format that does not specify the encoding. Multimedia
> formats capable of holding text subtitles are rather recent, they were
> designed at a time when people understood that Unicode is the only sane
> way to go.
>
>> I'm not entirely sure that all formats and tools can be trusted though.
>
> It is probably a dangerous assumption indeed, but I believe you should not
> spend time on how to handle the situation until it actually occurs for
> you; just make sure you can detect it.
>
> That makes me realize: disabling the check would allow ffmpeg to produce
> exactly that kind of invalid file. S_TEXT in Matroska is specified as
> UTF-8, while ffmpeg would just copy the encoding of the input file. IMHO,
> that is a very good reason not to disable it.
>
> Regards,
>
> --
>   Nicolas George
>
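For the normalizing workflow, checking the input before handing it to ffmpeg could look like this minimal sketch (Python; the helper name `first_invalid_utf8` is hypothetical and not part of FFmpeg). Rejecting non-UTF-8 input up front means only UTF-8 reaches the SRT parser, and therefore only UTF-8 can end up in a UTF-8-only destination such as a Matroska S_TEXT track.

```python
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (byte_offset, line_number) of the first invalid UTF-8
    sequence in ``data``, or None if the whole buffer is valid UTF-8."""
    try:
        data.decode("utf-8")  # strict: raises on the first bad sequence
    except UnicodeDecodeError as exc:
        # Line numbers are 1-based; count the newlines before the bad byte.
        line = data.count(b"\n", 0, exc.start) + 1
        return exc.start, line
    return None
```

A caller would pass the file on to ffmpeg only when this returns None, and otherwise fail loudly with the offset and line, so missing or mangled text is reported instead of surfacing as a warning lost in the noise.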

