[FFmpeg-devel] [PATCH v3 1/6] avcodec/mpeg12dec: extract only one type of CC substream

Andreas Rheinhardt andreas.rheinhardt at outlook.com
Tue Mar 12 13:52:29 EET 2024


Stefano Sabatini:
> On date Tuesday 2024-03-12 01:00:00 -0500, Marth64 wrote:
>> In MPEG-2 user data, there can be different types of Closed Captions
>> formats embedded (A53, SCTE-20, or DVD). The current CC extraction code
>> in the MPEG-2 decoder is not aware that multiple formats may be present,
>> so when more than one exists, one format can overwrite the other during
>> extraction, because the CC extraction shares one output buffer for the
>> normalized bytes.
>>
>> This causes sources that have two CC formats to produce flawed output.
>> There exist real-world samples which contain both A53 and SCTE-20 captions
>> in the same MPEG-2 stream, and that manifest this problem. Example of symptom:
>> THANK YOU (expected) --> THTHANANK K YOYOUU (actual)
>>
>> The solution is to pick only the first CC substream observed with valid bytes,
>> and ignore the other types. Additionally, provide an option for users
>> to manually "force" a type in the event that this matters for a particular
>> source.
>>
>> Signed-off-by: Marth64 <marth64 at proxyid.net>
>> ---
>>  libavcodec/mpeg12dec.c | 67 ++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 64 insertions(+), 3 deletions(-)
>>
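One plausible model of the reported symptom is a toy simulation, not the decoder's code. It assumes that each picture carries one two-character caption pair per substream and that the A/53 and SCTE-20 substreams carry the same captions. Appending both into one shared buffer then doubles every pair. The arrays and `extract_cc()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical simulation, not FFmpeg code: each picture carries one
 * two-character caption pair per substream, and both substreams carry
 * the same text. */
#define PICTURES 5

static const char *const a53_pairs[PICTURES]    = { "TH", "AN", "K ", "YO", "U" };
static const char *const scte20_pairs[PICTURES] = { "TH", "AN", "K ", "YO", "U" };

/* Appends the caption pairs of every picture to out (size bytes long).
 * With both_substreams != 0, both substreams write into the same shared
 * output buffer, which reproduces the duplication from the bug report.
 * With both_substreams == 0, only one substream is extracted. */
static void extract_cc(char *out, size_t size, int both_substreams)
{
    out[0] = '\0';
    for (int i = 0; i < PICTURES; i++) {
        strncat(out, a53_pairs[i], size - strlen(out) - 1);
        if (both_substreams)
            strncat(out, scte20_pairs[i], size - strlen(out) - 1);
    }
}
```

With a 64-byte buffer, passing 1 produces "THTHANANK K YOYOUU". Passing 0, which models extracting only one substream as this patch proposes, produces "THANK YOU".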
>> diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
>> index 3a2f17e508..8961a290a3 100644
>> --- a/libavcodec/mpeg12dec.c
>> +++ b/libavcodec/mpeg12dec.c
>> @@ -62,6 +62,16 @@
>>  
>>  #define A53_MAX_CC_COUNT 2000
>>  
>> +enum Mpeg2ClosedCaptionsFormat {
>> +    CC_FORMAT_AUTO,
>> +    CC_FORMAT_A53_PART4,
>> +    CC_FORMAT_SCTE20,
>> +    CC_FORMAT_DVD
>> +};
> 
>> +static const char mpeg2_cc_format_labels[4][12] = {
> 
> nit: this might be 
> static const char *mpeg2_cc_format_labels[4] = {
> 

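This would add relocations (in position-independent builds every pointer
must be fixed up at load time) and put the array into .data.rel.ro instead
of .rodata.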
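The two layouts under discussion can be compared side by side. This is a minimal sketch: `labels_2d` and `labels_ptr` are hypothetical names standing in for `mpeg2_cc_format_labels`, and the section each table lands in depends on the toolchain and on whether the code is built as position-independent.

```c
#include <assert.h>
#include <string.h>

/* Layout used in the patch: a 4 x 12 table of chars. The strings are stored
 * inline, so the whole table is plain read-only data (.rodata) and needs no
 * relocations. The cost is a fixed slot size: each label may hold at most
 * 11 characters plus the terminating NUL. In C, a label of exactly 12
 * characters would be accepted silently, without its NUL. */
static const char labels_2d[4][12] = {
    "Unknown", "A/53 Part 4", "SCTE-20", "DVD"
};

/* Compile-time guard: the longest label, including its NUL, fits a slot. */
_Static_assert(sizeof("A/53 Part 4") <= sizeof(labels_2d[0]),
               "label does not fit in its slot");

/* Suggested alternative: an array of pointers to string literals. Labels can
 * have any length, but in position-independent code every pointer needs a
 * load-time relocation, so the table typically lands in .data.rel.ro rather
 * than .rodata. Without the second const the table would also be writable. */
static const char *const labels_ptr[4] = {
    "Unknown", "A/53 Part 4", "SCTE-20", "DVD"
};
```

With GCC or Clang, compiling with -fPIC and checking the symbol table (for example with `objdump -t`) shows which section each table ends up in, provided the tables are referenced so they are not optimized away.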

> to avoid unnecessary constraints on the string length, or you might
> pass the CC name to the function directly to avoid maintaining the
> array (as it is not shared at the moment)
> 

That sounds like a good idea.
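Passing the label at the call site, combined with the commit message's rule of locking onto the first CC substream observed with valid bytes unless the user forces a type, could look roughly like the sketch below. `CCState` and `cc_accept_format()` are hypothetical names chosen for illustration; this is not the code from the patch or FFmpeg's API.

```c
#include <assert.h>
#include <stdio.h>

/* Same values as the enum in the patch; CC_FORMAT_AUTO means "not chosen yet". */
enum Mpeg2ClosedCaptionsFormat {
    CC_FORMAT_AUTO,
    CC_FORMAT_A53_PART4,
    CC_FORMAT_SCTE20,
    CC_FORMAT_DVD
};

/* Hypothetical decoder state (not FFmpeg's Mpeg1Context). Start it at
 * CC_FORMAT_AUTO to auto-detect, or at a specific format to force one. */
typedef struct CCState {
    enum Mpeg2ClosedCaptionsFormat cc_format;
} CCState;

/* Hypothetical helper, called only after a substream's bytes have been
 * validated. The first format seen is locked in and logged; the caller
 * passes the label as a string literal, so no label table is needed.
 * Returns 1 if bytes of this format should be extracted, 0 to skip them. */
static int cc_accept_format(CCState *s, enum Mpeg2ClosedCaptionsFormat format,
                            const char *label)
{
    if (s->cc_format == CC_FORMAT_AUTO) {
        s->cc_format = format;
        fprintf(stderr, "CC: first seen substream is %s format\n", label);
    }
    return s->cc_format == format;
}

/* Example call site in the A/53 branch of a user-data parser:
 *     if (cc_accept_format(s, CC_FORMAT_A53_PART4, "A/53 Part 4"))
 *         ... copy the A/53 bytes into the shared output buffer ...
 */
```

Because a forced type is simply a non-AUTO starting value, the same comparison handles both modes. Once a format is locked in, bytes of any other format are skipped and can no longer clobber the shared buffer.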

>> +    "Unknown", "A/53 Part 4", "SCTE-20", "DVD"
>> +};
>> +
> 
> [...]
> 
> LGTM otherwise.


