[FFmpeg-devel] [PATCH] libavformat: add RCWT closed caption muxer

Sat Jan 6 14:33:59 EET 2024

On date Friday 2024-01-05 20:14:58 -0600, Marth64 wrote:
> Signed-off-by: Marth64 <marth64 at proxyid.net>
> 
> Raw Captions With Time (RCWT) is a format native to ccextractor, a commonly
> used open source tool for processing 608/708 closed caption (CC) sources.
> It can be used to archive the original, raw CC bitstream and to produce
> a source file file for later CC processing or conversion. As a result,

typo: "file file" (also "interopability" on the next line and "consistant" in item (2) below)

> it also allows for interopability with ccextractor for processing CC data
> extracted via ffmpeg. The format is simple to parse and can be used
> to retain all lines and variants of CC.
> 
> A free specification of RCWT can be found here:
> https://github.com/CCExtractor/ccextractor/blob/master/docs/BINARY_FILE_FORMAT.TXT

> This muxer implements the specification as of 01/05/2024, which has

nit: use 2024-01-05 (ISO 8601) to avoid ambiguity, since 01/05/2024 can
be read as either January 5 or May 1

> been stable and unchanged for 10 years as of this writing.
> 
> This muxer will have some nuances from the way that ccextractor muxes RCWT.
> No compatibility issues when processing the output with ccextractor
> have been observed as a result of this so far, but mileage may vary
> and outputs will not be a bit-exact match.
> 
> Specifically, the differences are:
> (1)  This muxer will identify as "FF" as the writing program identifier, so
> as to be honest about the output's origin.
> 
> (2)  ffmpeg's MPEG-1/2, H264, HEVC, etc. decoders extract closed captioning
> data differently than ccextractor from embedded SEI/user data.
> For example, DVD captioning bytes will be translated to ATSC A53 format.
> This allows ffmpeg to handle 608/708 in a consistant way downstream.
> This is a lossless conversion and the meaningful data is retained.
> 
> (3)  This muxer will not alter the extracted data except to remove invalid
> packets in between valid CC blocks. On the other hand, ccextractor
> will by default remove mid-stream padding, and add padding at the end
> of the stream (in order to convey the end time of the source video).

This is a nice summary, and part of it should probably be moved to
muxers.texi so that users can see it (many muxers are not documented
there, but we should start doing so).

> ---
>  libavformat/Makefile     |   1 +
>  libavformat/allformats.c |   1 +
>  libavformat/rcwtenc.c    | 203 +++++++++++++++++++++++++++++++++++++++
>  tests/fate/subtitles.mak |   3 +
>  tests/ref/fate/sub-rcwt  |   1 +

Missing Changelog entry. I also don't remember whether adding a new
muxer entails a minor library version bump (it probably should).

>  5 files changed, 209 insertions(+)
>  create mode 100644 libavformat/rcwtenc.c
>  create mode 100644 tests/ref/fate/sub-rcwt
> 
> diff --git a/libavformat/Makefile b/libavformat/Makefile
> index 45dba53044..03c2c70e67 100644
> --- a/libavformat/Makefile
> +++ b/libavformat/Makefile
> @@ -489,6 +489,7 @@ OBJS-$(CONFIG_QOA_DEMUXER)               += qoadec.o
>  OBJS-$(CONFIG_R3D_DEMUXER)               += r3d.o
>  OBJS-$(CONFIG_RAWVIDEO_DEMUXER)          += rawvideodec.o
>  OBJS-$(CONFIG_RAWVIDEO_MUXER)            += rawenc.o
> +OBJS-$(CONFIG_RCWT_MUXER)                += rcwtenc.o subtitles.o
>  OBJS-$(CONFIG_REALTEXT_DEMUXER)          += realtextdec.o subtitles.o
>  OBJS-$(CONFIG_REDSPARK_DEMUXER)          += redspark.o
>  OBJS-$(CONFIG_RKA_DEMUXER)               += rka.o apetag.o img2.o
> diff --git a/libavformat/allformats.c b/libavformat/allformats.c
> index dc2acf575c..fb14f15739 100644
> --- a/libavformat/allformats.c
> +++ b/libavformat/allformats.c
> @@ -388,6 +388,7 @@ extern const AVInputFormat  ff_qoa_demuxer;
>  extern const AVInputFormat  ff_r3d_demuxer;
>  extern const AVInputFormat  ff_rawvideo_demuxer;
>  extern const FFOutputFormat ff_rawvideo_muxer;
> +extern const FFOutputFormat ff_rcwt_muxer;
>  extern const AVInputFormat  ff_realtext_demuxer;
>  extern const AVInputFormat  ff_redspark_demuxer;
>  extern const AVInputFormat  ff_rka_demuxer;
> diff --git a/libavformat/rcwtenc.c b/libavformat/rcwtenc.c
> new file mode 100644
> index 0000000000..f70a80b175
> --- /dev/null
> +++ b/libavformat/rcwtenc.c
> @@ -0,0 +1,203 @@
> +/*
> + * Raw Captions With Time (RCWT) muxer
> + * Author: Marth64 <marth64 at proxyid.net>
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +

> +/*
> + * Raw Captions With Time (RCWT) is a format native to ccextractor, a commonly
> + * used open source tool for processing 608/708 closed caption (CC) sources.
> + * It can be used to archive the original, raw CC bitstream and to produce
> + * a source file file for later CC processing or conversion. As a result,
> + * it also allows for interopability with ccextractor for processing CC data
> + * extracted via ffmpeg. The format is simple to parse and can be used
> + * to retain all lines and variants of CC.
> + *
> + * A free specification of RCWT can be found here:
> + * https://github.com/CCExtractor/ccextractor/blob/master/docs/BINARY_FILE_FORMAT.TXT
> + * This muxer implements the specification as of 01/05/2024, which has
> + * been stable and unchanged for 10 years as of this writing.
> + *
> + * This muxer will have some nuances from the way that ccextractor muxes RCWT.
> + * No compatibility issues when processing the output with ccextractor
> + * have been observed as a result of this so far, but mileage may vary
> + * and outputs will not be a bit-exact match.
> + *
> + * Specifically, the differences are:
> + * (1)  This muxer will identify as "FF" as the writing program identifier, so
> + *      as to be honest about the output's origin.
> + * (2)  ffmpeg's MPEG-1/2, H264, HEVC, etc. decoders extract closed captioning
> + *      data differently than ccextractor from embedded SEI/user data.
> + *      For example, DVD captioning bytes will be translated to ATSC A53 format.
> + *      This allows ffmpeg to handle 608/708 in a consistant way downstream.
> + *      This is a lossless conversion and the meaningful data is retained.
> + * (3)  This muxer will not alter the extracted data except to remove invalid
> + *      packets in between valid CC blocks. On the other hand, ccextractor
> + *      will by default remove mid-stream padding, and add padding at the end
> + *      of the stream (in order to convey the end time of the source video).
> + */

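ditto (the same "file file", "interopability" and "consistant" typos
and the same date format remark apply to this comment block)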

> +
> +#include "avformat.h"
> +#include "internal.h"
> +#include "mux.h"
> +#include "libavutil/log.h"
> +#include "libavutil/intreadwrite.h"
> +
> +#define RCWT_CLUSTER_MAX_BLOCKS             65535

> +#define RCWT_BLOCK_SIZE                     3 * sizeof(uint8_t)

or just use 3 (if the sizeof expression is kept, it should at least be
parenthesized)
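
To illustrate the point, here is a minimal stand-alone sketch (not FFmpeg
code; the macro and function names are made up for the example, and it
needs C11 for _Generic). As posted, the macro expands to an
unparenthesized size_t expression, so a cast applied to it binds to the 3
only and the result is still a size_t:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE_PATCH 3 * sizeof(uint8_t)    /* as posted in the patch */
#define BLOCK_SIZE_PAREN (3 * sizeof(uint8_t))  /* parenthesized variant */
#define BLOCK_SIZE_PLAIN 3                      /* plain int constant */

/* Evaluates to 1 if the expression has type size_t, 0 otherwise. */
#define IS_SIZE_T(x) _Generic((x), size_t: 1, default: 0)

/* The posted macro is a size_t expression, not an int. */
int patch_macro_is_size_t(void)
{
    return IS_SIZE_T(BLOCK_SIZE_PATCH);
}

/* Expands to (int)3 * sizeof(uint8_t): the cast applies to 3 only. */
int cast_of_patch_macro_is_size_t(void)
{
    return IS_SIZE_T((int)BLOCK_SIZE_PATCH);
}

/* With parentheses the cast applies to the whole product. */
int cast_of_paren_macro_is_size_t(void)
{
    return IS_SIZE_T((int)BLOCK_SIZE_PAREN);
}

/* A plain 3 is simply an int. */
int plain_constant_is_size_t(void)
{
    return IS_SIZE_T(BLOCK_SIZE_PLAIN);
}

/* Usage example: the four cases side by side. */
void block_size_macro_selfcheck(void)
{
    assert(patch_macro_is_size_t() == 1);
    assert(cast_of_patch_macro_is_size_t() == 1);
    assert(cast_of_paren_macro_is_size_t() == 0);
    assert(plain_constant_is_size_t() == 0);
}
```

None of this bites in the patch as posted, as far as I can tell, since
the macro is only multiplied and compared for equality against
non-negative values.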

> +
> +typedef struct RCWTContext {
> +    int cluster_nb_blocks;
> +    int cluster_pos;
> +    int64_t cluster_pts;
> +    uint8_t cluster_buf[RCWT_CLUSTER_MAX_BLOCKS * RCWT_BLOCK_SIZE];
> +} RCWTContext;
> +
> +static void rcwt_init_cluster(AVFormatContext *avf)
> +{
> +    RCWTContext *rcwt = avf->priv_data;
> +
> +    rcwt->cluster_nb_blocks = 0;
> +    rcwt->cluster_pos = 0;
> +    rcwt->cluster_pts = AV_NOPTS_VALUE;
> +    memset(rcwt->cluster_buf, 0, sizeof(rcwt->cluster_buf));
> +}
> +
> +static void rcwt_flush_cluster(AVFormatContext *avf)
> +{
> +    RCWTContext *rcwt = avf->priv_data;
> +
> +    if (rcwt->cluster_nb_blocks > 0) {
> +        avio_wl64(avf->pb, rcwt->cluster_pts);
> +        avio_wl16(avf->pb, rcwt->cluster_nb_blocks);

> +        avio_write(avf->pb, rcwt->cluster_buf,
> +                (rcwt->cluster_nb_blocks * RCWT_BLOCK_SIZE));

nit: weird indent

> +    }
> +
> +    rcwt_init_cluster(avf);
> +}
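
For readers of the thread, the cluster layout this function writes can be
sketched without the avio layer. This is a minimal stand-alone
illustration, assuming the layout implied by the calls above (64-bit
little-endian PTS, 16-bit little-endian block count, then 3 bytes per
block); rcwt_serialize_cluster is a hypothetical name, not part of the
patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CC_BLOCK_SIZE 3

/* Serializes one cluster into out and returns the number of bytes
 * written. out must have room for 10 + 3 * nb_blocks bytes. Empty
 * clusters are not written at all, mirroring the nb_blocks > 0 check. */
size_t rcwt_serialize_cluster(uint8_t *out, int64_t pts_ms,
                              uint16_t nb_blocks, const uint8_t *blocks)
{
    size_t pos = 0;
    size_t payload = (size_t)nb_blocks * CC_BLOCK_SIZE;
    uint64_t pts = (uint64_t)pts_ms;

    if (nb_blocks == 0)
        return 0;

    /* 64-bit PTS, least significant byte first */
    for (int i = 0; i < 8; i++)
        out[pos++] = (uint8_t)(pts >> (8 * i));

    /* 16-bit block count, least significant byte first */
    out[pos++] = (uint8_t)(nb_blocks & 0xFF);
    out[pos++] = (uint8_t)(nb_blocks >> 8);

    /* raw 3-byte CC blocks */
    memcpy(out + pos, blocks, payload);

    return pos + payload;
}

/* Usage example: one block at PTS 1000 ms gives a 13-byte cluster. */
void rcwt_cluster_selfcheck(void)
{
    const uint8_t block[CC_BLOCK_SIZE] = { 0xFC, 0x94, 0x20 };
    uint8_t out[10 + CC_BLOCK_SIZE];

    assert(rcwt_serialize_cluster(out, 1000, 1, block) == 13);
    assert(out[0] == 0xE8 && out[1] == 0x03 && out[7] == 0x00);
    assert(out[8] == 0x01 && out[9] == 0x00);
    assert(out[10] == 0xFC && out[12] == 0x20);
}
```
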
> +
> +static int rcwt_write_header(AVFormatContext *avf)
> +{

> +    if (avf->nb_streams != 1
> +            || avf->streams[0]->codecpar->codec_type != AVMEDIA_TYPE_SUBTITLE
> +            || avf->streams[0]->codecpar->codec_id != AV_CODEC_ID_EIA_608) {

nit+: weird indent

> +        av_log(avf, AV_LOG_ERROR,

> +               "RCWT supports only one CC (608/708) stream\n");

this could be more explicit, e.g.:
"RCWT supports only one CC (608/708) stream; more than one stream was
provided or its codec type was not CC (608/708)\n");

> +        return AVERROR(EINVAL);
> +    }
> +
> +    avpriv_set_pts_info(avf->streams[0], 64, 1, 1000);
> +
> +    /* magic number */
> +    avio_wb16(avf->pb, 0xCCCC);
> +    avio_w8(avf->pb, 0xED);
> +
> +    /* program version (identify as ffmpeg) */
> +    avio_wb16(avf->pb, 0xFF00);
> +    avio_w8(avf->pb, 0x60);
> +
> +    /* format version, only version 0.001 supported for now */
> +    avio_wb16(avf->pb, 0x0001);
> +
> +    /* reserved */
> +    avio_wb16(avf->pb, 0x000);
> +    avio_w8(avf->pb, 0x00);
> +
> +    rcwt_init_cluster(avf);
> +
> +    return 0;
> +}
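
Spelled out, the writes above produce an 11-byte header. A minimal
stand-alone sketch of the resulting bytes, derived only from the calls in
this function (rcwt_build_header and RCWT_HEADER_SIZE are hypothetical
names for illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RCWT_HEADER_SIZE 11

/* Copies the header bytes into out and returns their count. */
size_t rcwt_build_header(uint8_t out[RCWT_HEADER_SIZE])
{
    static const uint8_t header[RCWT_HEADER_SIZE] = {
        0xCC, 0xCC, 0xED,  /* magic number */
        0xFF, 0x00, 0x60,  /* program identifier and version bytes */
        0x00, 0x01,        /* format version */
        0x00, 0x00, 0x00   /* reserved */
    };

    memcpy(out, header, sizeof(header));
    return sizeof(header);
}

/* Usage example: check a few of the bytes. */
void rcwt_header_selfcheck(void)
{
    uint8_t buf[RCWT_HEADER_SIZE];

    assert(rcwt_build_header(buf) == 11);
    assert(buf[0] == 0xCC && buf[2] == 0xED);
    assert(buf[3] == 0xFF && buf[5] == 0x60);
    assert(buf[7] == 0x01 && buf[10] == 0x00);
}
```
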
> +
> +static int rcwt_write_packet(AVFormatContext *avf, AVPacket *pkt)
> +{
> +    RCWTContext *rcwt = avf->priv_data;
> +
> +    int in_block = 0;
> +    int nb_block_bytes = 0;
> +
> +    if (pkt->size == 0)
> +        return 0;
> +
> +    /* new PTS, new cluster */
> +    if (pkt->pts != rcwt->cluster_pts) {
> +        rcwt_flush_cluster(avf);
> +        rcwt->cluster_pts = pkt->pts;
> +    }
> +
> +    if (pkt->pts == AV_NOPTS_VALUE) {
> +        av_log(avf, AV_LOG_WARNING, "Ignoring CC packet with no PTS\n");
> +        return 0;
> +    }
> +
> +    for (int i = 0; i < pkt->size; i++) {
> +        uint8_t cc_valid;
> +        uint8_t cc_type;
> +
> +        if (rcwt->cluster_nb_blocks == RCWT_CLUSTER_MAX_BLOCKS) {
> +            av_log(avf, AV_LOG_WARNING,
> +                    "Starting new cluster due to size\n");
> +            rcwt_flush_cluster(avf);
> +        }
> +

> +        cc_valid = (pkt->data[i] & 0x04) >> 2;

nit: no need to shift, cc_valid is only ever tested for truth
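
A quick stand-alone check of that claim, as a minimal sketch (the function
names are made up for the example; the skip condition is copied from the
loop): the test gives the same answer for every byte value with or
without the shift.

```c
#include <assert.h>
#include <stdint.h>

/* Skip test as posted: cc_valid is normalized to 0 or 1 by the shift. */
int skip_with_shift(uint8_t b)
{
    uint8_t cc_valid = (b & 0x04) >> 2;
    uint8_t cc_type  = b & 0x03;

    return !(cc_valid || cc_type == 3);
}

/* Same test without the shift: cc_valid is 0 or 4, which is still a
 * perfectly good truth value. */
int skip_without_shift(uint8_t b)
{
    uint8_t cc_valid = b & 0x04;
    uint8_t cc_type  = b & 0x03;

    return !(cc_valid || cc_type == 3);
}

/* Counts the byte values for which the two variants disagree. */
int count_mismatches(void)
{
    int mismatches = 0;

    for (int b = 0; b < 256; b++)
        if (skip_with_shift((uint8_t)b) != skip_without_shift((uint8_t)b))
            mismatches++;

    return mismatches;
}

/* Usage example: no disagreement over the full byte range. */
void cc_valid_shift_selfcheck(void)
{
    assert(count_mismatches() == 0);
    assert(skip_without_shift(0xF8) == 1);  /* invalid, type 0: skipped */
    assert(skip_without_shift(0xFC) == 0);  /* valid: kept */
    assert(skip_without_shift(0xFB) == 0);  /* type 3: kept */
}
```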

> +        cc_type = pkt->data[i] & 0x03;
> +
> +        if (!in_block && !(cc_valid || cc_type == 3))
> +            continue;
> +

> +        memcpy(&rcwt->cluster_buf[rcwt->cluster_pos],
> +                &pkt->data[i], sizeof(uint8_t));

indent

> +        rcwt->cluster_pos++;
> +
> +        if (!in_block) {
> +            in_block = 1;
> +            nb_block_bytes = 1;
> +            continue;
> +        }
> +
> +        nb_block_bytes++;
> +
> +        if (nb_block_bytes == RCWT_BLOCK_SIZE) {
> +            in_block = 0;
> +            nb_block_bytes = 0;
> +            rcwt->cluster_nb_blocks++;
> +        }
> +    }
> +
> +    return 0;
> +}
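
One thing worth double-checking in the loop above: it advances one byte
at a time, so after skipping an invalid header byte the next byte it
examines is a payload byte of that same triplet. I may be missing
something, but stepping in whole 3-byte units looks safer. A minimal
stand-alone sketch, assuming the packet payload is a sequence of 3-byte
cc_data triplets with the flags in the first byte (filter_cc_triplets and
CC_TRIPLET_SIZE are hypothetical names, not part of the patch or of
FFmpeg):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CC_TRIPLET_SIZE 3

/* Copies the kept triplets from in to out (out must hold in_size bytes)
 * and returns the number of triplets kept. A triplet is kept under the
 * same condition as in the patch: cc_valid is set or cc_type is 3.
 * Trailing bytes that do not form a whole triplet are ignored. */
size_t filter_cc_triplets(const uint8_t *in, size_t in_size, uint8_t *out)
{
    size_t kept = 0;

    for (size_t i = 0; i + CC_TRIPLET_SIZE <= in_size; i += CC_TRIPLET_SIZE) {
        int cc_valid = in[i] & 0x04;
        int cc_type  = in[i] & 0x03;

        if (!(cc_valid || cc_type == 3))
            continue;

        memcpy(out + kept * CC_TRIPLET_SIZE, in + i, CC_TRIPLET_SIZE);
        kept++;
    }

    return kept;
}

/* Usage example: the last triplet is invalid, but its payload bytes
 * (0x04) would look like valid headers to a byte-wise scan. */
void filter_cc_triplets_selfcheck(void)
{
    const uint8_t in[] = {
        0xFC, 0x94, 0x20,  /* valid, type 0: kept */
        0xF8, 0x80, 0x80,  /* invalid, type 0: dropped */
        0xFB, 0x00, 0x00,  /* invalid, type 3: kept */
        0xF9, 0x04, 0x04   /* invalid, type 1: dropped */
    };
    uint8_t out[sizeof(in)];

    assert(filter_cc_triplets(in, sizeof(in), out) == 2);
    assert(out[0] == 0xFC && out[1] == 0x94 && out[2] == 0x20);
    assert(out[3] == 0xFB && out[4] == 0x00 && out[5] == 0x00);
}
```
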
> +
> +static int rcwt_write_trailer(AVFormatContext *avf)
> +{
> +    rcwt_flush_cluster(avf);
> +
> +    return 0;
> +}
> +
> +const FFOutputFormat ff_rcwt_muxer = {
> +    .p.name             = "rcwt",
> +    .p.long_name        = NULL_IF_CONFIG_SMALL("Raw Captions With Time"),
> +    .p.extensions       = "bin",
> +    .p.flags            = AVFMT_GLOBALHEADER | AVFMT_VARIABLE_FPS | AVFMT_TS_NONSTRICT,
> +    .p.subtitle_codec   = AV_CODEC_ID_EIA_608,
> +    .priv_data_size     = sizeof(RCWTContext),
> +    .write_header       = rcwt_write_header,
> +    .write_packet       = rcwt_write_packet,
> +    .write_trailer      = rcwt_write_trailer
> +};

No more comments from me, thanks.