[FFmpeg-devel] [PATCH v22 22/23] avutil/ass_split: Add parsing of hard-space tags (\h)
Soft Works
softworkz at hotmail.com
Thu Dec 9 14:39:41 EET 2021
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Soft Works
> Sent: Thursday, December 9, 2021 1:13 PM
> To: ffmpeg-devel at ffmpeg.org
> Subject: [FFmpeg-devel] [PATCH v22 22/23] avutil/ass_split: Add parsing of
> hard-space tags (\h)
>
> The \h tag in ASS/SSA indicates a non-breaking space. See
> https://github.com/Aegisub/aegisite/blob/master/source/docs/3.2/ASS_Tags.html.md
>
> The ass_split implementation is used by almost all text subtitle
> encoders and it didn't handle this tag. Interestingly, several tests
> exercise \h parsing but had incorrect reference data.
>
> The \h tag is specific to ASS and has no meaning outside of it.
> Still, the reference data for ttmlenc, textenc and webvttenc were full of
> literal \h sequences that those formats do not interpret.
>
> Signed-off-by: softworkz <softworkz at hotmail.com>
> ---
> libavutil/ass_split.c | 7 +++++++
> libavutil/ass_split_internal.h | 1 +
> tests/ref/fate/mov-mp4-ttml-dfxp | 8 ++++----
> tests/ref/fate/mov-mp4-ttml-stpp | 8 ++++----
> tests/ref/fate/sub-textenc | 10 +++++-----
> tests/ref/fate/sub-ttmlenc | 8 ++++----
> tests/ref/fate/sub-webvttenc | 10 +++++-----
> 7 files changed, 30 insertions(+), 22 deletions(-)
>
> diff --git a/libavutil/ass_split.c b/libavutil/ass_split.c
> index c5963351fc..30512dfc74 100644
> --- a/libavutil/ass_split.c
> +++ b/libavutil/ass_split.c
> @@ -484,6 +484,7 @@ int avpriv_ass_split_override_codes(const ASSCodesCallbacks *callbacks, void *pr
> while (buf && *buf) {
> if (text && callbacks->text &&
> (sscanf(buf, "\\%1[nN]", new_line) == 1 ||
> + sscanf(buf, "\\%1[hH]", new_line) == 1 ||
> !strncmp(buf, "{\\", 2))) {
> callbacks->text(priv, text, text_len);
> text = NULL;
> @@ -492,6 +493,12 @@ int avpriv_ass_split_override_codes(const ASSCodesCallbacks *callbacks, void *pr
> if (callbacks->new_line)
> callbacks->new_line(priv, new_line[0] == 'N');
> buf += 2;
> + } else if (sscanf(buf, "\\%1[hH]", new_line) == 1) {
> + if (callbacks->hard_space)
> + callbacks->hard_space(priv);
> + else if (callbacks->text)
> + callbacks->text(priv, " ", 1);
> + buf += 2;
> } else if (!strncmp(buf, "{\\", 2)) {
> buf++;
> while (*buf == '\\') {
> diff --git a/libavutil/ass_split_internal.h b/libavutil/ass_split_internal.h
> index 8e8e51115c..d6eaade4a4 100644
> --- a/libavutil/ass_split_internal.h
> +++ b/libavutil/ass_split_internal.h
> @@ -141,6 +141,7 @@ typedef struct {
> * @{
> */
> void (*text)(void *priv, const char *text, int len);
> + void (*hard_space)(void *priv);
> void (*new_line)(void *priv, int forced);
> void (*style)(void *priv, char style, int close);
> void (*color)(void *priv, unsigned int /* color */, unsigned int color_id);
> diff --git a/tests/ref/fate/mov-mp4-ttml-dfxp b/tests/ref/fate/mov-mp4-ttml-dfxp
> index e24b5d618b..e565ffa1f6 100644
> --- a/tests/ref/fate/mov-mp4-ttml-dfxp
> +++ b/tests/ref/fate/mov-mp4-ttml-dfxp
> @@ -1,9 +1,9 @@
> -2e7e01c821c111466e7a2844826b7f6d *tests/data/fate/mov-mp4-ttml-dfxp.mp4
> -8519 tests/data/fate/mov-mp4-ttml-dfxp.mp4
> +658884e1b789e75c454b25bdf71283c9 *tests/data/fate/mov-mp4-ttml-dfxp.mp4
> +8486 tests/data/fate/mov-mp4-ttml-dfxp.mp4
> #tb 0: 1/1000
> #media_type 0: data
> #codec_id 0: none
> -0, 0, 0, 68500, 7866, 0x456c36b7
> +0, 0, 0, 68500, 7833, 0x31b22193
> {
> "packets": [
> {
> @@ -15,7 +15,7 @@
> "dts_time": "0.000000",
> "duration": 68500,
> "duration_time": "68.500000",
> - "size": "7866",
> + "size": "7833",
> "pos": "44",
> "flags": "K_"
> }
> diff --git a/tests/ref/fate/mov-mp4-ttml-stpp b/tests/ref/fate/mov-mp4-ttml-stpp
> index 77bd23b7bf..f25b5b2d28 100644
> --- a/tests/ref/fate/mov-mp4-ttml-stpp
> +++ b/tests/ref/fate/mov-mp4-ttml-stpp
> @@ -1,9 +1,9 @@
> -cbd2c7ff864a663b0d893deac5a0caec *tests/data/fate/mov-mp4-ttml-stpp.mp4
> -8547 tests/data/fate/mov-mp4-ttml-stpp.mp4
> +c9570de0ccebc858b0c662a7e449582c *tests/data/fate/mov-mp4-ttml-stpp.mp4
> +8514 tests/data/fate/mov-mp4-ttml-stpp.mp4
> #tb 0: 1/1000
> #media_type 0: data
> #codec_id 0: none
> -0, 0, 0, 68500, 7866, 0x456c36b7
> +0, 0, 0, 68500, 7833, 0x31b22193
> {
> "packets": [
> {
> @@ -15,7 +15,7 @@ cbd2c7ff864a663b0d893deac5a0caec *tests/data/fate/mov-mp4-ttml-stpp.mp4
> "dts_time": "0.000000",
> "duration": 68500,
> "duration_time": "68.500000",
> - "size": "7866",
> + "size": "7833",
> "pos": "44",
> "flags": "K_"
> }
> diff --git a/tests/ref/fate/sub-textenc b/tests/ref/fate/sub-textenc
> index 3ea56b38f0..910ca3d6e3 100644
> --- a/tests/ref/fate/sub-textenc
> +++ b/tests/ref/fate/sub-textenc
> @@ -160,18 +160,18 @@ but show this: {normal text}
> \ N is a forced line break
> \ h is a hard space
> Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
> -The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D
> +The line will never break automatically right before or after a hard space. :-D
>
> 31
> 00:00:54,501 --> 00:00:56,500
>
> -\h\h\h\h\hA (05 hard spaces followed by a letter)
> +     A (05 hard spaces followed by a letter)
> A (Normal spaces followed by a letter)
> A (No hard spaces followed by a letter)
>
> 32
> 00:00:56,501 --> 00:00:58,500
> -\h\h\h\h\hA (05 hard spaces followed by a letter)
> +     A (05 hard spaces followed by a letter)
> A (Normal spaces followed by a letter)
> A (No hard spaces followed by a letter)
> Show this: \TEST and this: \-)
> @@ -179,10 +179,10 @@ Show this: \TEST and this: \-)
> 33
> 00:00:58,501 --> 00:01:00,500
>
> -A letter followed by 05 hard spaces: A\h\h\h\h\h
> +A letter followed by 05 hard spaces: A
> A letter followed by normal spaces: A
> A letter followed by no hard spaces: A
> -05 hard spaces between letters: A\h\h\h\h\hA
> +05 hard spaces between letters: A     A
> 5 normal spaces between letters: A     A
>
> ^--Forced line break
> diff --git a/tests/ref/fate/sub-ttmlenc b/tests/ref/fate/sub-ttmlenc
> index 4df8f8796f..aea09bb31e 100644
> --- a/tests/ref/fate/sub-ttmlenc
> +++ b/tests/ref/fate/sub-ttmlenc
> @@ -109,16 +109,16 @@
> end="00:00:54.500"><span region="Default">Hide these tags:<br/>also hide these tags:<br/>but show this: {normal text}</span></p>
> <p
> begin="00:00:54.501"
> - end="00:01:00.500"><span region="Default"><br/>\ N is a forced line break<br/>\ h is a hard space<br/>Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.<br/>The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D</span></p>
> + end="00:01:00.500"><span region="Default"><br/>\ N is a forced line break<br/>\ h is a hard space<br/>Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.<br/>The line will never break automatically right before or after a hard space. :-D</span></p>
> <p
> begin="00:00:54.501"
> - end="00:00:56.500"><span region="Default"><br/>\h\h\h\h\hA (05 hard spaces followed by a letter)<br/>A (Normal spaces followed by a letter)<br/>A (No hard spaces followed by a letter)</span></p>
> + end="00:00:56.500"><span region="Default"><br/>     A (05 hard spaces followed by a letter)<br/>A (Normal spaces followed by a letter)<br/>A (No hard spaces followed by a letter)</span></p>
> <p
> begin="00:00:56.501"
> - end="00:00:58.500"><span region="Default">\h\h\h\h\hA (05 hard spaces followed by a letter)<br/>A (Normal spaces followed by a letter)<br/>A (No hard spaces followed by a letter)<br/>Show this: \TEST and this: \-)</span></p>
> + end="00:00:58.500"><span region="Default">     A (05 hard spaces followed by a letter)<br/>A (Normal spaces followed by a letter)<br/>A (No hard spaces followed by a letter)<br/>Show this: \TEST and this: \-)</span></p>
> <p
> begin="00:00:58.501"
> - end="00:01:00.500"><span region="Default"><br/>A letter followed by 05 hard spaces: A\h\h\h\h\h<br/>A letter followed by normal spaces: A<br/>A letter followed by no hard spaces: A<br/>05 hard spaces between letters: A\h\h\h\h\hA<br/>5 normal spaces between letters: A     A<br/><br/>^--Forced line break</span></p>
> + end="00:01:00.500"><span region="Default"><br/>A letter followed by 05 hard spaces: A     <br/>A letter followed by normal spaces: A<br/>A letter followed by no hard spaces: A<br/>05 hard spaces between letters: A     A<br/>5 normal spaces between letters: A     A<br/><br/>^--Forced line break</span></p>
> <p
> begin="00:01:00.501"
> end="00:01:02.500"><span region="Default">Both line should be strikethrough,<br/>yes.<br/>Correctly closed tags<br/>should be hidden.</span></p>
> diff --git a/tests/ref/fate/sub-webvttenc b/tests/ref/fate/sub-webvttenc
> index 45ae0b6131..f4172dcc84 100644
> --- a/tests/ref/fate/sub-webvttenc
> +++ b/tests/ref/fate/sub-webvttenc
> @@ -132,26 +132,26 @@ but show this: {normal text}
> \ N is a forced line break
> \ h is a hard space
> Normal spaces at the start and at the end of the line are trimmed while hard spaces are not trimmed.
> -The\hline\hwill\hnever\hbreak\hautomatically\hright\hbefore\hor\hafter\ha\hhard\hspace.\h:-D
> +The line will never break automatically right before or after a hard space. :-D
>
> 00:54.501 --> 00:56.500
>
> -\h\h\h\h\hA (05 hard spaces followed by a letter)
> +     A (05 hard spaces followed by a letter)
> A (Normal spaces followed by a letter)
> A (No hard spaces followed by a letter)
>
> 00:56.501 --> 00:58.500
> -\h\h\h\h\hA (05 hard spaces followed by a letter)
> +     A (05 hard spaces followed by a letter)
> A (Normal spaces followed by a letter)
> A (No hard spaces followed by a letter)
> Show this: \TEST and this: \-)
>
> 00:58.501 --> 01:00.500
>
> -A letter followed by 05 hard spaces: A\h\h\h\h\h
> +A letter followed by 05 hard spaces: A
> A letter followed by normal spaces: A
> A letter followed by no hard spaces: A
> -05 hard spaces between letters: A\h\h\h\h\hA
> +05 hard spaces between letters: A     A
> 5 normal spaces between letters: A     A
>
> ^--Forced line break
> --
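For anyone skimming the diff, the new \h branch and its fallback can be sketched outside of FFmpeg roughly as follows. This is a reduced, stand-alone illustration and not the code from the patch: SketchCallbacks, SketchOut, sketch_split and sketch_render are hypothetical names, override blocks ({\...}) are not handled, and the sscanf-based matching is replaced by plain character comparisons. The only behaviour it tries to mirror is that \h goes to a hard_space callback when one is set and otherwise degrades to a single ordinary space passed to the text callback.

```c
#include <string.h>

/* Hypothetical, reduced callback table; the real ASSCodesCallbacks has more members. */
typedef struct {
    void (*text)(void *priv, const char *text, int len);
    void (*hard_space)(void *priv);
    void (*new_line)(void *priv, int forced);
} SketchCallbacks;

/* Walk buf and dispatch \n, \N and \h; everything else is collected as plain text. */
static void sketch_split(const SketchCallbacks *cb, void *priv, const char *buf)
{
    const char *text = NULL;
    int text_len = 0;

    while (*buf) {
        int is_nl = buf[0] == '\\' && (buf[1] == 'n' || buf[1] == 'N');
        int is_hs = buf[0] == '\\' && (buf[1] == 'h' || buf[1] == 'H');

        /* Flush pending plain text before handling a tag. */
        if (text && (is_nl || is_hs)) {
            if (cb->text)
                cb->text(priv, text, text_len);
            text = NULL;
        }
        if (is_nl) {
            if (cb->new_line)
                cb->new_line(priv, buf[1] == 'N');
            buf += 2;
        } else if (is_hs) {
            if (cb->hard_space)
                cb->hard_space(priv);        /* consumer knows about hard spaces */
            else if (cb->text)
                cb->text(priv, " ", 1);      /* fallback: one ordinary space */
            buf += 2;
        } else {
            if (!text) {
                text = buf;
                text_len = 1;
            } else {
                text_len++;
            }
            buf++;
        }
    }
    if (text && cb->text)
        cb->text(priv, text, text_len);
}

/* Small fixed-size sink used only for this illustration. */
typedef struct {
    char buf[256];
    size_t len;
} SketchOut;

static void out_text(void *priv, const char *text, int len)
{
    SketchOut *o = priv;
    if (len < 0)
        return;
    if (o->len + (size_t)len >= sizeof(o->buf))
        len = (int)(sizeof(o->buf) - 1 - o->len);   /* truncate instead of overflowing */
    memcpy(o->buf + o->len, text, (size_t)len);
    o->len += (size_t)len;
    o->buf[o->len] = '\0';
}

static void out_new_line(void *priv, int forced)
{
    (void)forced;
    out_text(priv, "\n", 1);
}

/* Emits U+00A0 (NO-BREAK SPACE) encoded as UTF-8. */
static void out_nbsp(void *priv)
{
    out_text(priv, "\xC2\xA0", 2);
}

/* Render `in` to o->buf; with_hard_space selects whether hard_space is set. */
static const char *sketch_render(SketchOut *o, const char *in, int with_hard_space)
{
    SketchCallbacks cb = { out_text, with_hard_space ? out_nbsp : NULL, out_new_line };
    o->len = 0;
    o->buf[0] = '\0';
    sketch_split(&cb, o, in);
    return o->buf;
}
```

With the hard_space callback unset, sketch_render(&o, "A\\hB", 0) yields "A B", which corresponds to the space-for-\h substitution visible in the updated reference files. With it set, the same input yields "A", a UTF-8 no-break space, then "B".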
Patchwork fails to apply this patch due to trailing whitespace:
https://patchwork.ffmpeg.org/project/ffmpeg/patch/DM8P223MB036543CB351641BF7280F653BA709@DM8P223MB0365.NAMP223.PROD.OUTLOOK.COM/
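The problem is that, this time, the whitespace needs to be there: with \h
now converted to spaces, some lines in the updated reference files
legitimately end in spaces.
Does anybody have an idea what could be done in this case?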
Thanks,
softworkz