[FFmpeg-devel] [PATCH] lavc/movtextdec: fix incorrect offset calculation for UTF-8 characters
Erik Bråthen Solem
erikbsolem at hotmail.com
Wed Mar 8 03:37:24 EET 2017
The 3GPP Timed Text (TTXT / tx3g / mov_text) specification counts a multibyte UTF-8 character as a single character, whereas ffmpeg currently counts bytes. This patch adds an if test so that:
1. continuation bytes are not counted during decoding
2. style boxes will not split these characters
Fixes trac #6021 (decoding part).
---
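Note: the counting rule described above can be sketched in isolation as follows. This is only an illustration under stated assumptions, not code from libavcodec: utf8_char_count is a hypothetical helper name, and the input is assumed to be well-formed UTF-8. It uses the same test as the patch, advancing the character position only on bytes that are not continuation bytes (bit pattern 10xxxxxx).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libavcodec): count characters in the
 * byte range [text, text_end) the way the patch counts text_pos.
 * Assumes well-formed UTF-8; malformed input is not detected.
 */
static size_t utf8_char_count(const char *text, const char *text_end)
{
    size_t text_pos = 0;

    while (text < text_end) {
        /* Continuation bytes look like 10xxxxxx; skip them when counting. */
        if (((unsigned char)*text & 0xC0) != 0x80)
            text_pos++;
        text++;
    }
    return text_pos;
}
```

For example, the 4-byte string "a\xC3\xA5" "b" (an 'a', a two-byte character, and a 'b') yields a count of 3, so a style box ending at position 2 would fall after the two-byte character rather than in the middle of it.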
libavcodec/movtextdec.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/libavcodec/movtextdec.c b/libavcodec/movtextdec.c
index 6de1500..2c7a204 100644
--- a/libavcodec/movtextdec.c
+++ b/libavcodec/movtextdec.c
@@ -342,6 +342,7 @@ static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
}
while (text < text_end) {
+ if ((*text & 0xC0) != 0x80) { /* Boxes never split multibyte chars */
if (m->box_flags & STYL_BOX) {
for (i = 0; i < m->style_entries; i++) {
if (m->s[i]->style_flag && text_pos == m->s[i]->style_end) {
@@ -387,6 +388,8 @@ static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
}
}
}
+ text_pos++;
+ }
switch (*text) {
case '\r':
@@ -399,7 +402,6 @@ static int text_to_ass(AVBPrint *buf, const char *text, const char *text_end,
break;
}
text++;
- text_pos++;
}
return 0;
--
1.9.5 (Apple Git-50.3)