[Ffmpeg-devel] Character encoding in libavformat header
Juho Vähä-Herttua
juho.vaha-herttua
Fri Apr 28 14:20:32 CEST 2006
Hi,
I'm trying to use ffmpeg for ASF demuxing and WMA decoding, mainly in
the XMMS2 project's official asf/wma plugin. I already made a forked
version in which I ripped out just the ASF and WMA decoding parts and
their supporting functions. It worked fine, but it was ugly as hell.
Now most Linux and *BSD distributions seem to ship statically
linkable versions of the ffmpeg libraries, so I thought, why not use
one of those, since that would make security patches and the like
someone else's responsibility. However, one major obstacle got in my
way...
The ASF demuxer stores all the header information (title, author,
description) as ISO-8859-1, even though the ASF file format natively
uses UCS-2 (i.e. UTF-16; I don't know for sure, but I suspect it
doesn't support surrogate pairs). The get_str16_nolen function in
asf.c goes as follows:
    static void get_str16_nolen(ByteIOContext *pb, int len, char *buf,
                                int buf_size)
    {
        int c;
        char *q;

        q = buf;
        while (len > 0) {
            c = get_le16(pb);
            if ((q - buf) < buf_size - 1)
                *q++ = c;
            len -= 2;
        }
        *q = '\0';
    }
As you can see, it simply keeps the low byte of each 16-bit character
and throws away the high byte. The result isn't necessarily even
recognizable ISO-8859-1 text if the header contains characters above
255. So it should at least do a check like
*q++ = (c > 255) ? '?' : c; so that all unrepresentable characters
are shown as ? instead of garbage.
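A minimal sketch of that '?' fallback, for illustration only. To make it compile outside libavformat it assumes a hypothetical stand-alone function, str16le_to_latin1(), that reads the raw little-endian field bytes from a buffer instead of calling get_le16() on a ByteIOContext. It also loops while at least two bytes remain, so an odd-length field cannot cause a read past the end.

```c
#include <stdint.h>

/* Hypothetical stand-alone variant of get_str16_nolen(): takes the raw
 * UTF-16LE field bytes instead of a ByteIOContext. Code units above 255
 * become '?' rather than being truncated to their low byte.
 * buf_size must be at least 1 (room for the terminating NUL). */
static void str16le_to_latin1(const uint8_t *src, int len,
                              char *buf, int buf_size)
{
    char *q = buf;

    while (len >= 2) {
        /* assemble one little-endian 16-bit code unit */
        unsigned c = (unsigned)src[0] | ((unsigned)src[1] << 8);
        src += 2;
        len -= 2;
        if ((q - buf) < buf_size - 1)
            *q++ = (c > 255) ? '?' : (char)c;
    }
    *q = '\0';
}
```

For example, the field bytes 41 00 E4 00 AC 20 ("A", U+00E4, U+20AC) come out as "A", 0xE4, "?". With the original code, U+20AC (the euro sign) would instead come out as byte 0xAC, which is "¬" in ISO-8859-1.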
What would be even better is to re-encode it into UTF-8, which is
trivial to do, or alternatively to provide some way to access the
original raw header data. The advantage of UTF-8 is of course that it
can be handled the same way as an ISO-8859-1 string (a NUL-terminated
char array). The disadvantage is that characters 128 to 255 would no
longer display correctly for code that treats the string as
ISO-8859-1. Has ffmpeg made any decision about the character encoding
of its internal metadata?
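To illustrate how little the UTF-8 option would take, here is a rough sketch of a UTF-16LE to UTF-8 conversion. The names str16le_to_utf8() and put_utf8() are hypothetical, and like the previous sketch it reads from a raw byte buffer rather than a ByteIOContext. Since I'm not sure whether ASF strings can contain surrogates, it combines valid surrogate pairs anyway and replaces unpaired ones with U+FFFD; if the buffer is too small it stops at a character boundary instead of emitting half a multibyte sequence.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write one code point as UTF-8 into q, but only if
 * the whole sequence fits in `room` bytes. Returns bytes written, or 0. */
static size_t put_utf8(char *q, size_t room, uint32_t cp)
{
    unsigned char tmp[4];
    size_t n;

    if (cp < 0x80) {
        tmp[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {
        tmp[0] = (unsigned char)(0xC0 | (cp >> 6));
        tmp[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        tmp[0] = (unsigned char)(0xE0 | (cp >> 12));
        tmp[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        tmp[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {
        tmp[0] = (unsigned char)(0xF0 | (cp >> 18));
        tmp[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        tmp[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        tmp[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    if (n > room)
        return 0;
    memcpy(q, tmp, n);
    return n;
}

/* Hypothetical converter: len bytes of UTF-16LE -> NUL-terminated UTF-8. */
static void str16le_to_utf8(const uint8_t *src, size_t len,
                            char *buf, size_t buf_size)
{
    size_t pos = 0;

    if (buf_size == 0)
        return;
    while (len >= 2) {
        uint32_t cp = (uint32_t)src[0] | ((uint32_t)src[1] << 8);
        src += 2;
        len -= 2;
        if (cp >= 0xD800 && cp <= 0xDBFF && len >= 2) {
            /* high surrogate: combine with a following low surrogate */
            uint32_t lo = (uint32_t)src[0] | ((uint32_t)src[1] << 8);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                src += 2;
                len -= 2;
            } else {
                cp = 0xFFFD; /* unpaired high surrogate; lo is read next */
            }
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* lone low surrogate, or high one at field end */
        }
        /* one byte is always reserved for the terminating NUL */
        size_t n = put_utf8(buf + pos, buf_size - 1 - pos, cp);
        if (n == 0)
            break; /* sequence doesn't fit: stop rather than split it */
        pos += n;
    }
    buf[pos] = '\0';
}
```

Pure ASCII titles come out byte-for-byte identical, which is exactly why existing callers that just print the string would keep working for them.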
Our goal is to support metadata and charsets as well as possible, so
this is a really important issue for us. I'd very much like to hear
some comments on it.
Juho Vähä-Herttua
P.S. Please keep me in the cc while replying since I'm not on this
mailing list.