[MPlayer-users] unicode subtitles

Georgi Georgiev chutz at chubaka.homeip.net
Mon May 6 18:25:02 CEST 2002


On Mon, May 06, 2002 at 03:32:25PM +0200, Artur Zaprzala wrote:
> Georgi Georgiev wrote:
> >TOOLS/subfont-c/encodings/charmap2enc has a line saying if (c<"80"), and I 
> >don't understand what it is doing there. There are a lot of single-byte 
> >encodings that use almost all the 0x100 possible values of a byte. I for 
> >example couldn't create a koi8-r font because of that.
> 
> charmap2enc was a simple way to support EUC encodings and `if (c<"80")' 
> is there because mplayer with -unicode option uses similar condition to 
> distinguish multibyte sequences.

What about using charmap2enc to create an encoding that is intended for mplayer WITHOUT the -unicode option? As I said in the mail before the last (just look at the double-quoted text), I DID need to remove the `if (c<"80")' clause to create a koi8-r font. What about you guys who were doing the iso-8859-2 fonts, or just about any single-byte encoding that has all its special symbols at 0x80 and above?
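To make the koi8-r problem concrete, here is a small Python sketch (not part of mplayer or charmap2enc). The helper keeps_entry is a hypothetical stand-in for the c<"80" string comparison in the script, and the sample word is just an example:

```python
# Sketch: which KOI8-R bytes would survive a 'c < "80"' filter.
# keeps_entry() is a hypothetical stand-in for the awk comparison,
# not code taken from charmap2enc.

def keeps_entry(hex_byte: str) -> bool:
    # The script compares two-character hex strings; for two-digit hex
    # of consistent case this agrees with the numeric test value < 0x80.
    return hex_byte < "80"

text = "Привет"
encoded = text.encode("koi8_r")

# Two-digit lowercase hex, like the /xNN values in a charmap file.
hex_bytes = [format(b, "02x") for b in encoded]
kept = [h for h in hex_bytes if keeps_entry(h)]

print(hex_bytes)
print(kept)  # empty: every Cyrillic letter here is at 0x80 or above
```

With the check in place, no Cyrillic letter would make it into the generated encoding table, which is why the clause had to go for koi8-r.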

> BTW, does anybody remember what was -unicode option introduced for? It 
> will not work with Unicode, but with e.g. EUC-KR will. Wasn't it meant 
> for EUC encodings?

FYI, EUC-JP has plenty of symbols encoded in three bytes, and those are not supported by mplayer yet, so I guess "double-byte encoding" is a better-suited term. And yes, it seems that all the -unicode option does is introduce the "if (c>=0x80)" check and, when it matches, build a character from two bytes rather than one. Definitely confusing (calling the option "unicode" is confusing, I mean). I guess whoever patched mplayer to work with Korean subtitles introduced this option.
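A rough Python sketch of the behaviour described above, as I understand it. This is not mplayer's actual code: the function name split_double_byte and the high-byte-first way of combining the two bytes are my assumptions.

```python
from typing import List

def split_double_byte(data: bytes) -> List[int]:
    """Split a byte string into character codes using the c >= 0x80 rule.

    Hypothetical helper: a byte >= 0x80 is combined with the next byte
    (high byte first, an assumption); anything else stands alone.
    """
    codes = []
    i = 0
    while i < len(data):
        c = data[i]
        if c >= 0x80 and i + 1 < len(data):
            codes.append((c << 8) | data[i + 1])
            i += 2
        else:
            # ASCII byte, or a dangling high byte at the end of input.
            codes.append(c)
            i += 1
    return codes

# EUC-KR: one ASCII letter followed by one two-byte Hangul syllable.
euc_kr = "A한".encode("euc_kr")
print([hex(c) for c in split_double_byte(euc_kr)])

# EUC-JP three-byte sequence (0x8F prefix): it gets cut into a
# two-byte code plus a stray byte, so such characters come out wrong.
euc_jp_three = bytes([0x8F, 0xB0, 0xA1])
print([hex(c) for c in split_double_byte(euc_jp_three)])
```

The second call shows why the three-byte EUC-JP symbols cannot work with a purely two-byte rule.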

Back to the charmap2enc script. The c<"80" check is not its only problem: it also fails to convert the big5 encoding file that I have on my system. I changed it a bit, and if you guys think the changes are good, feel free to commit them. In the big5 encoding file (and maybe others as well) there are lines like:

%IRREVERSIBLE%<U255E>           /xf9/xe9        BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE

and in that case the substr($1.... thing in the charmap2enc script that comes with mplayer gives wrong output, because the code point no longer sits at a fixed offset in the first field. I am attaching my charmap2enc to this mail.
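For illustration, a Python sketch of the difference between cutting at a fixed offset and matching the <U....> and /x.. parts wherever they appear. The names CHARMAP_RE and parse_charmap_line are hypothetical, and the regular expression is my own rendering of the idea, not a translation of any script shipped with mplayer:

```python
import re

line = ("%IRREVERSIBLE%<U255E>           /xf9/xe9        "
        "BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE")

# Fixed offset, like awk's substr($1, 3, 4): 1-based characters 3..6
# of the first whitespace-separated field.
first_field = line.split()[0]
fixed = first_field[2:6]
print(fixed)  # a piece of the %IRREVERSIBLE% prefix, not the code point

# Pattern match: find <Uxxxx> anywhere, then one or more /xNN bytes.
CHARMAP_RE = re.compile(r"<U([0-9A-Fa-f]+)>\s+((?:/x[0-9A-Fa-f]{2})+)")

def parse_charmap_line(text):
    """Return (code point hex, encoded bytes hex), or None if no match."""
    m = CHARMAP_RE.search(text)
    if m is None:
        return None
    return m.group(1), m.group(2).replace("/x", "")

print(parse_charmap_line(line))
print(parse_charmap_line("<U0041>     /x41         LATIN CAPITAL LETTER A"))
```

Matching the pattern instead of counting characters also copes with code points that are not exactly four hex digits long.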

MPlayer also has a -utf8 option. Is it any good at all? I mean, has anyone ever actually used that option for a reason?

Actually, maybe it is better to submit a patch....

--- main.old/TOOLS/subfont-c/encodings/charmap2enc      Tue Aug 14 03:37:10 2001
+++ main/TOOLS/subfont-c/encodings/charmap2enc  Tue May  7 01:19:45 2002
@@ -2,10 +2,10 @@
 # only for mostly 2-byte encodings like euc-kr
 
 $2~"^/x..$" {
-       c = substr($2, 3, 2)
-       if (c<"80")
-           print substr($1, 3, 4) "\t" c
+       match($0,/\<U([[:xdigit:]]+)\>.*\/x([[:xdigit:]]+)/,d);
+       print d[1] "\t" d[2]
 }
 $2~"^/x../x..$" {
-       print substr($1, 3, 4) "\t" substr($2, 3, 2) substr($2, 7, 2)
+       match($0,/\<U([[:xdigit:]]+)\>.*\/x([[:xdigit:]]+)\/x([[:xdigit:]]+)/,a)
+       print a[1] "\t" a[2]a[3]
 }


-- 
Chutz <chutz at chubaka.homeip.net>
--------------------------------
Help me, I'm a prisoner in a Fortune cookie file!



