[MPlayer-users] Re: getting small subs from dvd

Moritz Bunkus moritz at bunkus.org
Mon Sep 1 12:52:42 CEST 2003


Hi.

The quality of the results depends on the version of gocr used. Another
thing I do is run a sed script right after the conversion. This
gets rid of a LOT of misinterpretations, especially when i/1/l are mixed
up. Examples:

s/\<l\>/I/g
s/\<l'll\>/I'll/g
s/\<ln\>/In/g
...

Subtitleripper can automatically use such scripts if they're placed in
~/.subtitleripper. The examples above are from gocrfilter_en.sed.
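If you want to see what such rules do before wiring them into
subtitleripper, something like the following should work. It's only a
sketch: the sample sentence and the temporary file are made up for
illustration, and the \< \> word-boundary syntax assumes GNU sed.

```shell
# Write the three rules quoted above into a temporary sed script.
# (The temp file is just for this demo; it is not a subtitleripper path.)
rules=$(mktemp)
cat > "$rules" <<'EOF'
s/\<l\>/I/g
s/\<l'll\>/I'll/g
s/\<ln\>/In/g
EOF

# Made-up sample line with typical l/I mix-ups from OCR.
fixed=$(printf '%s\n' "l think l'll stay. ln that case, all is well." | sed -f "$rules")
printf '%s\n' "$fixed"

rm -f "$rules"
```

The word boundaries are what keep "all" and "well" untouched while the
standalone "l" and "ln" get fixed.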

After this step I usually use ispell (I'll try aspell soon). Of course
this whole process is long and tiring, but the results are pretty good
in my experience.

Oh, and if gocr's recognition is really bad then you might want to play
around with its -s parameter (the space width). Sometimes characters are
far apart, sometimes they're close to each other. In pgm2txt you can find
lines like...

# GOCR options for pure data base mode 
GOCR_OPTIONS_DB_ONLY="-s 8 -d 0 -m 130 -m 4 -m 256 -m 32"

# GOCR options for with automatic char recognition
GOCR_OPTIONS_AUTO="-s 10 -m 130"

Adjust these. And make DAMN sure you've got the grey levels right!
Nothing's worse than outlined characters for OCR...
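
To find a usable -s value before editing pgm2txt, you could run gocr by
hand on a single image with a few different values and compare the
output by eye. A rough sketch: the function name, the file name and the
list of values are placeholders I made up, -m 130 is simply copied from
GOCR_OPTIONS_AUTO above, and I'm assuming your gocr takes the input
file via -i.

```shell
# Run gocr on one image with several -s values and label each result.
# try_spacewidths is a hypothetical helper, not part of subtitleripper.
try_spacewidths() {
    image=$1
    shift
    for s in "$@"; do
        printf '== -s %s ==\n' "$s"
        gocr -s "$s" -m 130 -i "$image"
    done
}

# Example call (hypothetical file name):
# try_spacewidths sub0001.pgm 6 8 10 12
```

Once one value gives clearly better text, put it into the matching
GOCR_OPTIONS line in pgm2txt.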

-- 
 ==> Ciao, Mosu (Moritz Bunkus)