[MPlayer-users] Problem with subrip in TOOLS

irisson jean-olivier jo.irisson at noos.fr
Fri Jul 2 03:25:49 CEST 2004


Hi all,

I've ripped some DVD subtitles to .idx and .sub files with mencoder and 
I want to convert them to .srt with the subrip utility provided in the 
"TOOLS" section of mplayer's code.
Previous attempts to use this utility were successful, but now I notice a 
problem. Indeed:
 > subrip subtitles 0 mysubs.srt
converts only around 65% of the file to .srt and then stops (227 
timestamps in the .idx file and only 157 subtitles in the .srt file). 
This problem appears with several .idx+.sub files: only the first 60% to 
70% of each file is converted, whatever its length. All these files are 
converted just fine by a program under Windows (SubRip ;-) ), so the 
.idx and .sub files themselves are correct.
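
The comparison above (227 timestamps in the .idx against 157 subtitles in the .srt) can be reproduced with a small counter. This is only a sketch, assuming that every entry in the .idx is a line starting with "timestamp:" and that every subtitle in the .srt has one time line containing " --> ". The function names are made up for illustration and are not part of subrip.c.

```c
#include <stdio.h>
#include <string.h>

/* Count the entries of a VobSub .idx stream.
 * Assumption: each entry is a line beginning with "timestamp:". */
int count_idx_timestamps(FILE *idx)
{
    char line[1024];
    int n = 0;

    while (fgets(line, sizeof line, idx))
        if (strncmp(line, "timestamp:", 10) == 0)
            n++;
    return n;
}

/* Count the subtitles of a SubRip .srt stream.
 * Assumption: each subtitle has exactly one time line containing " --> ". */
int count_srt_cues(FILE *srt)
{
    char line[1024];
    int n = 0;

    while (fgets(line, sizeof line, srt))
        if (strstr(line, " --> ") != NULL)
            n++;
    return n;
}
```

Open both files with fopen() and compare the two results. Note that an .idx holding several language streams is counted as a whole here, so the numbers only match directly when the file contains a single stream.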

Some information that may be useful:
- my subrip.c comes from the CVS snapshot MPlayer-20040630 and is 
modified at line 173:
     sprintf(cmd, GOCR_PROGRAM" -v 1 -s 7 -d -1 -i %s -o %s", buf, tmpfname);
instead of:
     sprintf(cmd, GOCR_PROGRAM" -v 1 -s 7 -d 0 -m 130 -m 256 -m 32 -i %s -o %s", buf, tmpfname);
This only causes gocr to perform the OCR fully automatically. I usually 
spend more time recognising each OCR failure than correcting its 
mistakes afterwards, that's why I'm doing this. Furthermore, this worked 
perfectly before with this modification.
It is compiled with gcc-3.4 or gcc-3.3.2 (exact same output and error) 
according to the command line given at the beginning of the file. Here is 
the output of the compilation:
[jiho at laptop TOOLS]$ gcc -g -Wall -I.. -o subrip subrip.c ../vobsub.o 
../spudec.o ../mp_msg.o ../unrarlib.o ../postproc/swscale.o 
../postproc/rgb2rgb.o ../postproc/yuv2rgb.o ../libmpcodecs/img_format.o -lm

subrip.c: In function `fast_memcpy':
subrip.c:191: warning: implicit declaration of function `memcpy'
subrip.c: In function `main':
subrip.c:220: warning: passing arg 4 of `vobsub_open' from incompatible 
pointer type
/home/jiho/tmp/cciriTc5.o(.text+0x48e): In function `draw_alpha':
/home/jiho/files/progs/mplayer/MPlayer-20040630/TOOLS/subrip.c:167: 
warning: the use of `tmpnam' is dangerous, better use `mkstemp'

- my gocr is compiled from current CVS, with gcc-3.4. But the problem is 
the same with gocr-0.38 or gocr-0.37 installed from rpms.
What I cannot explain is that this little utility worked perfectly 
before, and that my version of gocr would have been something like 0.37 
or 0.38 at that time.
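
To make the difference between the two gocr invocations quoted in the first item easier to see, here is a minimal sketch that builds either command line. The option strings are copied from the two sprintf lines above; the function name build_gocr_cmd, its parameters, and the value "gocr" for GOCR_PROGRAM are assumptions for illustration, not the actual code of subrip.c. It uses snprintf so that an overlong file name cannot overflow the buffer.

```c
#include <stdio.h>

#ifndef GOCR_PROGRAM
#define GOCR_PROGRAM "gocr"   /* assumption: gocr is found on the PATH */
#endif

/* Build the gocr command line into cmd (hypothetical helper).
 * automatic != 0: the modified "-d -1" variant quoted in this post
 * automatic == 0: the variant quoted as the unmodified line
 * Returns 0 on success, -1 if the command did not fit into cmd. */
int build_gocr_cmd(char *cmd, size_t size,
                   const char *image, const char *text, int automatic)
{
    int n;

    if (automatic)
        n = snprintf(cmd, size,
                     GOCR_PROGRAM " -v 1 -s 7 -d -1 -i %s -o %s",
                     image, text);
    else
        n = snprintf(cmd, size,
                     GOCR_PROGRAM " -v 1 -s 7 -d 0 -m 130 -m 256 -m 32 -i %s -o %s",
                     image, text);

    /* snprintf returns the length it wanted to write */
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Printing the resulting string before it is passed to system() is a cheap way to check exactly what gocr is being asked to do for each subtitle.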

So I do not have any clue concerning this strange problem. Does that 
seem familiar to somebody? Is there a problem with the way subrip 
interfaces with gocr? Any help would be great. Thank you in advance.
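
Regarding two of the warnings in the compilation output above (the implicit declaration of memcpy and the use of tmpnam), a minimal sketch of what a fix could look like follows. It is not taken from subrip.c and is not claimed to be related to the truncated output: the helper name make_temp_output and the "/tmp/subrip-" prefix are hypothetical. memcpy is declared in <string.h>, and mkstemp is the replacement the linker message itself suggests.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() under strict -std modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>               /* declares memcpy */
#include <unistd.h>

/* Create a uniquely named temporary file and store its name in path.
 * Returns an open file descriptor, or -1 on failure.
 * Hypothetical helper; the file name prefix is an arbitrary choice. */
int make_temp_output(char *path, size_t size)
{
    static const char tmpl[] = "/tmp/subrip-XXXXXX";

    if (size < sizeof tmpl)
        return -1;                     /* caller's buffer is too small */
    memcpy(path, tmpl, sizeof tmpl);   /* copies the trailing '\0' as well */
    return mkstemp(path);              /* replaces XXXXXX and creates the file */
}
```

Unlike tmpnam, mkstemp creates the file atomically, so no other process can grab the same name in between. The caller should close() the descriptor and unlink() the path once the file is no longer needed.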

In addition, here is the end of the output of subrip, if that helps:

[jiho at laptop encode]$ subrip outia_dvd1 0 subs.srt
(... lots of lines skipped ...)
# Optical Character Recognition --- gocr 0.39
# options are: -l 0 -s 7 -v 1 -c _ -m 0 -d -1 -n 0 
subtitle-1648480-1650311.pg
# using unicode
# db_path= (null)
# OTSU: thresholdValue = 129 gmin=81 gmax=255
# scanning boxes 15
# auto dust size = 2 (mX=9,mY=12)
# searching dust of size  1 ...   0 cluster detected
#   0 white pixels removed, cs=160
# smooth big chars 7x16 cs=160 ...   9 changes in 2 of 15
# detect barcode , 0 bars, boxes-0=15
# detect pictures, frames, noAlphas, mXmY= 9 12 ...  0 - boxes 15
# averages: mXmY= 11 13 nC= 13 n= 15
# remove boxes on border 0, within pictures . 0, 0 cluster detected, 
boxes 13
# rotation angle (x,y,num) (16384,-204,10) (0,0,0), pass 1
# rotation angle (x,y,num) (16384,-204,10) (46876,-568,9), pass 2
# detect longest line - at y=0 crosses=  0 my=0 - at crosses=  0 dy=0
# scanning lines  - lines= 1
# add line infos to boxes ... done
# divide vertical glued boxes, numC 13
# searching melted serifs ...   0 cluster corrected, 0 new boxes
# glue broken chars ...   2 times glued, remaining boxes 11
# detect dust2, ...    0 +   0 boxes deleted, numC= 11
# check for word pitch ... min=4 max=16 pitch_p=9
#  ...  min=11 max=24 v=0.222222 mono=0 pitch_m=15
# step 1: char recognition, 0 of 13 chars unidentified
# step 2: try to compare unknown with known chars - found 0
# step 3: try to divide unknown chars, numC 11
# insert space between words (dy=23) ... found 2
# step 4: context correction Il1 0O
# store boxtree to lines ...get_least_line_indent: page_width 239, dy 0
Line 1,  y 2, raw indent 17, adjusted indent 17
Minimum indent is 17
... 2 lines
Elapsed time: 0:00:8.421.
# Optical Character Recognition --- gocr 0.39
# options are: -l 0 -s 7 -v 1 -c _ -m 0 -d -1 -n 0 
subtitle-1650919-1654753.pg
# using unicode
# db_path= (null)
# OTSU: thresholdValue = 129 gmin=81 gmax=255
# scanning boxes 54
# auto dust size = 2 (mX=9,mY=12)
# searching dust of size  1 ...   0 cluster detected
#   0 white pixels removed, cs=160
# smooth big chars 7x16 cs=160 ...  53 changes in 12 of 54
# detect barcode , 0 bars, boxes-0=54
# detect pictures, frames, noAlphas, mXmY= 9 12 ...  0 - boxes 54
# averages: mXmY= 9 13 nC= 49 n= 54
# remove boxes on border 0, within pictures . 0, 0 cluster detected, 
boxes 49
# rotation angle (x,y,num) (18371,-301,34) (0,0,0), pass 1
# rotation angle (x,y,num) (18371,-301,34) (52585,-240,34), pass 2
# detect longest line - at y=60 crosses= 27 my=13 - at crosses= 25 dy=0
# scanning lines  - lines= 2
# add line infos to boxes ... done
# divide vertical glued boxes, numC 49
# searching melted serifs ...   0 cluster corrected, 0 new boxes
# glue broken chars ...   6 times glued, remaining boxes 43
# detect dust2, ...    0 +   0 boxes deleted, numC= 43
# check for word pitch ... min=0 max=13 pitch_p=7
#  ...  min=7 max=28 v=0.320000 mono=0 pitch_m=13
# step 1: char recognition, 0 of 48 chars unidentified
# step 2: try to compare unknown with known chars - found 0
# step 3: try to divide unknown chars, numC 43
# insert space between words (dy=23) ... found 16
# step 4: context correction Il1 0O
# store boxtree to lines ...get_least_line_indent: page_width 475, dy 0
Line 1,  y 3, raw indent 79, adjusted indent 79
Line 2,  y 44, raw indent 17, adjusted indent 17
Minimum indent is 17
... 2 lines
Elapsed time: 0:00:51.861.

Thanx again.

JiHO




