[FFmpeg-devel] [PATCH v23 19/21] avfilter/graphicsub2text: Add new graphicsub2text filter (OCR)
Daniel Cantarín
canta at canta.com.ar
Sat Dec 11 17:17:38 EET 2021
Hi there softworkz.
Having worked with the OCR filter's output before, I'd like to suggest a
modification to your new filter.
It's not something that should delay the patch, just a nice addendum.
It could be done in another patch, or I could even do it myself in the
future. But I'm leaving the comment here anyway, for you to consider.
If you take a look at vf_ocr, you'll see that it sets
"lavfi.ocr.confidence" metadata field.
Downstream filters can check that field to keep only results above a
certain confidence threshold and discard the rest.
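As a rough illustration of such a downstream check, here is a minimal sketch in plain C. It assumes the "lavfi.ocr.confidence" value is a space-separated list of per-word integer confidences; that format is an assumption here and should be checked against vf_ocr. The helper name average_from_metadata is hypothetical and not part of FFmpeg.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not an FFmpeg API: average the integers found in a
 * space-separated string such as "96 91 42 ".
 * Returns -1 when no number could be parsed. */
static int average_from_metadata(const char *value)
{
    long sum = 0;
    int count = 0;
    const char *p = value;
    char *end;

    assert(value != NULL);
    for (;;) {
        long v = strtol(p, &end, 10); /* skips leading whitespace */
        if (end == p)                 /* nothing converted: stop */
            break;
        sum += v;
        count++;
        p = end;
    }
    return count > 0 ? (int)(sum / count) : -1;
}
```

A caller would compare the returned average against its own threshold and treat -1 as "no usable OCR result".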
This is very useful when doing OCR with non-ASCII characters, as I do
with the Spanish language.
So I propose an option like this:
    { "confidence", "Sets the confidence threshold for valid OCR. Default 80.",
      OFFSET(confidence), AV_OPT_TYPE_INT, {.i64=80}, 0, 100, FLAGS },
Then you average all the confidences reported by tesseract after OCR,
but before converting to a text subtitle frame, and compare that average
against the option value.
Something like this:
    /* Pseudocode: sum and count come from tesseract's confidences. */
    int average = number_of_confidence_items > 0
                ? sum_of_all_confidences / number_of_confidence_items
                : 0;
    if (average >= s->confidence) {
        do_your_thing();
    } else {
        av_log(ctx, AV_LOG_DEBUG,
               "Confidence average %d under threshold. Text detected: '%s'\n",
               average, text);
    }
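The averaging step could be sketched as a standalone helper like the one below. It assumes the confidences arrive as an int array terminated by -1, which is how I understand Tesseract's C API call TessBaseAPIAllWordConfidences to return them. The names average_confidence and confidence_ok are hypothetical, and this is only an illustration, not the filter's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: integer average of a -1-terminated confidence array.
 * Returns -1 when the array holds no words, so nothing divides by zero. */
static int average_confidence(const int *confs)
{
    long sum = 0;
    int count = 0;

    assert(confs != NULL);
    for (size_t i = 0; confs[i] != -1; i++) {
        sum += confs[i];
        count++;
    }
    return count > 0 ? (int)(sum / count) : -1;
}

/* Hypothetical helper: nonzero when the OCR result should be kept.
 * An empty result (no words) is always rejected. */
static int confidence_ok(const int *confs, int threshold)
{
    int average = average_confidence(confs);
    return average >= 0 && average >= threshold;
}
```

Rejecting an empty result explicitly keeps a threshold of 0 from accepting frames where tesseract found no words at all.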
Also, I would like to do some tests with Spanish OCR, as I had to
explicitly allowlist our non-ASCII chars when using the OCR filter, and
I don't know how yours will behave in that situation. Maybe having the
chars allowlist option here too is a good idea. But, again: none of
this should delay the patch, as your work is much more important than
this kind of nice-to-have feature, which could easily be
implemented later by anyone.
Thanks,
Daniel.