[FFmpeg-devel] [PATCH v23 19/21] avfilter/graphicsub2text: Add new graphicsub2text filter (OCR)

Sat Dec 11 17:17:38 EET 2021

Hi there softworkz.

Having worked with the OCR filter's output before, I'd like to suggest a 
modification for your new filter.
It's not something that should delay the patch, just a nice addendum. 
It could be done in another patch, or I could even do it myself in the 
future. But I'll leave the comment here anyway for you to consider.

If you take a look at vf_ocr, you'll see that it sets the 
"lavfi.ocr.confidence" metadata field.
Downstream filters can check that field to accept only results above a 
certain confidence threshold and discard the rest.
This is very useful when doing OCR with non-ASCII chars, as I do with 
Spanish.

So I propose an option like this:

   { "confidence", "Sets the confidence threshold for valid OCR. Default 80.",
     OFFSET(confidence), AV_OPT_TYPE_INT, {.i64=80}, 0, 100, FLAGS },

Then you average all the confidences reported by Tesseract after OCR, 
but before converting to a text subtitle frame, and compare the result 
against that option value.
Something like this:

   /* Treat "no words recognized" as confidence 0 to avoid dividing by zero. */
   int average = number_of_confidence_items ?
                 sum_of_all_confidences / number_of_confidence_items : 0;
   if (average >= s->confidence) {
     do_your_thing();
   } else {
     av_log(ctx, AV_LOG_DEBUG, "Confidence average %d under threshold. "
            "Text detected: '%s'\n", average, text);
   }
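
To make the averaging step a bit more concrete, here is a minimal standalone sketch. It assumes the confidences arrive as an int array terminated by -1, which is the convention of Tesseract's TessBaseAPIAllWordConfidences(); the helper names average_confidence() and passes_threshold() are hypothetical and not part of the patch.

```c
/* Hypothetical helper: average a list of per-word confidences (0..100).
 * Assumes the list is terminated by -1, as in the array returned by
 * TessBaseAPIAllWordConfidences(). Returns -1 for a NULL or empty list,
 * so that "no words recognized" falls below any valid threshold. */
int average_confidence(const int *confs)
{
    long sum = 0;
    int count = 0;

    if (!confs)
        return -1;
    for (; confs[count] != -1; count++)
        sum += confs[count];
    return count ? (int)(sum / count) : -1;
}

/* Hypothetical gate: returns 1 when the OCR result should be kept,
 * 0 when its average confidence is under the threshold. */
int passes_threshold(const int *confs, int threshold)
{
    return average_confidence(confs) >= threshold;
}
```

With the proposed default of 80, a frame whose words scored 90, 80 and 70 would just pass, while an empty result would always be discarded. A plain average is only one possible choice; a minimum or a per-word filter might suit some material better.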

Also, I would like to run some tests with Spanish OCR, as I had to 
explicitly allowlist our non-ASCII chars when using the OCR filter, and 
I don't know how yours will behave in that situation. Maybe having the 
char allowlist option here too is a good idea. But, again: none of this 
should delay the patch, as your work is much more important than 
this kind of nice-to-have functionality, which could easily be 
implemented later by anyone.

Thanks,
Daniel.