[FFmpeg-devel] [PATCH] delogo filter: new "uglarm" interpolation mode added
Uwe Freese
uwe.freese at gmx.de
Wed Jan 2 23:08:55 EET 2019
Hi,
just an info: code didn't compile, don't know why I didn't see this...
New patch is in work...
Regards, Uwe
Am 01.01.19 um 22:14 schrieb Uwe Freese:
> Hello,
>
> here's a new version of the patch.
>
> Thanks for the infos. I used the raw output of a small test video
> (where delogo is applied in both modes) before and after the changes
> to make sure the output is bytewise identical (the changes don't
> change the output).
>
>
> In general I want to say that it seems to me that *some* of the points
> go now in a more philosophical direction.
>
> I prefer code easy to understand and assume that compilers optimize
> much. Yes, there may be many possibilities to probably "optimize"
> something, but I tend to believe that many times this is not needed
> nor helpful, because it doesn't really optimize and complicates the code.
>
> Additionally, when I don't have complete and functioning code to
> replace my code (the same what's expected from me), I'm not willing to
> spend many hours to guess what could be meant.
>
> I hope that not so many additional patches are needed anymore. I
> understand that coding styles, syntax etc. shall be corrected, but
> hope that we don't have to discuss too much alternative ways of
> implementing basically the same.
>
>
>
> Changes since the last version of the patch (mail from 29.12. 21:38),
> according the comments since then:
>
> - change include order at the beginning of the file
> - change loop format (linebreaks)
> - change loop borders
> - change indentation (line 175)
> - moved init code (create tables) to config_input function
>
> - used av_malloc_array instead of av_malloc. Avoid the cast. Use
> variable in sizeof(...).
> - copyright 2018, 2019 - happy new year BTW ;-)
>
>
> Comments and remaining open points from the mails:
>
>>> + for (y = logo_y1 + 1; y < logo_y2; y++) {
>>> + for (x = logo_x1 + 1,
>>> + xdst = dst + logo_x1 + 1,
>>> + xsrc = src + logo_x1 + 1; x < logo_x2; x++, xdst++,
>>> xsrc++) {
>> I think a line break after the semicolons would make this easier to
>> understand.
>
> Hm. I used the same format as in the original code.
>
> Nonetheless, I changed the format now, because I changed the loop
> borders as requested and the loops are now different anyway.
>
>>> + for (bx = 0; bx < logo_w; bx++) {
>>> + interpd += topleft[bx] *
>>> + uglarmtable[abs(bx - (x - logo_x1)) + (y -
>>> logo_y1) * (logo_w - 1)];
>>> + interpd += botleft[bx] *
>>> + uglarmtable[abs(bx - (x - logo_x1)) +
>>> (logo_h - (y - logo_y1) - 1) * (logo_w - 1)];
>>> + }
>> This can be optimized, and since it is the inner loop of the filter, I
>> think it is worth it. You can declare a pointer that will stay the same
>> for the whole y loop:
>>
>> double *xweight = uglarmtable + (y - logo_y1) * (logo_w - 1);
>>
>> and then use it in that loop:
>>
>> interpd += ... * xweight[abs(bx - (x - logo_x1))];
>>
>> It avoids a lot of multiplications.
>
> I'm not sure about this point. First, I would absolutely assume from
> the compiler that it optimizes expressions like "(logo_w - 1)" or a
> fixed offset to access the array in the loop and that they are only
> calculated once.
>
> Secondly, I don't exactly understand how *xweight in your example
> should be used (and would mean smaller or easier code).
>
> @All: What is the opinion about changing this loop regarding use of an
> additional xweight pointer?
>
>>
>>> +
>>> + for (by = 1; by < logo_h - 1; by++) {
>>> + interpd += topleft[by * src_linesize] *
>>> + uglarmtable[(x - logo_x1) + abs(by - (y -
>>> logo_y1)) * (logo_w - 1)];
>>> + interpd += topleft[by * src_linesize + (logo_w
>>> - 1)] *
>>> + uglarmtable[logo_w - (x - logo_x1) - 1 +
>>> abs(by - (y - logo_y1)) * (logo_w - 1)];
>>> + }
>> This one is more tricky to optimize, because of the abs() within the
>> multiplication, but it can be done by splitting the loop in two:
>>
>> for (by = 1; by < y; by++) {
>> interpd += ... * yweight[x - logo_x1];
>> yweight -= logo_w_minus_one;
>> }
>> for (; by < logo_h_minus_one; by++) {
>> interpd += ... * yweight[x - logo_x1];
>> yweight += logo_w_minus_one;
>> }
>> av_assert2(yweight == the_correct_value);
>>
>> In fact, I think even the x loop would be a little more readable if it
>> was split in two like that.
> Sorry, I don't understand. Please give me a complete example that
> replaces the previous for loop.
>>> +
>>> + interp = (uint64_t)(interpd /
>>> + uglarmweightsum[(x - logo_x1) - 1 + (y -
>>> logo_y1 - 1) * (logo_w - 2)]);
>> The cast to uint64_t seems suspicious. Can you explain?
>
> Every pixel value of the inner logo area is an integer. Only for the
> calculation of the weighted average, all values use floats. At the
> end, it is rounded (truncated) to an int.
>
> Should work - and did work for 17 years...
>
>>
>>> + *xdst = interp;
>>> + }
>>> +
>>> + dst += dst_linesize;
>>> + src += src_linesize;
>>> + }
>>> + }
>>> +}
>>> +
>>> +/**
>>> + * Calculate the lookup tables to be used in UGLARM interpolation
>>> mode.
>>> + *
>>> + * @param *uglarmtable Pointer to table containing weights for
>>> each possible
>>> + * diagonal distance between a border
>>> pixel and an inner
>>> + * logo pixel.
>>> + * @param *uglarmweightsum Pointer to a table containing the
>>> weight sum to divide
>>> + * by for each pixel within the logo area.
>>> + * @param sar The sar to take into account when
>>> calculating lookup
>>> + * tables.
>>> + * @param logo_w width of the logo
>>> + * @param logo_h height of the logo
>>> + * @param exponent exponent used in uglarm interpolation
>>> + */
>>> +static void calc_uglarm_tables(double *uglarmtable, double
>>> *uglarmweightsum,
>>> + AVRational sar, int logo_w, int logo_h,
>>> + float exponent)
>>> +{
>>> + double aspect = (double)sar.num / sar.den;
>> Tiny optimization:
>>
>> double aspect2 = aspect * aspect;
>>
>> for use later.
>
> I wouldn't consider this an optimization. First, I assume the compiler
> to only calculate "aspect * aspect" once in that loop (as well as "y *
> y" which doesn't change in the inner loop). Second, the code with
> using the additional variable makes the code more complex and third,
> this loop is only calculated once at startup.
>
> @All: What are your opinions?
>
>>
>>> + double d = pow(sqrt(x * x * aspect * aspect + y *
>>> y), exponent);
>> pow(x * x * aspect2 + y * y, exponent / 2);
>
> Hm. Again, I'm unsure if this "optimization" is worth it. I wouldn't
> do this normally.
> Reasons: Again, I think the compiler would optimize it. At least the
> compiled ffmpeg binaries are exactly the same size... And the original
> code explains better the calculation: Calculate the distance
> (*Pythagorean theorem, c = sqrt(a² + b²), and then calculate the
> weighted distance value with the power. Again, this is only run once
> at startup
> *
>
> *@All: Other opinions?
> *
>
>>
>>> +#define MAX_PLANES 10
>> You could use FF_ARRAY_ELEMS(AVPixFmtDescriptor.comp), and the
>> consistency would be guaranteed. Well, that syntax is not valid, but it
>> can be expressed:
>>
>> #define MAX_PLANES FF_ARRAY_ELEMS(((AVPixFmtDescriptor *)0)->comp)
>>
>> But I have a better suggestion:
>>
>> #define MAX_SUB 2
>>
>> double uglarmtable[MAX_SUB + 1][MAX_SUB + 1]
>>
>> and use uglarmtable[hsub][vsub]. That way, for YUVA420P for example, the
>> table will be computed only once for Y and A and once for U and V.
>>
>> The assert is still needed with that solution, though.
>
> I don't understand this. Please provide a complete example, or it
> stays as it is. - and could of course be optimized later.
>
>>> + if (s->mode == MODE_UGLARM) {
>> No need to test, we can free anyway.
>
> But why should the for loop run in xy mode and check "planes count"
> times to free the memory? The code without the "if" also looks to me
> more like this is not checked by mistake.
>
> @All: Opinions?
>
>
>> [Carl Eugen] While there, please don't use sizeof(type) but
>> sizeof(variable),
> same below.
>
> Hope that the code is correct like this?
>
> s->uglarmtable[plane] =
> av_malloc_array((logo_w - 1) * (logo_h - 1),
> sizeof(s->uglarmtable[plane]));
>
> Regards,
> Uwe
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
More information about the ffmpeg-devel
mailing list