[FFmpeg-devel] [PATCH 09/10] avfilter/vsrc_mandelbrot: use hypot()
Ganesh Ajjanagadde
gajjanag at mit.edu
Mon Nov 23 21:48:40 CET 2015
On Mon, Nov 23, 2015 at 2:13 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Nov 23, 2015 at 01:57:24PM -0500, Ganesh Ajjanagadde wrote:
>> On Mon, Nov 23, 2015 at 1:02 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Mon, Nov 23, 2015 at 12:43:52PM -0500, Ganesh Ajjanagadde wrote:
>> >> On Sun, Nov 22, 2015 at 3:56 PM, Ganesh Ajjanagadde <gajjanag at mit.edu> wrote:
>> >> > On Sun, Nov 22, 2015 at 3:07 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> >> On Sun, Nov 22, 2015 at 12:05:49PM -0500, Ganesh Ajjanagadde wrote:
>> >> >>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> >> >>> ---
>> >> >>> libavfilter/vsrc_mandelbrot.c | 2 +-
>> >> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> >> >>>
>> >> >>> diff --git a/libavfilter/vsrc_mandelbrot.c b/libavfilter/vsrc_mandelbrot.c
>> >> >>> index 950c5c8..a0c101e 100644
>> >> >>> --- a/libavfilter/vsrc_mandelbrot.c
>> >> >>> +++ b/libavfilter/vsrc_mandelbrot.c
>> >> >>> @@ -291,7 +291,7 @@ static void draw_mandelbrot(AVFilterContext *ctx, uint32_t *color, int linesize,
>> >> >>>
>> >> >>> use_zyklus= (x==0 || s->inner!=BLACK ||color[x-1 + y*linesize] == 0xFF000000);
>> >> >>> if(use_zyklus)
>> >> >>> - epsilon= scale*1*sqrt(SQR(x-s->w/2) + SQR(y-s->h/2))/s->w;
>> >> >>> + epsilon= scale*hypot(x-s->w/2, y-s->h/2)/s->w;
>> >> >>
>> >> >> old:
>> >> >> 704 decicycles in hypo, 1048570 runs, 6 skips
>> >> >>
>> >> >> new:
>> >> >> 1075 decicycles in hypo, 1048566 runs, 10 skips
>> >> >>
>> >> >> that is from START/STOP_TIMER over hypot()
>> >> >>
>> >> >> the code is speed relevant as it's executed per pixel
>> >> >
>> >> > Thanks for testing. Looking more closely, I see no reason for
>> >> > expensive sqrt calls anyway: one can simply square both sides; it
>> >> > should be cheaper. Will rework, post benchmark if it is indeed faster
>> >> > and does not suffer from floating point overflow, else will simply
>> >> > push a trivial removal of the "1".
>> >>
>> >> It seems like getting rid of the sqrt altogether has a very slight
>> >> positive impact (if any at all). I can post the patch, but would like
>> >> to know what to benchmark. There are numerous choices, e.g.,
>> >> draw_mandelbrot as a whole, the outer loop, or the inner loop.
>> >> I personally think the inner x loop (lines 268-388) is a good place to
>> >> look at, since the difference is very small anyway, and further
>> >> localization is impossible.
>> >
>
>> > please post the patch
>>
>> bench posted first to see if it is considered interesting enough.
>> Bench over whole draw_mandelbrot using START/STOP_TIMER on x86-64,
>> Haswell, GNU/Linux, command line:
>> ffmpeg -v error -f lavfi -i mandelbrot -f null -
>> new (draw_mandelbrot):
> [...]
>> 20857881401 decicycles in draw_mandelbrot, 1024 runs, 0 skips
>>
>> old (draw_mandelbrot):
> [...]
>> 21393227201 decicycles in draw_mandelbrot, 1024 runs, 0 skips
>
> if this is consistent over several tries then it's interesting
That is why I am posting the full vector of timings: it is very
hard to judge from a single figure. I ran for a longer duration below. I
do see a downward trend, but unfortunately the magnitude of the effect
is unclear. Furthermore, the absolute numbers vary compared to the
previous run, even though both ran on the same hardware. I did not use
tricks such as core pinning, which could have helped minimize
interference from background tasks.
BTW, this filter gets terribly slow as it zooms in, and it emits a
bunch of info-level "Mandelbrot cache is too small!" messages that do
not seem very user friendly to me.
old (draw_mandelbrot):
2232680340 decicycles in draw_mandelbrot, 1 runs, 0 skips
1842048190 decicycles in draw_mandelbrot, 2 runs, 0 skips
1674804840 decicycles in draw_mandelbrot, 4 runs, 0 skips
1698806217 decicycles in draw_mandelbrot, 8 runs, 0 skips
1854012313 decicycles in draw_mandelbrot, 16 runs, 0 skips
2064778166 decicycles in draw_mandelbrot, 32 runs, 0 skips
2414843681 decicycles in draw_mandelbrot, 64 runs, 0 skips
3099993554 decicycles in draw_mandelbrot, 128 runs, 0 skips
3982389425 decicycles in draw_mandelbrot, 256 runs, 0 skips
7634221782 decicycles in draw_mandelbrot, 512 runs, 0 skips
20576449397 decicycles in draw_mandelbrot, 1024 runs, 0 skips
12949998655 decicycles in draw_mandelbrot, 2048 runs, 0 skips
new (draw_mandelbrot):
2177824300 decicycles in draw_mandelbrot, 1 runs, 0 skips
1766861190 decicycles in draw_mandelbrot, 2 runs, 0 skips
1586299055 decicycles in draw_mandelbrot, 4 runs, 0 skips
1658036837 decicycles in draw_mandelbrot, 8 runs, 0 skips
1836125036 decicycles in draw_mandelbrot, 16 runs, 0 skips
2058982311 decicycles in draw_mandelbrot, 32 runs, 0 skips
2423381281 decicycles in draw_mandelbrot, 64 runs, 0 skips
3066657833 decicycles in draw_mandelbrot, 128 runs, 0 skips
3966406060 decicycles in draw_mandelbrot, 256 runs, 0 skips
7553322112 decicycles in draw_mandelbrot, 512 runs, 0 skips
20454169970 decicycles in draw_mandelbrot, 1024 runs, 0 skips
12822228615 decicycles in draw_mandelbrot, 2048 runs, 0 skips
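Since the magnitude is hard to read off the raw listings, here is a small sketch that turns them into per-row percentage differences. The figures are copied from the two listings above; the helper names are hypothetical, and the unweighted mean is just one possible summary, not a statistically careful one.

```c
#include <assert.h>
#include <stddef.h>

/* Decicycle figures copied from the listings above, in posting order
 * (run counts 1, 2, 4, ..., 2048). All values fit exactly in a double. */
static const double old_decicycles[] = {
    2232680340.0, 1842048190.0, 1674804840.0, 1698806217.0,
    1854012313.0, 2064778166.0, 2414843681.0, 3099993554.0,
    3982389425.0, 7634221782.0, 20576449397.0, 12949998655.0,
};
static const double new_decicycles[] = {
    2177824300.0, 1766861190.0, 1586299055.0, 1658036837.0,
    1836125036.0, 2058982311.0, 2423381281.0, 3066657833.0,
    3966406060.0, 7553322112.0, 20454169970.0, 12822228615.0,
};

#define N_SAMPLES (sizeof(old_decicycles) / sizeof(old_decicycles[0]))

/* Percentage by which "new" is below "old" for row i.
 * A negative value means the new code was slower in that row. */
static double improvement_percent(size_t i)
{
    assert(i < N_SAMPLES);
    return 100.0 * (old_decicycles[i] - new_decicycles[i]) / old_decicycles[i];
}

/* Unweighted mean of the per-row percentages. */
static double mean_improvement_percent(void)
{
    double sum = 0.0;
    for (size_t i = 0; i < N_SAMPLES; i++)
        sum += improvement_percent(i);
    return sum / (double)N_SAMPLES;
}
```

On these numbers the new code comes out ahead in every row except the 64-run one, where it is slightly slower. Given the run-to-run variation noted above, that is still not conclusive.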
>
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> No snowflake in an avalanche ever feels responsible. -- Voltaire